Quickstart
Installation
pip install bolingual
Or with uv:
uv add bolingual
Basic Usage
Finding Similar-Sounding English Words
from bolingual import CandidateIndex
# Build the index from the CMU Pronouncing Dictionary (English word list)
index = CandidateIndex.from_cmudict()
# Query with a Hindi word written in Devanagari
results = index.hybrid_ranking("वॉटसन", top_k=5)
for word, score in results:
    print(f"{word}: {score:.3f}")
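The loop above suggests that each result is a (word, score) pair. As a minimal sketch under that assumption, candidates could be filtered by a score threshold before display. The helper `filter_candidates` is hypothetical and not part of bolingual, the sample data is made up so the snippet runs without the library, and the sketch assumes that higher scores mean closer matches, which the source does not state.

```python
from typing import Iterable


def filter_candidates(
    results: Iterable[tuple[str, float]], min_score: float
) -> list[tuple[str, float]]:
    """Keep only (word, score) pairs whose score is at least min_score.

    Hypothetical helper for illustration; not a bolingual function.
    """
    return [(word, score) for word, score in results if score >= min_score]


# Stand-in data shaped like the pairs iterated above.
# These values are invented, not real bolingual output.
sample_results = [("watson", 0.91), ("watt", 0.62), ("what", 0.40)]

kept = filter_candidates(sample_results, min_score=0.6)
for word, score in kept:
    print(f"{word}: {score:.3f}")
```

With real data, `sample_results` would be replaced by the value returned from `index.hybrid_ranking(...)`, provided it does yield pairs of this shape.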
Building the Benchmark
from pathlib import Path
from bolingual import build_benchmark_dataframe
benchmark = build_benchmark_dataframe(Path("data/raw/crowd_transliterations.hi-en.txt"))
print(f"Total items: {len(benchmark)}")
print(f"Dev/Test split: {benchmark['split'].value_counts().to_dict()}")
Running Experiments
from bolingual import CandidateIndex, evaluate_benchmark, summarize_results
index = CandidateIndex.from_cmudict()
# `benchmark` is the DataFrame built in "Building the Benchmark"
results = evaluate_benchmark(benchmark, index, split="test")
metrics = summarize_results(results)
print(metrics)
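The source does not say which metrics `summarize_results` reports. For readers new to ranking evaluation, the sketch below shows two measures commonly used for tasks like this one, top-k accuracy and mean reciprocal rank, in plain Python. All function names and data here are hypothetical illustrations and should not be read as bolingual's implementation.

```python
def top_k_accuracy(rankings: list[list[str]], golds: list[str], k: int) -> float:
    """Fraction of queries whose gold word appears in the first k candidates."""
    if not golds:
        return 0.0
    hits = sum(1 for ranked, gold in zip(rankings, golds) if gold in ranked[:k])
    return hits / len(golds)


def mean_reciprocal_rank(rankings: list[list[str]], golds: list[str]) -> float:
    """Average of 1/rank of the gold word (1-based); 0 when it is absent."""
    if not golds:
        return 0.0
    total = 0.0
    for ranked, gold in zip(rankings, golds):
        if gold in ranked:
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(golds)


# Invented rankings for three queries, each with one gold answer.
rankings = [["watson", "watt"], ["cat", "bat", "hat"], ["dog"]]
golds = ["watson", "hat", "fox"]

print(f"top-1 accuracy: {top_k_accuracy(rankings, golds, k=1):.3f}")
print(f"top-3 accuracy: {top_k_accuracy(rankings, golds, k=3):.3f}")
print(f"MRR: {mean_reciprocal_rank(rankings, golds):.3f}")
```

Top-k accuracy only asks whether the correct word is somewhere in the first k results, while mean reciprocal rank also rewards placing it nearer the top.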
Command Line Interface
Build the benchmark:
bolingual-build-benchmark --raw data/raw/crowd_transliterations.hi-en.txt --output benchmark.csv
Run experiments:
bolingual-run-experiment --benchmark benchmark.csv --output-dir results/
Interactive query:
bolingual-query --hindi "वॉटसन" --show 10