API Reference

Core Classes

class bolingual.CandidateIndex(vocabulary: list[str], orth_keys: dict[str, str], pron_tokens: dict[str, tuple[str, ...]])

Parameters:

vocabulary (list[str])
orth_keys (dict[str, str])
pron_tokens (dict[str, tuple[str, ...]])

vocabulary: list[str]

orth_keys: dict[str, str]

pron_tokens: dict[str, tuple[str, ...]]

classmethod from_words(words)

Parameters: words (Iterable[str])
Return type: CandidateIndex

orthographic_ranking(hindi_text)

Parameters: hindi_text (str)
Return type: list[tuple[float, str]]

phonetic_ranking(hindi_text, candidates, schwa_cost=0.3)

Parameters:

hindi_text (str)
candidates (Iterable[str])
schwa_cost (float)

Return type:

list[tuple[float, str]]

hybrid_ranking(hindi_text, top_k=50, alpha=0.5, schwa_cost=0.3)

Parameters:

hindi_text (str)
top_k (int)
alpha (float)
schwa_cost (float)

Return type:

dict[str, list[tuple[float, str]]]
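
This reference does not describe how hybrid_ranking combines its two signals, so the following is only a sketch of one plausible reading of the signature: shortlist top_k candidates by orthographic similarity, score the shortlist phonetically, then blend the two scores with alpha. The function name hybrid_ranking_sketch, the phonetic_score callback, the use of difflib for string similarity, and the result keys "orthographic", "phonetic", and "hybrid" are all assumptions made for illustration; none of them come from bolingual.

```python
from __future__ import annotations

from difflib import SequenceMatcher
from typing import Callable


def hybrid_ranking_sketch(
    query_key: str,
    orth_keys: dict[str, str],
    phonetic_score: Callable[[str], float],
    top_k: int = 50,
    alpha: float = 0.5,
) -> dict[str, list[tuple[float, str]]]:
    """Hypothetical two-stage ranking; not the bolingual implementation."""
    # Stage 1: rank every word by string similarity between coarse keys,
    # then keep only the top_k best matches as the shortlist.
    orth = sorted(
        (
            (SequenceMatcher(None, query_key, key).ratio(), word)
            for word, key in orth_keys.items()
        ),
        reverse=True,
    )[:top_k]

    # Stage 2: compute the (assumed) phonetic score for the shortlist only.
    phon_by_word = {word: phonetic_score(word) for _, word in orth}
    phon = sorted(((score, word) for word, score in phon_by_word.items()), reverse=True)

    # Blend: alpha weights the orthographic score, (1 - alpha) the phonetic one.
    hybrid = sorted(
        ((alpha * o + (1 - alpha) * phon_by_word[word], word) for o, word in orth),
        reverse=True,
    )
    return {"orthographic": orth, "phonetic": phon, "hybrid": hybrid}


# Toy data: the keys and scores below are invented for the example.
toy_keys = {"school": "skul", "cool": "kul", "table": "tebl"}
toy_phon = {"school": 0.9, "cool": 0.6, "table": 0.1}
result = hybrid_ranking_sketch("skul", toy_keys, toy_phon.__getitem__, top_k=2)
best_score, best_word = result["hybrid"][0]
```

Each list holds (score, word) pairs sorted best first, which matches the documented return type list[tuple[float, str]]. Under this reading, alpha=1.0 would reduce the hybrid list to the orthographic ranking and alpha=0.0 to the phonetic one.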

Benchmarking

bolingual.build_benchmark_dataframe(raw_path, config=BenchmarkConfig(test_fraction=0.25, random_state=42))

Parameters:

raw_path (str | Path)
config (BenchmarkConfig)

Return type:

DataFrame

Experiment Evaluation

bolingual.evaluate_benchmark(benchmark, split='test', config=ExperimentConfig(top_k=200, alpha=0.5, schwa_cost=0.3))

Parameters:

benchmark (DataFrame)
split (str)
config (ExperimentConfig)

Return type:

DataFrame

bolingual.summarize_results(results)

Parameters: results (DataFrame)
Return type: dict[str, Any]

Phonetics Module

bolingual.phonetics.cmudict_entries()

Return type: dict[str, list[list[str]]]

bolingual.phonetics.clean_english(text)

Parameters: text (str)
Return type: str

bolingual.phonetics.romanize_hindi(text)

Parameters: text (str)
Return type: str

bolingual.phonetics.coarse_latin(text)

Parameters: text (str)
Return type: str

bolingual.phonetics.tokenize_latin(text)

Parameters: text (str)
Return type: tuple[str, ...]

bolingual.phonetics.hindi_variants(text)

Parameters: text (str)
Return type: tuple[tuple[str, ...], ...]

bolingual.phonetics.arpabet_to_tokens(pronunciation)

Parameters: pronunciation (tuple[str, ...])
Return type: tuple[str, ...]

bolingual.phonetics.spelling_to_tokens(word)

Parameters: word (str)
Return type: tuple[str, ...]

bolingual.phonetics.english_pron_tokens(word)

Parameters: word (str)
Return type: tuple[str, ...]

bolingual.phonetics.substitution_cost(left, right)

Parameters:

left (str)
right (str)

Return type:

float

bolingual.phonetics.weighted_similarity(left, right, schwa_cost=0.3)

Parameters:

left (tuple[str, ...])
right (tuple[str, ...])
schwa_cost (float)

Return type:

float
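
The signature suggests a similarity over two token tuples in which schwa tokens are cheaper to insert or delete than other tokens, but this reference does not state the algorithm. The sketch below shows one way such a score could be computed: a weighted edit distance normalized by the longer sequence. The function name weighted_similarity_sketch, the SCHWA token "ə", the unit cost for other edits, and the normalization are all assumptions, not the bolingual implementation.

```python
from typing import Sequence

# Hypothetical schwa token; the library's actual token inventory is not
# documented in this reference.
SCHWA = "ə"


def weighted_similarity_sketch(
    left: Sequence[str], right: Sequence[str], schwa_cost: float = 0.3
) -> float:
    """Similarity in [0, 1] derived from a weighted edit distance (assumed design)."""

    def indel(token: str) -> float:
        # Inserting or deleting a schwa is cheap; any other token costs 1.
        return schwa_cost if token == SCHWA else 1.0

    rows, cols = len(left) + 1, len(right) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + indel(left[i - 1])
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + indel(right[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            # Flat substitution cost here; a graded cost function could be
            # plugged in instead.
            substitute = 0.0 if left[i - 1] == right[j - 1] else 1.0
            dist[i][j] = min(
                dist[i - 1][j] + indel(left[i - 1]),      # delete from left
                dist[i][j - 1] + indel(right[j - 1]),     # insert from right
                dist[i - 1][j - 1] + substitute,          # match or substitute
            )

    longest = max(len(left), len(right))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return max(0.0, 1.0 - dist[-1][-1] / longest)


with_schwa = weighted_similarity_sketch(("r", "a", "m", SCHWA), ("r", "a", "m"))
strict = weighted_similarity_sketch(("r", "a", "m", SCHWA), ("r", "a", "m"), schwa_cost=1.0)
```

In this sketch a lower schwa_cost makes a trailing schwa matter less, so with_schwa is higher than strict. Whether the library normalizes or weights substitutions the same way is not something this reference confirms.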

bolingual.phonetics.max_phonetic_similarity(hindi_text, english_word, schwa_cost=0.3)

Parameters:

hindi_text (str)
english_word (str)
schwa_cost (float)

Return type:

float

bolingual.phonetics.coarse_match_key_for_hindi(text)

Parameters: text (str)
Return type: str

bolingual.phonetics.coarse_match_key_for_english(word)

Parameters: word (str)
Return type: str