API Reference

Core Classes

class bolingual.CandidateIndex(vocabulary: list[str], orth_keys: dict[str, str], pron_tokens: dict[str, tuple[str, ...]])
Parameters:
  • vocabulary (list[str])

  • orth_keys (dict[str, str])

  • pron_tokens (dict[str, tuple[str, ...]])

classmethod from_words(words)
Parameters:

words (Iterable[str])

Return type:

CandidateIndex

orthographic_ranking(hindi_text)
Parameters:

hindi_text (str)

Return type:

list[tuple[float, str]]

phonetic_ranking(hindi_text, candidates, schwa_cost=0.3)
Parameters:
  • hindi_text (str)

  • candidates

  • schwa_cost (float)

Return type:

list[tuple[float, str]]

hybrid_ranking(hindi_text, top_k=50, alpha=0.5, schwa_cost=0.3)
Parameters:
  • hindi_text (str)

  • top_k (int)

  • alpha (float)

  • schwa_cost (float)

Return type:

dict[str, list[tuple[float, str]]]
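
The reference does not state how alpha combines the orthographic and phonetic scores. One plausible reading is a linear interpolation, sketched below. The function name blend_rankings, the direction of the weighting (alpha on the orthographic side), and the example scores are all assumptions made for illustration; this is not bolingual's implementation.

```python
def blend_rankings(orth, phon, alpha=0.5):
    """Blend two (score, word) lists by linear interpolation (assumed rule)."""
    orth_scores = {word: score for score, word in orth}
    phon_scores = {word: score for score, word in phon}
    blended = []
    # A word missing from one list contributes 0.0 for that component.
    for word in orth_scores.keys() | phon_scores.keys():
        score = (alpha * orth_scores.get(word, 0.0)
                 + (1 - alpha) * phon_scores.get(word, 0.0))
        blended.append((score, word))
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(blended, key=lambda pair: (-pair[0], pair[1]))


# Made-up scores in the documented list[tuple[float, str]] shape.
orth = [(0.9, "doctor"), (0.4, "daughter")]
phon = [(0.6, "doctor"), (0.8, "daughter")]
ranked = blend_rankings(orth, phon, alpha=0.5)
```

Under this assumption, alpha=1.0 would reduce to the orthographic ranking and alpha=0.0 to the phonetic one.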

Benchmarking

bolingual.build_benchmark_dataframe(raw_path, config=BenchmarkConfig(test_fraction=0.25, random_state=42))
Parameters:
  • raw_path (str | Path)

  • config (BenchmarkConfig)

Return type:

DataFrame
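
The defaults test_fraction=0.25 and random_state=42 suggest a seeded train/test split, and evaluate_benchmark selects rows by split. The following is a minimal standard-library sketch of such a split, assuming each row receives a "split" label. The helper split_rows, the "split" and "hindi" field names, and the placeholder rows are hypothetical; the real function returns a DataFrame and may split differently.

```python
import random


def split_rows(rows, test_fraction=0.25, random_state=42):
    """Label each row 'train' or 'test' with a seeded shuffle (assumed scheme)."""
    rng = random.Random(random_state)  # fixed seed makes the split repeatable
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_ids = set(indices[:n_test])
    # dict(row, split=...) copies each row, so the input is left unmodified.
    return [
        dict(row, split="test" if i in test_ids else "train")
        for i, row in enumerate(rows)
    ]


# Placeholder rows; real benchmark rows would come from raw_path.
rows = [{"hindi": f"word{i}"} for i in range(8)]
labelled = split_rows(rows)
```

Re-running with the same random_state gives the same labels, which is the property a fixed seed in a benchmark configuration is normally meant to guarantee.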

Experiment Evaluation

bolingual.evaluate_benchmark(benchmark, split='test', config=ExperimentConfig(top_k=200, alpha=0.5, schwa_cost=0.3))
Parameters:
  • benchmark (DataFrame)

  • split (str)

  • config (ExperimentConfig)

Return type:

DataFrame

bolingual.summarize_results(results)
Parameters:

results (DataFrame)

Return type:

dict[str, Any]
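
The keys of the returned dictionary are not documented here. As an illustration of the kind of summary a ranking experiment commonly reports, the sketch below computes top-1 accuracy and mean reciprocal rank from a list of ranks. The function summarize_ranks, its input format, and the key names are hypothetical and are not taken from bolingual.

```python
def summarize_ranks(ranks):
    """Summarize 1-based ranks of the correct answer; None means not retrieved."""
    total = len(ranks)
    if total == 0:
        return {"n": 0, "top1_accuracy": 0.0, "mrr": 0.0}
    top1 = sum(1 for rank in ranks if rank == 1)
    # A query whose answer was not retrieved contributes 0 to the reciprocal sum.
    reciprocal_sum = sum(1.0 / rank for rank in ranks if rank is not None)
    return {
        "n": total,
        "top1_accuracy": top1 / total,
        "mrr": reciprocal_sum / total,
    }


summary = summarize_ranks([1, 2, None, 1])
```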

Phonetics Module

bolingual.phonetics.cmudict_entries()
Return type:

dict[str, list[list[str]]]

bolingual.phonetics.clean_english(text)
Parameters:

text (str)

Return type:

str

bolingual.phonetics.romanize_hindi(text)
Parameters:

text (str)

Return type:

str

bolingual.phonetics.coarse_latin(text)
Parameters:

text (str)

Return type:

str

bolingual.phonetics.tokenize_latin(text)
Parameters:

text (str)

Return type:

tuple[str, ...]

bolingual.phonetics.hindi_variants(text)
Parameters:

text (str)

Return type:

tuple[tuple[str, ...], ...]

bolingual.phonetics.arpabet_to_tokens(pronunciation)
Parameters:

pronunciation (tuple[str, ...])

Return type:

tuple[str, ...]

bolingual.phonetics.spelling_to_tokens(word)
Parameters:

word (str)

Return type:

tuple[str, ...]

bolingual.phonetics.english_pron_tokens(word)
Parameters:

word (str)

Return type:

tuple[str, ...]

bolingual.phonetics.substitution_cost(left, right)
Parameters:
  • left

  • right

Return type:

float

bolingual.phonetics.weighted_similarity(left, right, schwa_cost=0.3)
Parameters:
  • left

  • right

  • schwa_cost (float)

Return type:

float
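
The signature suggests a similarity in which schwa differences are penalized less than other differences, which would fit the way Hindi often leaves the inherent vowel unpronounced. The sketch below is one way to realize that idea: a weighted edit distance where inserting or deleting a schwa token costs schwa_cost and every other edit costs 1.0, normalized by the longer sequence. The token "ə", the unit substitution cost, and the normalization are assumptions for illustration, not the library's actual algorithm.

```python
SCHWA = "ə"  # hypothetical token name for the schwa


def indel_cost(token, schwa_cost):
    """Cost of inserting or deleting one token (cheaper for a schwa)."""
    return schwa_cost if token == SCHWA else 1.0


def weighted_similarity_sketch(left, right, schwa_cost=0.3):
    """Similarity in [0, 1] from a schwa-aware edit distance (assumed scheme)."""
    rows, cols = len(left) + 1, len(right) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # First column and row: delete all of left, or insert all of right.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + indel_cost(left[i - 1], schwa_cost)
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + indel_cost(right[j - 1], schwa_cost)
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = 0.0 if left[i - 1] == right[j - 1] else 1.0
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost(left[i - 1], schwa_cost),
                dist[i][j - 1] + indel_cost(right[j - 1], schwa_cost),
                dist[i - 1][j - 1] + substitution,
            )
    longest = max(len(left), len(right))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return max(0.0, 1.0 - dist[-1][-1] / longest)


# Dropping one schwa costs 0.3; dropping a consonant costs 1.0.
with_schwa = weighted_similarity_sketch(
    ("k", "ə", "m", "ə", "l"), ("k", "ə", "m", "l"))
without_m = weighted_similarity_sketch(
    ("k", "ə", "m", "l"), ("k", "ə", "l"))
```

In this sketch the first pair scores higher than the second, because the only difference is a cheap schwa deletion.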

bolingual.phonetics.max_phonetic_similarity(hindi_text, english_word, schwa_cost=0.3)
Parameters:
  • hindi_text (str)

  • english_word (str)

  • schwa_cost (float)

Return type:

float
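
The name suggests that the function scores each candidate reading of the Hindi text against the English pronunciation and keeps the best score. A minimal sketch of that pattern follows, using difflib's SequenceMatcher ratio as a stand-in similarity. The helper names, the stand-in metric, and the example tokens are hypothetical; the library's own scoring presumably relies on weighted_similarity instead.

```python
from difflib import SequenceMatcher


def ratio(left, right):
    """Stand-in similarity: difflib's ratio over two token tuples."""
    return SequenceMatcher(None, left, right).ratio()


def max_similarity_sketch(hindi_variant_tokens, english_tokens, similarity):
    """Return the best similarity across all variants (0.0 if there are none)."""
    if not hindi_variant_tokens:
        return 0.0
    return max(
        similarity(variant, english_tokens)
        for variant in hindi_variant_tokens
    )


# Two hypothetical readings of one word, with and without a medial vowel.
variants = (("k", "a", "m", "a", "l"), ("k", "a", "m", "l"))
english = ("k", "a", "m", "l")
best = max_similarity_sketch(variants, english, ratio)
```

Taking the maximum means a word is not penalized for having an alternative reading that matches poorly, as long as one reading matches well.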

bolingual.phonetics.coarse_match_key_for_hindi(text)
Parameters:

text (str)

Return type:

str

bolingual.phonetics.coarse_match_key_for_english(word)
Parameters:

word (str)

Return type:

str