API Reference

Core Classes

class bolingual.CandidateIndex(vocabulary: 'list[str]', orth_keys: 'dict[str, str]', pron_tokens: 'dict[str, tuple[str, ...]]')
    Parameters:
        vocabulary (list[str])
        orth_keys (dict[str, str])
        pron_tokens (dict[str, tuple[str, ...]])

    vocabulary: list[str]
    orth_keys: dict[str, str]
    pron_tokens: dict[str, tuple[str, ...]]

    classmethod from_words(words)
        Parameters:
            words (Iterable[str])
        Return type: CandidateIndex

    orthographic_ranking(hindi_text)
        Parameters:
            hindi_text (str)
        Return type: list[tuple[float, str]]

    phonetic_ranking(hindi_text, candidates, schwa_cost=0.3)
        Parameters:
            hindi_text (str)
            candidates (Iterable[str])
            schwa_cost (float)
        Return type: list[tuple[float, str]]

    hybrid_ranking(hindi_text, top_k=50, alpha=0.5, schwa_cost=0.3)
        Parameters:
            hindi_text (str)
            top_k (int)
            alpha (float)
            schwa_cost (float)
        Return type: dict[str, list[tuple[float, str]]]

Benchmarking

bolingual.build_benchmark_dataframe(raw_path, config=BenchmarkConfig(test_fraction=0.25, random_state=42))
    Parameters:
        raw_path (str | Path)
        config (BenchmarkConfig)
    Return type: DataFrame

Experiment Evaluation

bolingual.evaluate_benchmark(benchmark, split='test', config=ExperimentConfig(top_k=200, alpha=0.5, schwa_cost=0.3))
    Parameters:
        benchmark (DataFrame)
        split (str)
        config (ExperimentConfig)
    Return type: DataFrame

bolingual.summarize_results(results)
    Parameters:
        results (DataFrame)
    Return type: dict[str, Any]

Phonetics Module

bolingual.phonetics.cmudict_entries()
    Return type: dict[str, list[list[str]]]

bolingual.phonetics.clean_english(text)
    Parameters:
        text (str)
    Return type: str

bolingual.phonetics.romanize_hindi(text)
    Parameters:
        text (str)
    Return type: str

bolingual.phonetics.coarse_latin(text)
    Parameters:
        text (str)
    Return type: str

bolingual.phonetics.tokenize_latin(text)
    Parameters:
        text (str)
    Return type: tuple[str, ...]

bolingual.phonetics.hindi_variants(text)
    Parameters:
        text (str)
    Return type: tuple[tuple[str, ...], ...]

bolingual.phonetics.arpabet_to_tokens(pronunciation)
    Parameters:
        pronunciation (tuple[str, ...])
    Return type: tuple[str, ...]

bolingual.phonetics.spelling_to_tokens(word)
    Parameters:
        word (str)
    Return type: tuple[str, ...]

bolingual.phonetics.english_pron_tokens(word)
    Parameters:
        word (str)
    Return type: tuple[str, ...]

bolingual.phonetics.substitution_cost(left, right)
    Parameters:
        left (str)
        right (str)
    Return type: float

bolingual.phonetics.weighted_similarity(left, right, schwa_cost=0.3)
    Parameters:
        left (tuple[str, ...])
        right (tuple[str, ...])
        schwa_cost (float)
    Return type: float

bolingual.phonetics.max_phonetic_similarity(hindi_text, english_word, schwa_cost=0.3)
    Parameters:
        hindi_text (str)
        english_word (str)
        schwa_cost (float)
    Return type: float

bolingual.phonetics.coarse_match_key_for_hindi(text)
    Parameters:
        text (str)
    Return type: str

bolingual.phonetics.coarse_match_key_for_english(word)
    Parameters:
        word (str)
    Return type: str
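
Example (illustrative only)

The signatures above do not say how a similarity over two token tuples is computed or what schwa_cost controls. The sketch below shows one plausible reading: a weighted edit distance in which inserting or deleting a schwa-like token costs schwa_cost instead of 1.0, normalized into a score between 0 and 1. It does not import bolingual and is not the library's implementation. The names SCHWA_TOKENS, sketch_substitution_cost and sketch_weighted_similarity are hypothetical, as are the token inventory and the 0/1 substitution cost.

```python
from typing import Sequence

# Hypothetical set of schwa-like tokens; the real token inventory is not documented.
SCHWA_TOKENS = frozenset({"a", "ah"})


def sketch_substitution_cost(left: str, right: str) -> float:
    """Illustrative stand-in: 0.0 for identical tokens, 1.0 otherwise."""
    return 0.0 if left == right else 1.0


def sketch_weighted_similarity(
    left: Sequence[str], right: Sequence[str], schwa_cost: float = 0.3
) -> float:
    """Weighted edit distance turned into a similarity in [0, 1].

    Assumption: inserting or deleting a schwa-like token costs `schwa_cost`;
    every other insertion or deletion costs 1.0.
    """

    def indel(token: str) -> float:
        return schwa_cost if token in SCHWA_TOKENS else 1.0

    rows, cols = len(left) + 1, len(right) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # First column and row: delete all of `left` / insert all of `right`.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + indel(left[i - 1])
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + indel(right[j - 1])

    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel(left[i - 1]),      # delete from left
                dist[i][j - 1] + indel(right[j - 1]),     # insert from right
                dist[i - 1][j - 1]
                + sketch_substitution_cost(left[i - 1], right[j - 1]),
            )

    longest = max(len(left), len(right))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return max(0.0, 1.0 - dist[-1][-1] / longest)


if __name__ == "__main__":
    with_schwa = ("k", "a", "m", "a", "l")
    without_schwa = ("k", "a", "m", "l")
    print(sketch_weighted_similarity(with_schwa, without_schwa))
```

Under these assumptions, dropping a schwa-like token is penalized less than dropping a consonant, which is one reason a transliteration matcher might expose the cost as a tunable parameter. Whether bolingual.phonetics.weighted_similarity behaves this way is not stated in this reference.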