API Reference
High-performance text correction for OCR output.
-
class search_and_replace.OCRCorrector
Bases: object
Corrects common OCR character confusions (0/O, 1/l, rn/m, etc.).
-
correct(text)
Apply OCR confusion corrections.
- Return type:
str
- Parameters:
text (str)
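The confusion pairs named above (0/O, 1/l) are context-dependent, so a correction usually has to look at the surrounding token. The following is only a sketch of that idea and not the class's actual logic: the `TO_DIGIT` table and `fix_numeric_tokens` helper are hypothetical, and the rn/m confusion is not covered.

```python
import re

# Hypothetical confusion table (an assumption, not taken from the library).
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})


def fix_numeric_tokens(text: str) -> str:
    """Swap letter lookalikes for digits in whole tokens that already contain a digit."""
    def repair(match: re.Match) -> str:
        token = match.group(0)
        # Leave tokens such as "lol" alone: without a digit there is no numeric context.
        return token.translate(TO_DIGIT) if any(c.isdigit() for c in token) else token

    # Only tokens made up entirely of digits and digit lookalikes are considered.
    return re.sub(r"\b[0-9OolI]+\b", repair, text)


fixed_numbers = fix_numeric_tokens("Room 1O1, built 2Ol9, Oslo lol")
print(fixed_numbers)
```

Restricting the substitution to tokens that already contain a digit keeps ordinary words such as "Oslo" untouched.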
-
class search_and_replace.PatternCorrector(patterns)
Bases: object
Hyperscan-based pattern matching for fuzzy single-character errors.
- Parameters:
patterns (Sequence[tuple[str, int]]) – List of (word, max_errors) tuples.
-
correct(text)
Apply pattern corrections.
- Return type:
str
- Parameters:
text (str)
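To illustrate what matching a word with a bounded number of character errors means, here is a small stand-in that uses the standard `re` module instead of Hyperscan. It is a sketch under narrow assumptions: it handles exactly one substituted character (no insertions or deletions), and `one_substitution_regex` is a hypothetical helper, not part of the package.

```python
import re


def one_substitution_regex(word: str) -> re.Pattern:
    """Build a regex matching `word` with at most one character substituted."""
    # One alternative per position, with that position relaxed to any word character.
    variants = [
        re.escape(word[:i]) + r"\w" + re.escape(word[i + 1:])
        for i in range(len(word))
    ]
    return re.compile(r"\b(?:" + "|".join(variants) + r")\b")


pattern = one_substitution_regex("parliament")
corrected = pattern.sub("parliament", "The parlianent adjourned.")
print(corrected)
```

A length-changing error such as "parliarnent" (rn for m) would not match this pattern, which is one reason a real approximate-matching engine is preferable.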
-
class search_and_replace.Replacer(replacements)
Bases: object
Direct string replacement.
- Parameters:
replacements (Sequence[tuple[str, str]]) – List of (search, replace) tuples.
-
correct(text)
Apply all replacements.
- Return type:
str
- Parameters:
text (str)
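As a rough illustration of direct replacement over (search, replace) pairs, the sketch below applies each pair in list order with `str.replace`. The helper name `apply_replacements` is hypothetical, and applying pairs sequentially is an assumption; the reference does not say how the class orders or combines them.

```python
from collections.abc import Sequence


def apply_replacements(text: str, replacements: Sequence[tuple[str, str]]) -> str:
    """Hypothetical stand-in: apply each (search, replace) pair in order."""
    for search, replace in replacements:
        text = text.replace(search, replace)
    return text


fixed = apply_replacements("Tbe quick brovvn fox", [("Tbe", "The"), ("vv", "w")])
print(fixed)
```

With sequential application, an earlier pair can create text that a later pair then matches, so the order of the list matters in this sketch.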
-
class search_and_replace.SpellCorrector(words=None, dictionary=None, max_distance=2)
Bases: object
Levenshtein-based spelling correction using SymSpell.
- Parameters:
words (list[str] | None) – Custom word list. If both words and dictionary are None, the bundled dictionary is used.
dictionary (Path | str | None) – Path to dictionary file (one word per line).
max_distance (int) – Maximum edit distance (default: 2).
-
correct(word)
Correct a single word.
- Return type:
str
- Parameters:
word (str)
-
correct_text(text)
Correct all words in text.
- Return type:
str
- Parameters:
text (str)
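The max_distance parameter bounds the Levenshtein (edit) distance between an input word and a suggested correction. The sketch below shows that distance with a plain dynamic-programming routine and a brute-force vocabulary scan. It is not SymSpell, which precomputes deletions for much faster lookup, and `levenshtein` and `nearest_word` are hypothetical helpers.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep on a match)
            ))
        previous = current
    return previous[-1]


def nearest_word(word: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within max_distance, else the input unchanged."""
    best, best_distance = word, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(word, candidate)
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


print(nearest_word("documnet", ["document", "monument"]))
```

A word with no vocabulary entry inside the distance bound is returned as-is in this sketch; whether the class behaves the same way is not stated in this reference.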
-
search_and_replace.load_patterns(path)
Load a pattern list from a file with one word,max_errors entry per line.
- Return type:
list[tuple[str, int]]
- Parameters:
path (Path)
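For orientation, a pattern file in the word,max_errors format could be parsed as in the sketch below. `parse_pattern_file` is a hypothetical stand-in rather than the function's real code, and skipping blank lines and splitting on the last comma are assumptions.

```python
import tempfile
from pathlib import Path


def parse_pattern_file(path: Path) -> list[tuple[str, int]]:
    """Hypothetical parser for 'word,max_errors' lines."""
    patterns = []
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the last comma so the error count is always the final field.
        word, _, max_errors = line.rpartition(",")
        patterns.append((word, int(max_errors)))
    return patterns


with tempfile.TemporaryDirectory() as tmp:
    pattern_file = Path(tmp) / "patterns.txt"
    pattern_file.write_text("government,1\nparliament,2\n", encoding="utf-8")
    parsed = parse_pattern_file(pattern_file)

print(parsed)
```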
-
search_and_replace.load_replacements(path)
Load a replacement list from a file with one search,replace entry per line.
- Return type:
list[tuple[str, str]]
- Parameters:
path (Path)
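A replacement file in the search,replace format maps naturally onto the returned list of tuples. The sketch below reads such content with the standard `csv` module from an in-memory buffer. Whether the real loader supports CSV quoting is not documented here, so treat that as an assumption.

```python
import csv
import io

# In-memory stand-in for a replacement file with one search,replace pair per line.
sample_file = io.StringIO("teh,the\nrecieve,receive\n")

# Each row is expected to hold exactly two fields.
replacement_pairs = [(search, replace) for search, replace in csv.reader(sample_file)]
print(replacement_pairs)
```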
-
search_and_replace.process_directory(input_dir, output_dir, pattern_data=None, replacement_data=None, *, pattern='*.txt', resume=False, jobs=None)
Process all matching files in parallel.
- Parameters:
input_dir (Path) – Source directory.
output_dir (Path) – Destination directory.
pattern_data (Sequence[tuple[str, int]] | None) – List of (word, max_errors) for PatternCorrector.
replacement_data (Sequence[tuple[str, str]] | None) – List of (search, replace) for Replacer.
pattern (str) – Glob pattern (default: *.txt).
resume (bool) – If True, skip input files whose output file already exists.
jobs (int | None) – Worker count (default: CPU count).
- Return type:
tuple[int, int]
- Returns:
(processed_count, skipped_count)
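The interplay of pattern, resume, and the returned (processed_count, skipped_count) tuple can be sketched with a single-process stand-in. `process_sequentially` is hypothetical: it runs without worker processes, uses a placeholder correction, and assumes that an output file keeps the name of its input file.

```python
import tempfile
from pathlib import Path


def process_sequentially(input_dir: Path, output_dir: Path, *,
                         pattern: str = "*.txt", resume: bool = False) -> tuple[int, int]:
    """Hypothetical single-process stand-in for the documented call shape."""
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = skipped = 0
    for source in sorted(input_dir.glob(pattern)):
        target = output_dir / source.name
        if resume and target.exists():
            skipped += 1  # output already present, leave it untouched
            continue
        text = source.read_text(encoding="utf-8")
        # Placeholder correction; the real function applies the configured correctors.
        target.write_text(text.replace("teh", "the"), encoding="utf-8")
        processed += 1
    return processed, skipped


with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp) / "in", Path(tmp) / "out"
    src.mkdir()
    (src / "a.txt").write_text("teh cat", encoding="utf-8")
    (src / "b.txt").write_text("teh dog", encoding="utf-8")
    first_run = process_sequentially(src, dst)
    second_run = process_sequentially(src, dst, resume=True)

print(first_run, second_run)
```

On the second run every output already exists, so with resume=True all files are counted as skipped and none are rewritten.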
search_and_replace.correctors
Text correction classes.
-
class search_and_replace.correctors.SpellCorrector(words=None, dictionary=None, max_distance=2)
Bases: object
Levenshtein-based spelling correction using SymSpell.
- Parameters:
words (list[str] | None) – Custom word list. If both words and dictionary are None, the bundled dictionary is used.
dictionary (Path | str | None) – Path to dictionary file (one word per line).
max_distance (int) – Maximum edit distance (default: 2).
-
correct(word)
Correct a single word.
- Return type:
str
- Parameters:
word (str)
-
correct_text(text)
Correct all words in text.
- Return type:
str
- Parameters:
text (str)
-
class search_and_replace.correctors.OCRCorrector
Bases: object
Corrects common OCR character confusions (0/O, 1/l, rn/m, etc.).
-
correct(text)
Apply OCR confusion corrections.
- Return type:
str
- Parameters:
text (str)
-
class search_and_replace.correctors.PatternCorrector(patterns)
Bases: object
Hyperscan-based pattern matching for fuzzy single-character errors.
- Parameters:
patterns (Sequence[tuple[str, int]]) – List of (word, max_errors) tuples.
-
correct(text)
Apply pattern corrections.
- Return type:
str
- Parameters:
text (str)
-
class search_and_replace.correctors.Replacer(replacements)
Bases: object
Direct string replacement.
- Parameters:
replacements (Sequence[tuple[str, str]]) – List of (search, replace) tuples.
-
correct(text)
Apply all replacements.
- Return type:
str
- Parameters:
text (str)
search_and_replace.batch
Batch file processing and I/O.
-
search_and_replace.batch.load_patterns(path)
Load a pattern list from a file with one word,max_errors entry per line.
- Return type:
list[tuple[str, int]]
- Parameters:
path (Path)
-
search_and_replace.batch.load_replacements(path)
Load a replacement list from a file with one search,replace entry per line.
- Return type:
list[tuple[str, str]]
- Parameters:
path (Path)
-
search_and_replace.batch.process_directory(input_dir, output_dir, pattern_data=None, replacement_data=None, *, pattern='*.txt', resume=False, jobs=None)
Process all matching files in parallel.
- Parameters:
input_dir (Path) – Source directory.
output_dir (Path) – Destination directory.
pattern_data (Sequence[tuple[str, int]] | None) – List of (word, max_errors) for PatternCorrector.
replacement_data (Sequence[tuple[str, str]] | None) – List of (search, replace) for Replacer.
pattern (str) – Glob pattern (default: *.txt).
resume (bool) – If True, skip input files whose output file already exists.
jobs (int | None) – Worker count (default: CPU count).
- Return type:
tuple[int, int]
- Returns:
(processed_count, skipped_count)
search_and_replace.cli
Command-line interface.
-
search_and_replace.cli.main(argv=None)
Main entry point.
- Return type:
int
- Parameters:
argv (list[str] | None)