API Reference

High-performance text correction for OCR output.

class search_and_replace.OCRCorrector[source]

Bases: object

Corrects common OCR character confusions (0/O, 1/l, rn/m, etc.).

correct(text)[source]

Apply OCR confusion corrections.

Return type:

str

Parameters:

text (str)
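The exact rule set behind OCRCorrector is not listed in this reference. As a rough illustration of the idea only, the sketch below fixes digit/letter confusions when a digit sits between two letters. The function name fix_ocr_confusions and the three rules are assumptions made for this example, not the library's implementation.

```python
import re

# Hypothetical confusion rules (not taken from the library).
# Each entry is (compiled pattern, replacement text).
_RULES = [
    # Digit 0 between two lowercase letters is probably "o": "w0rld" -> "world".
    (re.compile(r"(?<=[a-z])0(?=[a-z])"), "o"),
    # Digit 0 between two uppercase letters is probably "O": "R0OM" -> "ROOM".
    (re.compile(r"(?<=[A-Z])0(?=[A-Z])"), "O"),
    # Digit 1 between two lowercase letters is probably "l": "he1lo" -> "hello".
    (re.compile(r"(?<=[a-z])1(?=[a-z])"), "l"),
]


def fix_ocr_confusions(text: str) -> str:
    """Apply each context rule in turn; digits inside real numbers are untouched."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text
```

Confusions such as rn/m cannot be resolved from neighbouring characters alone, so a real corrector would likely need a dictionary or frequency data for those cases.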

class search_and_replace.PatternCorrector(patterns)[source]

Bases: object

Hyperscan-based pattern matching for fuzzy single-character errors.

Parameters:

patterns (Sequence[tuple[str, int]]) – List of (word, max_errors) tuples.

correct(text)[source]

Apply pattern corrections.

Return type:

str

Parameters:

text (str)
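To show what a (word, max_errors) pair means in practice, here is a minimal pure-Python sketch. It does not use Hyperscan; it swaps in a plain Hamming-distance check over whitespace-delimited tokens, so it only handles substituted characters, never insertions or deletions. The names hamming and correct_patterns are hypothetical.

```python
from __future__ import annotations

import re
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Count differing positions; callers must pass equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def correct_patterns(text: str, patterns: Sequence[tuple[str, int]]) -> str:
    """Replace each token that is within max_errors substitutions of a pattern word."""

    def fix(match: re.Match) -> str:
        token = match.group(0)
        for word, max_errors in patterns:
            # Substitution-only matching: lengths must agree.
            if len(token) == len(word) and hamming(token, word) <= max_errors:
                return word
        return token  # no pattern matched, keep the token as-is

    return re.sub(r"\w+", fix, text)
```

This sketch is case-sensitive and checks patterns in list order, so the first matching pattern wins.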

class search_and_replace.Replacer(replacements)[source]

Bases: object

Direct string replacement.

Parameters:

replacements (Sequence[tuple[str, str]]) – List of (search, replace) tuples.

correct(text)[source]

Apply all replacements.

Return type:

str

Parameters:

text (str)
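Direct replacement can be pictured as a loop over (search, replace) pairs. The sketch below assumes the pairs are applied one after another, so a later pair sees the output of an earlier one; this reference does not say whether Replacer works that way. The name apply_replacements is hypothetical.

```python
from __future__ import annotations

from typing import Sequence


def apply_replacements(text: str, replacements: Sequence[tuple[str, str]]) -> str:
    """Apply each literal (search, replace) pair in list order."""
    for search, replace in replacements:
        text = text.replace(search, replace)
    return text
```

With sequential application, ordering matters: the pairs ("a", "b") then ("b", "c") turn "a" into "c", not "b".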

class search_and_replace.SpellCorrector(words=None, dictionary=None, max_distance=2)[source]

Bases: object

Levenshtein-based spelling correction using SymSpell.

Parameters:
  • words (list[str] | None) – Custom word list. If None and dictionary is None, uses bundled dictionary.

  • dictionary (Path | str | None) – Path to dictionary file (one word per line).

  • max_distance (int) – Maximum edit distance (default: 2).

correct(word)[source]

Correct a single word.

Return type:

str

Parameters:

word (str)

correct_text(text)[source]

Correct all words in text.

Return type:

str

Parameters:

text (str)
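The role of max_distance can be illustrated with a plain Levenshtein implementation and a linear scan over a word list. SymSpell reaches similar results far faster through precomputed deletes; the sketch below is a slow stand-in meant only to show the distance cutoff. The names levenshtein, nearest_word and correct_words are hypothetical.

```python
from __future__ import annotations

import re
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def nearest_word(word: str, vocabulary: Iterable[str], max_distance: int = 2) -> str:
    """Return the closest vocabulary word within max_distance, else the input."""
    best, best_dist = word, max_distance + 1
    for candidate in vocabulary:
        dist = levenshtein(word, candidate)
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


def correct_words(text: str, vocabulary: list[str], max_distance: int = 2) -> str:
    """Correct every alphabetic run in text, leaving other characters alone."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: nearest_word(m.group(0), vocabulary, max_distance),
        text,
    )
```

A word farther than max_distance from every vocabulary entry is returned unchanged, which keeps unknown proper nouns from being rewritten.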

search_and_replace.load_patterns(path)[source]

Load a pattern list from a file containing one word,max_errors pair per line.

Return type:

list[tuple[str, int]]

Parameters:

path (Path)
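A parser for the word,max_errors format might look like the sketch below. It is not the library's loader: skipping blank lines and splitting on the last comma are assumptions, and parse_pattern_lines and read_patterns are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable


def parse_pattern_lines(lines: Iterable[str]) -> list[tuple[str, int]]:
    """Turn 'word,max_errors' lines into (word, max_errors) tuples."""
    patterns = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split on the last comma so the error count is always the final field.
        word, _, max_errors = line.rpartition(",")
        patterns.append((word.strip(), int(max_errors)))
    return patterns


def read_patterns(path: Path) -> list[tuple[str, int]]:
    """Read a pattern file as UTF-8 and parse it line by line."""
    with path.open(encoding="utf-8") as handle:
        return parse_pattern_lines(handle)
```

A non-numeric error count raises ValueError from int(), which surfaces malformed lines early.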

search_and_replace.load_replacements(path)[source]

Load a replacement list from a file containing one search,replace pair per line.

Return type:

list[tuple[str, str]]

Parameters:

path (Path)
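This reference does not say how a comma inside the search or replacement text is handled. One option, sketched here as an assumption, is to treat the file as two-column CSV so that quoted fields may contain commas. The name parse_replacement_lines is hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable


def parse_replacement_lines(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse 'search,replace' lines using CSV quoting rules."""
    pairs = []
    for row in csv.reader(lines):
        if not row:
            continue  # csv.reader yields an empty list for a blank line
        if len(row) != 2:
            raise ValueError(f"expected 2 fields, got {len(row)}: {row!r}")
        pairs.append((row[0], row[1]))
    return pairs
```

The resulting list has the same shape as the replacements argument accepted by Replacer.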

search_and_replace.process_directory(input_dir, output_dir, pattern_data=None, replacement_data=None, *, pattern='*.txt', resume=False, jobs=None)[source]

Process all matching files in parallel.

Parameters:
  • input_dir (Path) – Source directory.

  • output_dir (Path) – Destination directory.

  • pattern_data (Sequence[tuple[str, int]] | None) – List of (word, max_errors) for PatternCorrector.

  • replacement_data (Sequence[tuple[str, str]] | None) – List of (search, replace) for Replacer.

  • pattern (str) – Glob pattern (default: *.txt).

  • resume (bool) – If True, skip files whose output file already exists.

  • jobs (int | None) – Number of parallel workers. If None, uses the CPU count.

Return type:

tuple[int, int]

Returns:

(processed_count, skipped_count)
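The interplay of pattern, resume, jobs and the returned counts can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it uses a thread pool for simplicity (a process pool may suit CPU-bound correction better), takes a generic transform callable instead of pattern_data and replacement_data, and the name process_tree is hypothetical.

```python
from __future__ import annotations

import os
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def process_tree(
    input_dir: Path,
    output_dir: Path,
    transform: Callable[[str], str],
    *,
    pattern: str = "*.txt",
    resume: bool = False,
    jobs: int | None = None,
) -> tuple[int, int]:
    """Transform every matching file; return (processed_count, skipped_count)."""
    output_dir.mkdir(parents=True, exist_ok=True)
    todo, skipped = [], 0
    for source in sorted(input_dir.glob(pattern)):
        if not source.is_file():
            continue
        target = output_dir / source.name
        if resume and target.exists():
            skipped += 1  # output already present, leave it untouched
        else:
            todo.append((source, target))

    def work(pair: tuple[Path, Path]) -> None:
        source, target = pair
        text = source.read_text(encoding="utf-8")
        target.write_text(transform(text), encoding="utf-8")

    with ThreadPoolExecutor(max_workers=jobs or os.cpu_count()) as pool:
        list(pool.map(work, todo))  # list() re-raises any worker exception
    return len(todo), skipped
```

Running the same call twice with resume=True should report every file as skipped on the second run, which makes interrupted batches cheap to restart.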

Text correction classes.

class search_and_replace.correctors.SpellCorrector(words=None, dictionary=None, max_distance=2)[source]

Bases: object

Levenshtein-based spelling correction using SymSpell.

Parameters:
  • words (list[str] | None) – Custom word list. If None and dictionary is None, uses bundled dictionary.

  • dictionary (Path | str | None) – Path to dictionary file (one word per line).

  • max_distance (int) – Maximum edit distance (default: 2).

correct(word)[source]

Correct a single word.

Return type:

str

Parameters:

word (str)

correct_text(text)[source]

Correct all words in text.

Return type:

str

Parameters:

text (str)

class search_and_replace.correctors.OCRCorrector[source]

Bases: object

Corrects common OCR character confusions (0/O, 1/l, rn/m, etc.).

correct(text)[source]

Apply OCR confusion corrections.

Return type:

str

Parameters:

text (str)

class search_and_replace.correctors.PatternCorrector(patterns)[source]

Bases: object

Hyperscan-based pattern matching for fuzzy single-character errors.

Parameters:

patterns (Sequence[tuple[str, int]]) – List of (word, max_errors) tuples.

correct(text)[source]

Apply pattern corrections.

Return type:

str

Parameters:

text (str)

class search_and_replace.correctors.Replacer(replacements)[source]

Bases: object

Direct string replacement.

Parameters:

replacements (Sequence[tuple[str, str]]) – List of (search, replace) tuples.

correct(text)[source]

Apply all replacements.

Return type:

str

Parameters:

text (str)

Batch file processing and I/O.

search_and_replace.batch.load_patterns(path)[source]

Load a pattern list from a file containing one word,max_errors pair per line.

Return type:

list[tuple[str, int]]

Parameters:

path (Path)

search_and_replace.batch.load_replacements(path)[source]

Load a replacement list from a file containing one search,replace pair per line.

Return type:

list[tuple[str, str]]

Parameters:

path (Path)

search_and_replace.batch.process_directory(input_dir, output_dir, pattern_data=None, replacement_data=None, *, pattern='*.txt', resume=False, jobs=None)[source]

Process all matching files in parallel.

Parameters:
  • input_dir (Path) – Source directory.

  • output_dir (Path) – Destination directory.

  • pattern_data (Sequence[tuple[str, int]] | None) – List of (word, max_errors) for PatternCorrector.

  • replacement_data (Sequence[tuple[str, str]] | None) – List of (search, replace) for Replacer.

  • pattern (str) – Glob pattern (default: *.txt).

  • resume (bool) – If True, skip files whose output file already exists.

  • jobs (int | None) – Number of parallel workers. If None, uses the CPU count.

Return type:

tuple[int, int]

Returns:

(processed_count, skipped_count)

Command-line interface.

search_and_replace.cli.main(argv=None)[source]

Main entry point.

Return type:

int

Parameters:

argv (list[str] | None)
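A main(argv=None) function that returns an int follows a common Python convention: passing None lets argparse read sys.argv[1:], while passing a list makes the entry point easy to call from tests. The real command-line options are not documented in this section, so every argument in the sketch below is hypothetical, as is the name example_main.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def example_main(argv: list[str] | None = None) -> int:
    """Parse arguments and return a process exit code (0 means success)."""
    parser = argparse.ArgumentParser(prog="example")
    # Hypothetical arguments, chosen only to illustrate the convention.
    parser.add_argument("input_dir", type=Path)
    parser.add_argument("output_dir", type=Path)
    parser.add_argument("--resume", action="store_true")
    args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]

    if not args.input_dir.is_dir():
        return 1  # report failure through the exit code
    return 0
```

A script wrapper would typically end with raise SystemExit(example_main()) so the returned int becomes the process exit status.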