search-and-replace

High-performance text correction for OCR output using Hyperscan and SymSpell.

Installation

Requires Hyperscan system library:

# macOS
brew install vectorscan  # ARM
brew install hyperscan   # Intel

# Ubuntu/Debian
apt-get install libhyperscan-dev

Then install via pip:

pip install search-and-replace

Quick Start

from search_and_replace import SpellCorrector, OCRCorrector, PatternCorrector

# Fix common OCR confusions (0→O, 1→l, rn→m)
ocr = OCRCorrector()
ocr.correct("He11o W0rld")  # "Hello WOrld"

# Spell correction with bundled dictionary
spell = SpellCorrector()
spell.correct("helo")  # "hello"

# Pattern matching with Hyperscan
patterns = PatternCorrector([("Network", 1), ("Available", 1)])
patterns.correct("The Netwxrk is Avxilable")  # "The Network is Available"

CLI

search-and-replace ./input -o ./output --patterns wordlist.csv

Indices and tables