Collection of OCR-related Python tools and wrappers from the OCR-D team
To bootstrap the tool, you'll need the following installed (Ubuntu package names in parentheses):
- Python (python)
- pip (python-pip)
- exiftool (libimage-exiftool-perl)
- libxml2-utils for xmllint (libxml2-utils)
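Before running make, it can help to confirm that the prerequisites are actually on the PATH. The following is a minimal sketch, not part of pyocrd: the helper name missing_tools is hypothetical, and the executable names (python, pip, exiftool, xmllint) are assumptions based on the package list above; on some systems the interpreter and pip are installed as python3 and pip3 instead.

```python
import shutil

# Executable name -> Ubuntu package that provides it (from the list above).
# The executable names are assumptions; adjust them for your system.
REQUIRED_TOOLS = {
    "python": "python",
    "pip": "python-pip",
    "exiftool": "libimage-exiftool-perl",
    "xmllint": "libxml2-utils",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {executable: package} for every executable not found on PATH."""
    return {exe: pkg for exe, pkg in tools.items() if which(exe) is None}


if __name__ == "__main__":
    missing = missing_tools()
    for exe, pkg in sorted(missing.items()):
        print(f"missing {exe}: try installing the Ubuntu package {pkg}")
    if not missing:
        print("all prerequisites found")
```

The lookup function is passed in as a parameter so the check can be exercised without touching the real PATH.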
To install system-wide:
make deps-ubuntu deps-pip install
To develop, install to a virtualenv:
    pip install virtualenv
    virtualenv --no-site-packages venv
    source venv/bin/activate
    make deps-pip install
pyocrd installs a binary, ocrd, that can be used to invoke the processors
directly (ocrd process) or to start (development) web services (ocrd server).
TODO: Update docs here.
Examples:
    # List available processors
    ocrd process

    # Region-segment with tesserocr all files in METS INPUT fileGrp
    ocrd process -m /path/to/mets.xml segment-region/tesserocr

    # Chain multiple processors
    ocrd process -m /path/to/mets.xml characterize/exif segment-line/tesserocr recognize/tesserocr

    # Start a processor web service at port 6543
    ocrd server process -p 6543
    http PUT localhost:6543/characterize url==http://server/path/to/mets.xml
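The last example uses the httpie client, where url==... adds a query-string parameter to the request. For scripts that do not have httpie available, the same request can be sketched with the Python standard library. This is an illustration only: the function name characterize_request is hypothetical, the host, port, and path are copied from the example above, and the request is built but not sent, since the format of the server's response is not documented here.

```python
from urllib.parse import urlencode
from urllib.request import Request


def characterize_request(mets_url, host="localhost", port=6543):
    """Build (but do not send) the PUT request from the httpie example."""
    # httpie's `url==value` becomes the query parameter `url=value`.
    query = urlencode({"url": mets_url})
    return Request(f"http://{host}:{port}/characterize?{query}", method="PUT")


req = characterize_request("http://server/path/to/mets.xml")
print(req.get_method(), req.full_url)
```

With the web service from the example running, passing req to urllib.request.urlopen would send the request.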
Download ocrd-assets (make assets)
Test with local files: make test
- Test with local asset server:
  - Start asset-server: make asset-server
  - make test OCRD_BASEURL='http://localhost:5001/'
- Test with remote assets:
  - make test OCRD_BASEURL='https://github.com/OCR-D/ocrd-assets/raw/master/data/'