This script parses AIB bank statement PDFs and converts them into CSV files for easier analysis. It supports parsing a single PDF, all PDFs in a directory (combined or separate CSVs), and includes a debug mode for advanced inspection.
- Single-PDF mode: Parse one PDF and produce one CSV.
- Combined-dir mode: Parse all PDFs in a directory and produce one combined CSV.
- Separate-dir mode: Parse all PDFs in a directory and produce one CSV per PDF.
- Debug mode: CSV includes x/y coordinates for each transaction.
- Python 3.7+
- PyMuPDF (pip install pymupdf)
python golden_parser_aib.py <input_path> [--combined | --separate] [--debug]
- <input_path>: Path to a PDF file or a directory containing PDF files.
- --combined: (Directory only) Output a single combined CSV for all PDFs.
- --separate: (Directory only) Output one CSV per PDF.
- --debug: Enable debug logging and include x/y coordinates in the CSV.
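The command-line interface above could be declared roughly as follows. This is a minimal sketch that assumes the script uses `argparse`; the function name `build_parser` and the help strings are illustrative, not taken from the script itself. The `[--combined | --separate]` notation is modelled as a mutually exclusive group.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented usage line (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="golden_parser_aib.py",
        description="Convert AIB bank statement PDFs to CSV.",
    )
    parser.add_argument(
        "input_path",
        help="Path to a PDF file or a directory containing PDF files.",
    )
    # --combined and --separate are alternatives, so passing both is rejected.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--combined", action="store_true",
                      help="(Directory only) Output a single combined CSV.")
    mode.add_argument("--separate", action="store_true",
                      help="(Directory only) Output one CSV per PDF.")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging and x/y columns in the CSV.")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["bank_statements/", "--combined", "--debug"])
print(args.input_path, args.combined, args.separate, args.debug)
```

With this layout, passing both `--combined` and `--separate` makes `argparse` exit with a usage error rather than silently picking one mode.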
Parse a single PDF:
python golden_parser_aib.py bank_statements/29th\ August\ 2024.pdf
Parse all PDFs in a directory into one combined CSV:
python golden_parser_aib.py bank_statements/ --combined
Parse all PDFs in a directory into separate CSVs:
python golden_parser_aib.py bank_statements/ --separate
Enable debug mode:
python golden_parser_aib.py bank_statements/ --combined --debug
- CSV files will be created in the same directory as the input PDFs.
- In combined mode, the CSV will be named <directory>_combined.csv.
- In separate mode, each CSV will have the same name as the PDF but with a .csv extension.
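The naming rules above can be expressed with `pathlib`. This is only a sketch of the documented convention; the helper names `combined_csv_path` and `separate_csv_path` are hypothetical and the script may compute its paths differently.

```python
from pathlib import Path


def combined_csv_path(directory):
    """Return <directory>/<directory name>_combined.csv (hypothetical helper)."""
    d = Path(directory)
    # Path() drops a trailing slash, so "bank_statements/" yields the
    # name "bank_statements".
    return d / "{}_combined.csv".format(d.name)


def separate_csv_path(pdf_path):
    """Return the PDF's path with its extension swapped for .csv."""
    return Path(pdf_path).with_suffix(".csv")


print(combined_csv_path("bank_statements/"))
print(separate_csv_path("bank_statements/29th August 2024.pdf"))
```

Note that a bare `.` as the directory has an empty `name`, so a real implementation would probably resolve the path first.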
- PDF Parsing: Uses PyMuPDF to extract tables and words from each PDF page.
- Transaction Extraction: Identifies transaction rows, parses dates, details, debit/credit/balance amounts, and (optionally) their coordinates.
- CSV Export: Writes the parsed transactions to CSV, with optional debug columns.
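The extraction and export steps above can be illustrated with a small, standard-library-only sketch. It assumes word tuples shaped like the output of PyMuPDF's `page.get_text("words")`, that is `(x0, y0, x1, y1, text, block_no, line_no, word_no)`. The helpers `group_rows` and `write_csv`, the `y_tolerance` value, and the CSV column names are assumptions for illustration, not the script's actual implementation.

```python
import csv
import io


def group_rows(words, y_tolerance=2.0):
    """Group words into visual rows by their top (y0) coordinate.

    A word joins the current row when its y0 is within y_tolerance of the
    row's first word; otherwise it starts a new row.
    """
    rows = []
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(rows[-1][0][1] - word[1]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order the words in each row left to right.
    return [sorted(row, key=lambda w: w[0]) for row in rows]


def write_csv(transactions, out, debug=False):
    """Write transaction dicts to `out`, adding x/y columns only in debug mode."""
    fields = ["date", "details", "debit", "credit", "balance"]  # assumed names
    if debug:
        fields += ["x", "y"]
    # extrasaction="ignore" drops the x/y keys when debug is off.
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(transactions)


# Made-up sample data, not taken from a real statement.
sample_words = [
    (300.0, 100.5, 330.0, 110.0, "12.50", 0, 0, 1),
    (50.0, 100.0, 90.0, 110.0, "29", 0, 0, 0),
    (50.0, 120.0, 90.0, 130.0, "30", 0, 1, 0),
]
for row in group_rows(sample_words):
    print([w[4] for w in row])

buffer = io.StringIO()
write_csv(
    [{"date": "29 Aug 2024", "details": "SHOP", "debit": "12.50",
      "credit": "", "balance": "100.00", "x": 50.0, "y": 100.0}],
    buffer,
    debug=True,
)
print(buffer.getvalue())
```

Mapping each grouped row onto the date, details, and amount columns depends on the statement's column positions, which is the AIB-specific part this sketch leaves out.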
- The script is tailored for AIB statement layouts and may not work with other banks.
- Sensitive financial data should be handled securely; the bank_statements/ directory is .gitignored by default.