v1.4.0: The Document Scanner Update
This release eliminates the manual transcription step for PedalPCB builds. It introduces a PDF Ingestion Engine, allowing you to drag and drop official build documentation directly into the tool alongside your standard CSVs.
It uses pdfplumber for visual table extraction, so complex, multi-line layouts are parsed as accurately as structured CSV data.
📄 PDF Ingestion Support
- PedalPCB Native: The tool now explicitly supports PedalPCB build documents. It visually scans pages for "Parts List" tables, identifying them via header validation (`LOCATION`, `VALUE`) while ignoring non-BOM data like drill templates.
- Visual Layout Engine: Unlike standard text scrapers, the new engine understands grid geometry. It correctly reassembles components that span multiple lines (e.g., `Resistor` on line 1, `1/4W` on line 2) into a single, clean entry.
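
As a rough illustration of the header validation and multi-line reassembly described above, here is a minimal sketch. It is not the tool's actual code: it assumes tables arrive in the shape pdfplumber's `page.extract_tables()` returns (a list of rows, each a list of cell strings or `None`), and the function names and the `TYPE` column are hypothetical.

```python
# Header names come from the release notes; everything else is illustrative.
REQUIRED_HEADERS = {"LOCATION", "VALUE"}


def is_parts_list(table):
    """Return True if the table's first row contains every required header."""
    if not table:
        return False
    header = {(cell or "").strip().upper() for cell in table[0]}
    return REQUIRED_HEADERS <= header


def merge_cell_lines(cell):
    """Collapse a multi-line cell (lines joined by newlines) into one string."""
    return " ".join((cell or "").split())


def parse_parts_list(table):
    """Turn a validated table into a list of dicts keyed by header name."""
    headers = [merge_cell_lines(cell).upper() for cell in table[0]]
    return [
        dict(zip(headers, (merge_cell_lines(cell) for cell in row)))
        for row in table[1:]
    ]


# Sample data standing in for extracted tables (hypothetical values).
tables = [
    [["DRILL TEMPLATE"], ["1/8 in"]],  # non-BOM table, should be skipped
    [["LOCATION", "VALUE", "TYPE"], ["R1", "10K", "Resistor\n1/4W"]],
]
parts = [row for t in tables if is_parts_list(t) for row in parse_parts_list(t)]
print(parts)
```

The drill-template table fails the header check and is dropped, while the `Resistor` / `1/4W` cell comes out as a single `Resistor 1/4W` value.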
🔀 Unified Workflow
- Mixed-Mode Uploads: The "Upload CSV" tab has been upgraded to "Upload Files". You can now batch upload `.csv` exports from KiCad and `.pdf` docs from PedalPCB in a single action. The app intelligently routes each file to the correct parser based on extension.
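
Extension-based routing of a mixed batch could look roughly like the sketch below. The parser functions are hypothetical stand-ins, and how the app treats unsupported extensions is an assumption (here they are collected rather than raising).

```python
from pathlib import Path


def parse_csv(path):
    """Hypothetical stand-in for the KiCad CSV parser."""
    return ("csv", path.name)


def parse_pdf(path):
    """Hypothetical stand-in for the PedalPCB PDF parser."""
    return ("pdf", path.name)


# Map lowercase extensions to the parser that handles them.
PARSERS = {".csv": parse_csv, ".pdf": parse_pdf}


def route(paths):
    """Send each file to its parser; collect names with unsupported extensions."""
    results, rejected = [], []
    for path in map(Path, paths):
        parser = PARSERS.get(path.suffix.lower())
        if parser is None:
            rejected.append(path.name)
        else:
            results.append(parser(path))
    return results, rejected


results, rejected = route(["board.csv", "Build_Doc.PDF", "notes.txt"])
print(results, rejected)
```

Lowercasing the suffix means `Build_Doc.PDF` and `build_doc.pdf` reach the same parser.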
🛠️ Technical Improvements
- Heuristic Detection: The parser uses a heuristic scan to identify BOM tables even if they appear deep in the document (e.g., Page 4).
- Resilience: Added robust error handling for "dirty" PDF data, cleaning up extra whitespace and irregular column widths before processing.
- Test Coverage: Added a new suite of mocked unit tests (`unittest.mock`) to verify PDF extraction logic without requiring physical files in the CI pipeline.
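
The three points above can be sketched together: a page-by-page scan for the BOM table, whitespace cleanup, and a mocked test that needs no PDF on disk. This is only an illustration. `find_bom_table`, `clean_row`, and `make_page` are hypothetical names, and the fake object mimics just the `pages` attribute and `extract_tables()` method that pdfplumber exposes.

```python
from unittest.mock import MagicMock


def clean_row(row):
    """Normalize whitespace in each cell; missing cells become empty strings."""
    return [" ".join((cell or "").split()) for cell in row]


def find_bom_table(pdf):
    """Return (page_number, cleaned_rows) for the first table whose header
    contains LOCATION and VALUE, or None if no page has one."""
    for number, page in enumerate(pdf.pages, start=1):
        for table in page.extract_tables():
            rows = [clean_row(row) for row in table if row]
            if rows and {"LOCATION", "VALUE"} <= {c.upper() for c in rows[0]}:
                return number, rows
    return None


def make_page(tables):
    """Build a fake page whose extract_tables() returns the given tables."""
    page = MagicMock()
    page.extract_tables.return_value = tables
    return page


def test_finds_table_on_page_four():
    fake_pdf = MagicMock()
    fake_pdf.pages = [
        make_page([]),
        make_page([]),
        make_page([[["Drill", "Size"]]]),  # non-BOM table on page 3
        make_page([[[" LOCATION ", "VALUE", None], ["R1", " 10K ", None]]]),
    ]
    assert find_bom_table(fake_pdf) == (
        4,
        [["LOCATION", "VALUE", ""], ["R1", "10K", ""]],
    )


test_finds_table_on_page_four()
```

Because the function only touches `pdf.pages` and `page.extract_tables()`, the same code should accept a real object from `pdfplumber.open(...)` in production and a `MagicMock` in CI.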