⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⣀⣤⣤⣤⣤⣀⣀⣀⡀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⣠⣴⣶⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⣦⣄⠀████████╗██████╗ █████╗ ██████╗ ███████╗⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠙⠻⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠟⠋⠀╚══██╔══╝██╔══██╗██╔══██╗██╔══██╗██╔════╝⠀⠀⠀⠀
⠀⠀⠀⠀⠀⣿⣶⣤⣄⣉⣉⠙⠛⠛⠛⠛⠛⠛⠋⣉⣉⣠⣤⣶⣿⠀ ██║ ██████╔╝███████║██║ ██║█████╗
⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀ ██║ ██╔══██╗██╔══██║██║ ██║██╔══╝
⠀⠀⠀⠀⠀⢿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠀ ██║ ██║ ██║██║ ██║██████╔╝███████╗
⠀⠀⠀⠀⠀⣄⡉⠛⠻⠿⢿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠿⠟⠛⢉⣠⠀ ╚═╝ ╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚══════╝
⠀⠀⠀⠀⠀⣿⣿⣿⣶⣶⣤⣤⣤⣤⣤⣤⣤⣤⣤⣤⣶⣶⣿⣿⣿⠀ ███████╗██╗ ██╗███╗ ██╗ ██████╗
⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀ ██╔════╝╚██╗ ██╔╝████╗ ██║██╔════╝
⠀⠀⠀⠀⠀⠻⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠟⠀ ███████╗ ╚████╔╝ ██╔██╗ ██║██║
⠀⠀⠀⠀⠀⣶⣤⣈⡉⠛⠛⠻⠿⠿⠿⠿⠿⠿⠟⠛⠛⢉⣁⣤⣶⠀ ╚════██║ ╚██╔╝ ██║╚██╗██║██║
⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣷⣶⣶⣶⣶⣶⣶⣶⣶⣾⣿⣿⣿⣿⣿⠀ ███████║ ██║ ██║ ╚████║╚██████╗
⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀ ╚══════╝ ╚═╝ ╚═╝ ╚═══╝ ╚═════╝
⠀⠀⠀⠀⠀⠙⠻⠿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⠿⠟⠋⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠉⠉⠉⠛⠛⠛⠛⠉⠉⠉⠁⠀⠀⠀⠀⠀
make docker
docker compose exec main pip install polars-lts-cpu
docker compose exec main python3.11 \
    ./src/sync.py \
    --transfers ./data/transfers.xml \
    --accounts ./data/accounts.json \
    --output ./data/output.xml \
    --pedantic
# 2025-08-23 18:48:49.537 | INFO | validate:validate_xml_structure:38 - Validated XML structure for: data/transfers.xml
# 2025-08-23 18:48:49.911 | INFO | validate:validate_json_structure:58 - Validated JSON structure for: data/accounts.json
# 2025-08-23 18:48:49.911 | INFO | __main__:run:152 - Files loaded
# 2025-08-23 18:49:00.892 | INFO | __main__:run:155 - Found 1162523 initial bank transfers
# 100%|███████████████████████████████████████████████████████████████████████████████████| 5206/5206 [00:06<00:00, 820.25it/s]
# 2025-08-23 18:49:07.292 | INFO | __main__:run:158 - Optimized to 6621 bank transfers (Removed 99.43%)
# 2025-08-23 18:49:07.352 | INFO | validate:validate_optimization:86 - Net account balances are preserved correctly!
In this transfer optimization problem you've got a set of bank transfers between accounts and want to replace it with the smallest equivalent set: every account's net balance change must stay exactly the same while the number of individual transfers drops as far as possible.
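To make the goal concrete, here is a toy instance with made-up account names and amounts: three transfers whose net effect can be reproduced with a single one.

original = [("A", "B", 10.0), ("B", "C", 10.0), ("C", "A", 4.0)]
# Net balance change per account:
#   A: -10 + 4 = -6     B: +10 - 10 = 0     C: +10 - 4 = +6
optimized = [("A", "C", 6.0)]
# Same net effect (A down 6, C up 6, B unchanged) with one transfer instead of three.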
The solution uses three optimization passes. Flow aggregation merges identical transfers: multiple transfers between the same pair of accounts collapse into a single transfer carrying their combined amount.
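A minimal sketch of that pass with Polars, assuming columns named source, target, and amount (the real schema may differ):

import polars as pl

transfers = pl.DataFrame({
    "source": ["A", "A", "B", "A"],
    "target": ["B", "B", "C", "C"],
    "amount": [10.0, 5.0, 7.5, 2.0],
})

aggregated = (
    transfers.lazy()
    .group_by("source", "target")
    .agg(pl.col("amount").sum())
    .collect()
)
# Four input rows become three: the two A->B transfers merge into one of 15.0.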
Since input files cap out around 16GB, everything fits in memory on consumer hardware. I used Polars for data wrangling: its lazy evaluation and columnar operations make aggregation much faster than Pandas, and I don't have to write custom CPython extensions. NetworkX handles the graph operations. I bounded the cycle search depth at 3 because transfer cycles rarely go beyond 4 hops, which keeps the search in polynomial territory while still catching most optimization opportunities.
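For the graph side, here is a rough sketch of what a bounded cycle-cancellation step can look like; the edge attribute name and the exact procedure are my assumptions rather than the project's real code, and length_bound needs NetworkX 3.1 or newer. Subtracting the smallest amount on a cycle from every edge in it removes at least one transfer while leaving every account's net balance untouched.

import networkx as nx

G = nx.DiGraph()
G.add_edge("A", "B", amount=15.0)
G.add_edge("B", "C", amount=7.5)
G.add_edge("C", "A", amount=2.0)

for cycle in list(nx.simple_cycles(G, length_bound=3)):
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    if not all(G.has_edge(u, v) for u, v in edges):
        continue  # an earlier cancellation already removed one of these edges
    slack = min(G[u][v]["amount"] for u, v in edges)
    for u, v in edges:
        G[u][v]["amount"] -= slack
        if G[u][v]["amount"] == 0:
            G.remove_edge(u, v)
# The A->B->C->A cycle cancels 2.0 on each edge, so the C->A transfer disappears.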
Apache Arrow integration through Polars was clutch for handling heterogeneous input formats: once the JSON and XML inputs are parsed, everything lives in the same Arrow-backed tables, and moving them in and out of Polars is zero-copy, so I could prototype against different data sources without rewriting parsers.
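Roughly, the hand-off looks like this; Polars has no native XML reader, so I'm assuming the XML side is parsed separately into an Arrow table before entering the pipeline, and the column names here are invented:

import polars as pl
import pyarrow as pa

accounts = pl.read_json("data/accounts.json")   # JSON straight into a DataFrame

# Pretend the XML parser produced this Arrow table (hypothetical shape):
arrow_transfers = pa.table({
    "source": ["A", "B"],
    "target": ["B", "C"],
    "amount": [10.0, 7.5],
})
transfers = pl.from_arrow(arrow_transfers)      # Arrow -> Polars, no copy
round_trip = transfers.to_arrow()               # Polars -> Arrow, no copy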
I put a lot of time into testing because financial data is unforgiving: TDD throughout, unit tests for the individual passes, integration tests for the full pipeline, and regression tests for specific bug fixes. Property-based testing with Hypothesis generates random transfer sets and verifies that the conservation constraints still hold after optimization.
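A stripped-down version of such a property test; optimize here is only a stand-in for the real pipeline entry point (its name and signature are my guesses), reduced to the flow-aggregation pass so the sketch runs on its own:

from collections import defaultdict
from hypothesis import given, strategies as st

def optimize(transfers):
    # Stand-in for the real pipeline: just the flow-aggregation pass,
    # merging transfers that share the same (source, target) pair.
    merged = defaultdict(int)
    for src, dst, amount in transfers:
        merged[(src, dst)] += amount
    return [(s, d, a) for (s, d), a in merged.items()]

def net_balances(transfers):
    # Signed sum per account: outgoing negative, incoming positive.
    balances = defaultdict(int)
    for src, dst, amount in transfers:
        balances[src] -= amount
        balances[dst] += amount
    return {acct: amt for acct, amt in balances.items() if amt != 0}

accounts = st.sampled_from(["A", "B", "C", "D", "E"])
transfer = st.tuples(accounts, accounts, st.integers(min_value=1, max_value=1_000))

@given(st.lists(transfer, max_size=50))
def test_net_balances_are_preserved(transfers):
    optimized = optimize(transfers)
    assert net_balances(optimized) == net_balances(transfers)
    assert len(optimized) <= len(transfers)  # optimization never adds transfers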