⚡️ Speed up function merge_out_layout_with_ocr_layout by 30% #4212
Open
aseembits93 wants to merge 3 commits into Unstructured-IO:main from
The optimized code achieves a **30% speedup** through three optimizations in `aggregate_embedded_text_by_block` and `supplement_layout_with_ocr_elements`:

## Key Optimizations

### 1. **Replaced `.sum(axis=1).astype(bool)` with `.any(axis=1)`**

This change appears in both functions when computing boolean masks from the result of `bboxes1_is_almost_subregion_of_bboxes2()`.

**Why it's faster:**
- `.sum(axis=1)` creates an intermediate integer array by counting True values across columns, then converts it to boolean
- `.any(axis=1)` reduces with a logical OR and can short-circuit on the first True value per row, avoiding the full summation
- It eliminates the explicit `.astype(bool)` conversion overhead

**Performance impact:** Based on line profiler, the mask computation in `aggregate_embedded_text_by_block` dropped from ~234ms to ~222ms (about 5% faster), and the overall function improved from 551ms to 443ms (a 19.6% reduction in runtime).

### 2. **Avoided redundant slicing operations**

In `aggregate_embedded_text_by_block`, the optimized code stores `sliced = source_regions.slice(mask)` once and reuses it, instead of calling `source_regions.slice(mask)` three separate times.

**Why it's faster:**
- Each `slice()` operation creates a new object with coordinate and text array copies
- Line profiler shows the original made 3 separate slice calls (48ms + 25ms + 34ms = 107ms total)
- The optimized version makes 1 slice call (~28ms), saving ~79ms per invocation

### 3. **Early exit with `mask.any()`**

The optimized code checks `if mask.any():` before processing, avoiding unnecessary work when no regions match.

**Why it's faster:**
- Skips text joining, bbox extraction, and IOU calculations when the mask is empty
- Particularly beneficial for the 368 cases (31% of calls) where no matching regions exist

## Impact Based on Test Results

The optimization is particularly effective for workloads with:

1. **Many elements requiring text aggregation** (10-41% speedup on tests with 100-500 elements)
   - `test_large_scale_many_elements_aggregated`: 77ms → 67.2ms (14.6% faster)
   - `test_merge_large_number_of_elements`: 43.8ms → 31.0ms (41.3% faster)
   - `test_merge_boundary_coordinates_large_scale`: 87.3ms → 61.5ms (41.8% faster)
2. **Documents with invalid text patterns** (10-20% speedup)
   - `test_invalid_texts_are_replaced`: 250μs → 222μs (12.4% faster)
   - `test_merge_with_all_invalid_text`: 654μs → 554μs (18.1% faster)
3. **Complex spatial matching scenarios** (33-36% speedup)
   - `test_merge_with_overlapping_elements`: 25.2ms → 18.9ms (33.3% faster)
   - `test_merge_with_varied_subregion_thresholds`: 78.6ms → 57.6ms (36.4% faster)

## Context Impact

The function `merge_out_layout_with_ocr_layout` is called from `supplement_page_layout_with_ocr` in OCR processing hot paths, specifically when `ocr_mode == OCRMode.FULL_PAGE`. Each processed page invokes this function once, so the 30% speedup translates directly into faster document-processing throughput for PDF/image partitioning workflows.
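The three changes listed under Key Optimizations can be illustrated with a minimal sketch. It is not the code from this PR: the `Regions` class and the `aggregate` function are hypothetical, simplified stand-ins for the region container and for `aggregate_embedded_text_by_block`, and the boolean matrix is made-up data standing in for the output of `bboxes1_is_almost_subregion_of_bboxes2()`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Regions:
    """Hypothetical, simplified stand-in for the region container."""

    coords: np.ndarray  # shape (n, 4): x1, y1, x2, y2
    texts: np.ndarray  # shape (n,), dtype=object

    def slice(self, mask: np.ndarray) -> "Regions":
        # Boolean indexing copies both arrays, so every call has a cost.
        return Regions(self.coords[mask], self.texts[mask])


def aggregate(source: Regions, mask: np.ndarray):
    """Join the text of the masked regions and return their enclosing bbox."""
    # Early exit: skip all downstream work when nothing matches.
    if not mask.any():
        return "", None
    # Slice once and reuse the result for both text and coordinates.
    sliced = source.slice(mask)
    text = " ".join(sliced.texts)
    x1, y1 = sliced.coords[:, :2].min(axis=0)
    x2, y2 = sliced.coords[:, 2:].max(axis=0)
    return text, (float(x1), float(y1), float(x2), float(y2))


regions = Regions(
    coords=np.array(
        [[0, 0, 10, 10], [20, 0, 30, 10], [100, 100, 110, 110]], dtype=float
    ),
    texts=np.array(["hello", "world", "far away"], dtype=object),
)

# Made-up subregion matrix: rows are source regions, columns are targets.
is_subregion = np.array([[True, False], [True, False], [False, False]])

mask_before = is_subregion.sum(axis=1).astype(bool)  # original form
mask_after = is_subregion.any(axis=1)  # optimized form
assert np.array_equal(mask_before, mask_after)

result = aggregate(regions, mask_after)
empty_result = aggregate(regions, np.zeros(3, dtype=bool))
```

The two mask expressions are interchangeable for boolean input, which is why the swap does not change behavior. How much time each change saves depends on array sizes and on how often the mask is empty, so the profiler figures above remain the reference for this PR.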
📄 30% (0.30x) speedup for `merge_out_layout_with_ocr_layout` in `unstructured/partition/pdf_image/ocr.py`

⏱️ Runtime: 329 milliseconds → 252 milliseconds (best of 5 runs)

✅ Correctness verification report:
⚙️ Existing Unit Tests
- `partition/pdf_image/test_ocr.py::test_merge_out_layout_with_cid_code`
- `partition/pdf_image/test_ocr.py::test_merge_out_layout_with_ocr_layout`

🌀 Generated Regression Tests
To edit these changes, run `git checkout codeflash/optimize-merge_out_layout_with_ocr_layout-mkrn264u` and push.