MthwRobinson · 2022-09-27T14:20:49Z

Summary

Adds an explicit link to the security policy in the README for the repo.

qued

LGTM, link works!

Disable strategy #2 (guessed MIME-type) and strategy #3 (filename extension) and verify that a caller that asserts the correct MIME-type as the `content_type` argument gets the right `FileType` member.

Add a parameterized test for strategy #2 (guessed MIME-type) that disables strategies #1 and #3 so we know what's actually being tested. Remove individual tests made redundant by the single parameterized test.
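
A minimal Python sketch of the isolation idea behind these two tests. Everything in it is hypothetical: `FileType`, `detect_filetype`, the `use_asserted`/`use_guessed`/`use_extension` switches, and the injected `guess_mime` callable are stand-ins, not the repository's actual API. Each strategy can be switched off, so a check exercises exactly one of them. In a pytest suite the `GUESSED_CASES` table would typically feed `@pytest.mark.parametrize`; a plain loop keeps the sketch dependency-free.

```python
import os
from enum import Enum
from typing import Callable, Optional


class FileType(Enum):
    # Hypothetical subset of file types, keyed by MIME type.
    PDF = "application/pdf"
    HTML = "text/html"
    TXT = "text/plain"


# Hypothetical extension table used by strategy #3.
EXT_TO_FILETYPE = {".pdf": FileType.PDF, ".html": FileType.HTML, ".txt": FileType.TXT}


def filetype_from_mime(mime: Optional[str]) -> Optional[FileType]:
    """Map a MIME type to a FileType member, or None if unrecognized."""
    for member in FileType:
        if member.value == mime:
            return member
    return None


def detect_filetype(
    filename: str,
    content_type: Optional[str] = None,
    guess_mime: Optional[Callable[[str], Optional[str]]] = None,
    use_asserted: bool = True,
    use_guessed: bool = True,
    use_extension: bool = True,
) -> Optional[FileType]:
    """Try each strategy in order; any strategy can be switched off for testing."""
    # Strategy #1: trust the MIME type the caller asserts via `content_type`.
    if use_asserted and content_type:
        found = filetype_from_mime(content_type)
        if found is not None:
            return found
    # Strategy #2: a guessed MIME type (the guesser is injected so tests control it).
    if use_guessed and guess_mime is not None:
        found = filetype_from_mime(guess_mime(filename))
        if found is not None:
            return found
    # Strategy #3: fall back to the filename extension.
    if use_extension:
        return EXT_TO_FILETYPE.get(os.path.splitext(filename)[1].lower())
    return None


def check_asserted_content_type() -> None:
    # Strategies #2 and #3 are disabled and the ".txt" extension disagrees,
    # so a PDF result can only come from the asserted content_type.
    result = detect_filetype(
        "report.txt", content_type="application/pdf",
        use_guessed=False, use_extension=False,
    )
    assert result is FileType.PDF


# One table drives the strategy #2 checks, replacing one test per file type.
GUESSED_CASES = [
    ("application/pdf", FileType.PDF),
    ("text/html", FileType.HTML),
    ("text/plain", FileType.TXT),
]


def check_guessed_mime_type() -> None:
    for mime, expected in GUESSED_CASES:
        # Strategies #1 and #3 are disabled; the guesser returns a fixed MIME type.
        result = detect_filetype(
            "no-extension", guess_mime=lambda _name, m=mime: m,
            use_asserted=False, use_extension=False,
        )
        assert result is expected, (mime, result)


check_asserted_content_type()
check_guessed_mime_type()
```

Disabling the other strategies explicitly, rather than relying on inputs that happen to defeat them, is what makes it clear which code path a passing check actually covered.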

…#4266)

## Problem

`_patch_current_chars_with_render_mode` is called on every `do_TJ`/`do_Tj` text operator during PDF parsing. The original implementation re-scans the entire `cur_item._objs` list each time, checking `hasattr(item, "rendermode")` to skip already-patched items. For a page with N characters across M text operations, this is O(N*M), effectively quadratic. Memray profiling showed this function as the #1 allocator: 17.57 GB total across 549M allocations in a session processing just 4 files.

## Fix

Track the last-patched index so each call only processes newly added `LTChar` objects. Reset automatically when `cur_item` changes (new page or figure).

**Before:** O(N²) per page: re-scans all accumulated objects on every text operator

**After:** O(N) per page: each object is visited exactly once

---------

Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: Alan Bertl <[email protected]>
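
The fix described above can be sketched in Python with stand-in classes. `FakeChar`, `FakeContainer`, `patch_rescan`, `IncrementalPatcher`, and `simulate` are hypothetical; they model only the `_objs` list and the `rendermode` attribute, not pdfminer's or the PR's actual code. Both versions give each character the render mode of the text operator that added it; only the amount of work differs.

```python
from typing import Dict, List, Optional


class FakeChar:
    """Stand-in for pdfminer's LTChar; it has no `rendermode` until patched."""

    def __init__(self, text: str) -> None:
        self.text = text


class FakeContainer:
    """Stand-in for the current layout item (a page or a figure)."""

    def __init__(self) -> None:
        self._objs: List[object] = []


def patch_rescan(cur_item: FakeContainer, render_mode: int, stats: Dict[str, int]) -> None:
    """Original approach: walk every accumulated object on each call."""
    for item in cur_item._objs:
        stats["visits"] += 1
        if isinstance(item, FakeChar) and not hasattr(item, "rendermode"):
            item.rendermode = render_mode


class IncrementalPatcher:
    """Remembers how far into cur_item._objs it has already patched."""

    def __init__(self) -> None:
        self._last_item: Optional[FakeContainer] = None
        self._last_index = 0
        self.visits = 0

    def patch(self, cur_item: FakeContainer, render_mode: int) -> None:
        if cur_item is not self._last_item:
            # New page or figure: start again from the beginning of its list.
            self._last_item = cur_item
            self._last_index = 0
        objs = cur_item._objs
        # Only objects appended since the previous call are visited.
        for i in range(self._last_index, len(objs)):
            self.visits += 1
            item = objs[i]
            if isinstance(item, FakeChar):
                item.rendermode = render_mode
        self._last_index = len(objs)


def simulate(num_ops: int, chars_per_op: int):
    """Feed identical text operators to both approaches and compare the work done."""
    page_a, page_b = FakeContainer(), FakeContainer()
    stats = {"visits": 0}
    patcher = IncrementalPatcher()
    for op in range(num_ops):
        for _ in range(chars_per_op):
            page_a._objs.append(FakeChar("x"))
            page_b._objs.append(FakeChar("x"))
        patch_rescan(page_a, op, stats)
        patcher.patch(page_b, op)
    modes_a = [c.rendermode for c in page_a._objs]
    modes_b = [c.rendermode for c in page_b._objs]
    return stats["visits"], patcher.visits, modes_a == modes_b


print(simulate(100, 10))  # → (50500, 1000, True)
```

With 100 text operators adding 10 characters each, the re-scanning version visits 10 × (1 + 2 + ... + 100) = 50,500 objects, while the incremental one visits each of the 1,000 objects once. The sketch assumes `_objs` is only appended to between calls. Holding a reference to the last container and comparing with `is` (rather than storing an `id()`) avoids a false match if a freed container's id is reused.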

added security policy note to readme

a348381

MthwRobinson requested a review from qued September 27, 2022 14:20

qued approved these changes Sep 27, 2022

MthwRobinson merged commit e290f08 into main Sep 27, 2022

MthwRobinson deleted the core-149/security-policy branch September 27, 2022 14:32

yuming-long pushed a commit that referenced this pull request Sep 29, 2022

chore CORE-4: Initial repo setup (#1)

4be27cf

* added makefile and poetry dependency file
* added initial pdf_to_text script
* added initial readme
* smaller heading

ztratar mentioned this pull request Aug 3, 2023

List items are detected as titles in OCR only mode #1010

Closed

potter-potter mentioned this pull request Jan 23, 2024

fix: remove none value keys from flattened dictionary #2442

Merged

shriharshan mentioned this pull request Sep 23, 2024

bug/<502 bad gatway Error> #3654

Closed

KRRT7 mentioned this pull request Feb 26, 2026

fix: avoid O(N²) re-scanning in _patch_current_chars_with_render_mode #4266

Merged

qued mentioned this pull request Feb 26, 2026

fix: avoid O(N²) re-scanning in _patch_current_chars_with_render_mode #4269

Closed

PastelStorm mentioned this pull request Feb 27, 2026

feat: Infer hierarchical heading levels (H1-H4) for PDFs #4222

Open

docs: Link to security policy in the README #1

MthwRobinson merged 1 commit into main from core-149/security-policy

MthwRobinson commented Sep 27, 2022

qued left a comment

2 participants
