Fix: Add support for Halfwidth and Fullwidth Forms in BasicPDFParser #2170

MORIMOTO520212 · 2025-05-05T03:41:42Z

Halfwidth and Fullwidth Forms frequently appear in Japanese documents. Allowing this Unicode block through the parser's character filter improved the accuracy of responses.

Important

Adds support for Halfwidth and Fullwidth Forms in BasicPDFParser.ingest() by including Unicode range \uFF00 to \uFFEF.

Behavior:
- The character filter in BasicPDFParser.ingest() now also accepts characters in the range \uFF00 to \uFFEF (the Halfwidth and Fullwidth Forms block).
Files:
- Modified pdf_parser.py to include the new Unicode range in BasicPDFParser.
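As a rough illustration of the behavior described above, the sketch below shows how a range-based character filter drops fullwidth and halfwidth characters until \uFF00 to \uFFEF is allowed. The names `BASE_RANGES`, `HALFWIDTH_FULLWIDTH`, and `filter_text` are hypothetical, and the other ranges are assumptions chosen for the example; this is not the actual code in pdf_parser.py.

```python
# Hypothetical sketch of a range-based character filter.
# The base ranges are illustrative assumptions, not the project's real list.
BASE_RANGES = [
    ("\u0020", "\u007E"),  # printable ASCII
    ("\u3040", "\u309F"),  # Hiragana
    ("\u30A0", "\u30FF"),  # Katakana
    ("\u4E00", "\u9FFF"),  # CJK Unified Ideographs
]

# The block this PR adds: Halfwidth and Fullwidth Forms.
HALFWIDTH_FULLWIDTH = ("\uFF00", "\uFFEF")


def filter_text(text, ranges):
    """Keep only characters that fall inside one of the allowed ranges."""
    return "".join(
        ch for ch in text if any(lo <= ch <= hi for lo, hi in ranges)
    )


# Fullwidth Latin letters and digits, halfwidth katakana, kanji, ASCII.
sample = "ＡＢＣ１２３ ｶﾀｶﾅ 日本語 abc"

before = filter_text(sample, BASE_RANGES)
after = filter_text(sample, BASE_RANGES + [HALFWIDTH_FULLWIDTH])

print(repr(before))  # fullwidth and halfwidth characters are dropped
print(repr(after))   # the whole sample survives
```

Without the extra range, product codes or numbers typed in fullwidth form would silently vanish from the extracted text, which is one plausible reason downstream answers improve once the block is allowed.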

This description was created by Ellipsis for c619757. You can customize this summary. It will automatically update as commits are pushed.

ellipsis-dev

Important

Looks good to me! 👍

Reviewed everything up to c619757 in 34 seconds.

Reviewed 12 lines of code in 1 file.
Skipped 0 files when reviewing.
Skipped posting 3 draft comments, listed below.

1. py/core/parsers/media/pdf_parser.py:401

Draft comment:
The addition of the Unicode range for Halfwidth and Fullwidth Forms is correct. Ensure this range fits your intended character set filtering.
Reason this comment was not posted:
Comment did not seem useful. Confidence that the comment is useful = 0% <= threshold 50%. The comment asks the author to ensure that the Unicode range fits their intended character set filtering. This is a request for confirmation, which violates the rules. It does not provide a specific suggestion or point out a clear issue.

2. py/core/parsers/media/pdf_parser.py:401

Draft comment:
Good addition for Halfwidth and Fullwidth Forms. Consider abstracting the allowed Unicode ranges into a constant or helper function for improved readability and easier maintenance.
Reason this comment was not posted:
Confidence changes required: 33% <= threshold 50% None
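One way the refactor suggested in this draft comment could look is sketched below: the allowed ranges live in a single named constant, and a helper builds the filter pattern from it. `ALLOWED_UNICODE_RANGES`, `build_disallowed_pattern`, and `strip_disallowed` are hypothetical names, and the Basic Latin entry is an illustrative assumption; only the Halfwidth and Fullwidth Forms range comes from this PR.

```python
import re

# Hypothetical constant: one place to read and extend the allowed ranges.
# Only the last entry is taken from this PR; the first is illustrative.
ALLOWED_UNICODE_RANGES = {
    "Basic Latin (printable)": (0x0020, 0x007E),
    "Halfwidth and Fullwidth Forms": (0xFF00, 0xFFEF),
}


def build_disallowed_pattern(ranges):
    """Compile a regex matching any character outside the given ranges."""
    parts = "".join(
        f"\\u{lo:04X}-\\u{hi:04X}" for lo, hi in ranges.values()
    )
    return re.compile(f"[^{parts}]")


DISALLOWED = build_disallowed_pattern(ALLOWED_UNICODE_RANGES)


def strip_disallowed(text):
    """Remove every character that is not in an allowed range."""
    return DISALLOWED.sub("", text)


# Kanji are outside both example ranges, so they are removed here.
print(repr(strip_disallowed("ＯＫ ok 日本")))
```

Keeping the block names next to the code points means a later change like this PR becomes a one-line addition to the constant rather than an edit to the filter expression itself.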

3. py/core/parsers/media/pdf_parser.py:51

Draft comment:
The log message at line 51 says "Starting PDF ingestion using MistralOCRParser", but the class is named OCRPDFParser. Consider updating the log message to match the class name for clarity.
Reason this comment was not posted:
Comment was not on a location in the diff, so it can't be submitted as a review comment.

Workflow ID: wflow_Fenab6p7qFYWM02Y


NolanTrem · 2025-05-05T05:18:27Z

Great addition!

Fix: Add support for Halfwidth and Fullwidth Forms in BasicPDFParser

c619757

ellipsis-dev bot reviewed May 5, 2025


NolanTrem merged commit 4ce6c74 into SciPhi-AI:main May 5, 2025
2 of 5 checks passed
