gh-123378: fix some corner cases of `start` and `end` values in `PyUnicodeErrorObject` #123380
Merged: encukou merged 34 commits into `python:main` from `picnixz:fix/c-api-unicode-error-get-start` on Dec 4, 2024.
Commits
7a1574d Fix `PyUnicode{Encode,Decode}Error_GetStart`. (picnixz)
6ef0c6d blurb (picnixz)
60ab0bb add tests (picnixz)
67b3d8e fix NEWS (picnixz)
78fff57 remove a duplicated normal case (picnixz)
a6e6f80 handle start < 0 (picnixz)
20c47ba add C tests (picnixz)
51bc77e add test coverage (picnixz)
b290e58 update docs (picnixz)
75398a9 fixup (picnixz)
546be87 update blurb (picnixz)
cded571 address Victor's review (picnixz)
1900d9a refactor name (picnixz)
8acc563 fix refcounts (picnixz)
0538c83 remove debugging code (picnixz)
d5ea357 address Victor's review (round 2) (picnixz)
7c10769 handle negative 'start' and 'end' values (picnixz)
7ce2ef0 add C API tests (picnixz)
b55ca5a add Python tests (picnixz)
4e34e5f update docs (picnixz)
033a1ac fix typo (picnixz)
c802e64 convert macros into `static inline` functions (picnixz)
66f33f8 Merge branch 'main' into fix/c-api-unicode-error-get-start (skirpichev)
6caf5b6 Merge remote-tracking branch 'upstream/main' into fix/c-api-unicode-e… (picnixz)
fcde448 post-merge cleanup (picnixz)
2e66302 Merge branch 'main' into fix/capi/unicode-error-start-end-123378 (picnixz)
dc6b37b Merge branch 'main' into fix/capi/unicode-error-start-end-123378 (picnixz)
baa5cb2 fix typo (picnixz)
4c4808e update NEWS and docs (picnixz)
efbdff1 add some assertion checks (picnixz)
180f3c2 add some assertion checks (picnixz)
4bd6e20 Merge branch 'main' into fix/c-api-unicode-error-get-start (picnixz)
5759a70 remove failing assertions for now (picnixz)
8c12171 fix docs (picnixz)
Misc/NEWS.d/next/C_API/2024-08-27-09-07-56.gh-issue-123378.JJ6n_u.rst (new file, 6 additions, 0 deletions):
Ensure that the value of :attr:`UnicodeEncodeError.start <UnicodeError.start>`
retrieved by :c:func:`PyUnicodeEncodeError_GetStart` lies in
``[0, max(0, objlen - 1)]`` where *objlen* is the length of
:attr:`UnicodeEncodeError.object <UnicodeError.object>`. Similar
arguments apply to :exc:`UnicodeDecodeError` and :exc:`UnicodeTranslateError`
and their corresponding C interface. Patch by Bénédikt Tran.
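The range rule for `start` in this entry amounts to a clamp. As a minimal sketch, assuming a hypothetical helper named `clamp_start` (not a CPython function) and using `ptrdiff_t` in place of `Py_ssize_t` so it builds without `Python.h`, it illustrates the documented range only, not the patch itself:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CPython): clamp a UnicodeError start
 * index into [0, max(0, objlen - 1)], the range documented above.
 * ptrdiff_t stands in for Py_ssize_t. */
static inline ptrdiff_t
clamp_start(ptrdiff_t start, ptrdiff_t objlen)
{
    assert(objlen >= 0);
    ptrdiff_t hi = objlen > 0 ? objlen - 1 : 0;  /* max(0, objlen - 1) */
    if (start < 0) {
        return 0;
    }
    if (start > hi) {
        return hi;
    }
    return start;
}
```

For an empty object (`objlen == 0`) every `start` maps to 0; with `objlen == 5`, a `start` of 10 maps to 4.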
Misc/NEWS.d/next/C_API/2024-12-02-16-10-36.gh-issue-123378.Q6YRwe.rst (new file, 6 additions, 0 deletions):
Ensure that the value of :attr:`UnicodeEncodeError.end <UnicodeError.end>`
retrieved by :c:func:`PyUnicodeEncodeError_GetEnd` lies in
``[min(1, objlen), max(min(1, objlen), objlen)]`` where *objlen* is the length of
:attr:`UnicodeEncodeError.object <UnicodeError.object>`. Similar arguments
apply to :exc:`UnicodeDecodeError` and :exc:`UnicodeTranslateError` and their
corresponding C interface. Patch by Bénédikt Tran.
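The `end` range looks unusual because it collapses to `[0, 0]` for an empty object and is `[1, objlen]` otherwise. A minimal sketch of that clamp, assuming a hypothetical helper `clamp_end` (not a CPython function) with `ptrdiff_t` standing in for `Py_ssize_t`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of CPython): clamp a UnicodeError end
 * index into [min(1, objlen), max(min(1, objlen), objlen)].
 * For objlen == 0 this is [0, 0]; otherwise it is [1, objlen].
 * ptrdiff_t stands in for Py_ssize_t. */
static inline ptrdiff_t
clamp_end(ptrdiff_t end, ptrdiff_t objlen)
{
    assert(objlen >= 0);
    ptrdiff_t lo = objlen < 1 ? objlen : 1;    /* min(1, objlen) */
    ptrdiff_t hi = objlen > lo ? objlen : lo;  /* max(lo, objlen) */
    if (end < lo) {
        return lo;
    }
    if (end > hi) {
        return hi;
    }
    return end;
}
```

Note that a non-empty object never gets `end == 0` under this rule, which is what the first review comment questions.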
Is there a particular reason why 0 isn't allowed for `end`? Empty substrings (`start == end`) can be useful for special cases. OTOH, this PR allows “negative” substrings (`start > end`), which does not sound useful, and it can throw off code like `length = end - start`.
IMO, `end` should be clipped to `[clipped(start), len(object)]`.
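This proposal can be sketched as follows, assuming `clipped(start)` means `start` clamped by the rule from the first NEWS entry. `clip_range` is a hypothetical name and `ptrdiff_t` stands in for `Py_ssize_t`; this illustrates the suggestion and is not code from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of the suggestion: clip start as in the PR, then
 * clip end into [start, objlen]. Empty ranges (start == end) are allowed,
 * but start > end can no longer occur, so end - start is never negative.
 * ptrdiff_t stands in for Py_ssize_t. */
static inline void
clip_range(ptrdiff_t *start, ptrdiff_t *end, ptrdiff_t objlen)
{
    assert(objlen >= 0);
    /* Clip start into [0, max(0, objlen - 1)]. */
    ptrdiff_t start_hi = objlen > 0 ? objlen - 1 : 0;
    if (*start < 0) {
        *start = 0;
    }
    else if (*start > start_hi) {
        *start = start_hi;
    }
    /* Clip end into [start, objlen]; start <= objlen holds at this point. */
    if (*end < *start) {
        *end = *start;
    }
    else if (*end > objlen) {
        *end = objlen;
    }
}
```

With `objlen == 5`, `start = 4, end = 2` becomes the empty range `start = end = 4` instead of a “negative” substring, while an input with `start == end` inside the object is left unchanged.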
The original code did:
So it always clipped it to `[1, len(object)]` but set it to 0 for empty sequences. I just documented that behaviour correctly (otherwise it could be confusing). I don't know whether I should enforce it or not (we didn't decide on this matter; I only kept what was already there).
This could be a good alternative. I hadn't thought of it, so I'll leave this comment open to discuss your proposal.
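The behaviour described here (clip to `[1, len(object)]`, but 0 for an empty object) falls out of two sequential clamps, which also explains the `min`/`max` form of the range in the second NEWS entry. The sketch below is hypothetical (`clamp_end_two_step` is an invented name and `ptrdiff_t` stands in for `Py_ssize_t`); it is not necessarily the original code the comment refers to.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: raise end to at least 1, then cap it at objlen.
 * Because the upper clamp runs last, an empty object (objlen == 0)
 * always ends up with end == 0. ptrdiff_t stands in for Py_ssize_t. */
static inline ptrdiff_t
clamp_end_two_step(ptrdiff_t end, ptrdiff_t objlen)
{
    assert(objlen >= 0);
    if (end < 1) {
        end = 1;
    }
    if (end > objlen) {
        end = objlen;
    }
    return end;
}
```

For `objlen >= 1` the result lies in `[1, objlen]`, and for `objlen == 0` it is 0 whatever the input, matching `[min(1, objlen), max(min(1, objlen), objlen)]`.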
Looks like not much discussion is coming.
Is there anyone whose opinion you'd like to hear?
Personally, I'd like to know what Victor and Serhiy think about that. The problem is that I'm worried about breaking things even though we are fixing a bug. However, whatever we choose should not make the following crash:
Maybe we can start by fixing `__str__`, without touching the getters/setters for now? What do you think? I once fixed it by changing the getter/setter implementation, but then we didn't really know how to proceed.
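Fixing `__str__` alone could mean checking bounds before reading a character from the object. The sketch below assumes, without confirmation from this thread, that the crash comes from indexing `object[start]` out of range; `is_single_char_range` is a hypothetical helper and `ptrdiff_t` stands in for `Py_ssize_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard: returns true only when [start, end) covers exactly
 * one in-bounds character, i.e. when reading object[start] is safe.
 * Callers would use a message that does not index the object whenever
 * this returns false. ptrdiff_t stands in for Py_ssize_t. */
static inline bool
is_single_char_range(ptrdiff_t start, ptrdiff_t end, ptrdiff_t objlen)
{
    assert(objlen >= 0);
    /* start < objlen is checked first, so start + 1 cannot overflow. */
    return 0 <= start && start < objlen && end == start + 1;
}
```

A `__str__` built on such a guard would print the offending character only when the check passes and otherwise report just the positions, leaving the getters and setters unchanged.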
A straightforward fix for `__str__` sounds like it'd be easy both to review and to backport.
I'll work on that then.
See #124935