bpo-35066: _datetime.datetime.strftime copies trailing '%' #10692
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue. If you have recently signed the CLA, please wait at least one business day. You can check yourself to see if the CLA has been received. Thanks again for your contribution, we look forward to reviewing it!
Summary (also posted to tracker): Modules/_datetimemodule.c and Lib/datetime.py do not behave identically. Specifically, the strftime functions do not match when passed a format string with a trailing '%'.
This situation leads to a scenario in which, for example, "%D %" passed to datetime.strftime (with the C extension included) raises a ValueError. The same string passed to […]
To summarise, there are two problems: (1) datetime does not comply with PEP-399, and (2) a higher-order module raises an exception on a case that the (exposed) lower-order […]
This PR attempts to fix this problem by removing the case check from the datetime C module. This solves both (1) and (2).
There was much talk on the issue thread about there existing a test case for time.strftime that documented a platform-dependent failure on a dangling '%'. I wish to note […]
@michaelsaah @tirkarthi Can either of you point to the existing test case on this? I cannot find it. I think that this still needs a new test, because obviously the existing test case does not cover it specifically. Ideally you would want to test that this does not fail for […]
# Whether datetime.strftime can handle a trailing % is platform-dependent,
# so detect whether we're on one of the platforms where this fails
# (bpo-35066: https://bugs.python.org/issue35066)
try:
    time.strftime('%')
    skip_trailing_percent_strftime = False
except ValueError:
    skip_trailing_percent_strftime = True
This is assuming that we don't already know the platforms where the trailing-% fails. If we know which platforms this fails on, then both the […]
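One way the probe above could feed a test is through `unittest.skipIf`. The following is a minimal sketch, not the test that was added to `datetimetester.py`: the helper, class, and method names are hypothetical, and it assumes an interpreter that already includes this PR's change and that a `ValueError` from `time.strftime('%')` is the only failure mode worth skipping on.

```python
import time
import unittest
from datetime import datetime


def trailing_percent_fails() -> bool:
    """Return True if this platform's time.strftime rejects a trailing '%'."""
    try:
        time.strftime('%')
    except ValueError:
        return True
    return False


# Evaluated once at import time, like the flag in the snippet above.
SKIP_TRAILING_PERCENT = trailing_percent_fails()


class TrailingPercentTest(unittest.TestCase):  # hypothetical test class
    @unittest.skipIf(SKIP_TRAILING_PERCENT,
                     "time.strftime('%') raises ValueError on this platform")
    def test_trailing_percent(self):
        dt = datetime(2018, 11, 25, 12, 30)
        # datetime.strftime should succeed wherever time.strftime does.
        result = dt.strftime('%Y %')
        # Only the year prefix is checked: what the platform emits for the
        # trailing '%' itself is not assumed here.
        self.assertTrue(result.startswith('2018'))


def run_sketch() -> unittest.TestResult:
    """Run the single test and return the result object."""
    loader = unittest.defaultTestLoader
    suite = loader.loadTestsFromTestCase(TrailingPercentTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Catching only `ValueError` inside the probe keeps unrelated errors visible, which speaks to the concern about a test that swallows legitimate failures.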
@tirkarthi Hm. That is a very unfortunately designed test. Even legitimate failures will be swallowed because it treats all […] I think for now we should try the "detect if […]
@pganssle Thanks Paul. I'm not sure if the platform failure modes are known; I didn't run into documentation of them during my work. I'll work on adding a test case for […] It looks like there's some error handling being done by […] Lines 824 to 826 in 163eca3
The macro def'ns are a bit cryptic: Lines 763 to 773 in 542497a
Any help appreciated.
@michaelsaah I don't think you have to worry about any of that. We know that […]
@pganssle Ah ok, I was reading the comment in the test-in-question incorrectly. I see the spec now. Thanks.
…atetime.strftime doesn't fail, assuming time.strftime doesn't fail
Just added the test. Will probably open another PR to try to clean up the datetime.strftime tests; they're all over the place. Interestingly, it looks like there's been some confusion about this before: cpython/Lib/test/datetimetester.py Lines 1330 to 1342 in 65c216e
Modules/_datetimemodule.c
while ((ch = *pin++) != '\0') {
    if (ch != '%') {
        do {
Hm... It's a bit hard to keep track of where null checking happens in this loop.
I guess the reason that a trailing % was forbidden in the original version was that they wanted to check for null termination every time the pin pointer is advanced.
I think there's a change in behavior around null-checking here: before this change, the new format string did not include the null terminator; after the change, it does. I don't know that this is a problem, but I'm wary of it.
Maybe you can change it like this:
while((ch = *pin++) != '\0') {
    if (ch != '%' || (ch = *pin++) == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if (ch == 'z') {
    }
That should keep the logic the same.
As an aside, this terse C-style stuff where you have to keep track of null pointers is for the birds. It took me a long time to convince myself that there was a change in behavior, and even still I am not certain about it. It really feels weird to advance the pointer in the conditionals.
Just looking at it, I don't think your code does what you think it does. Without a null char check at the end of the loop, you'll read the null char and advance the pointer, and then the check in the while loop condition will actually be reading the char after the null char.
Like you said though, this stuff is tricky, so I'm going to test it and report back.
Also, my first attempt at fixing this, while I think a bit uglier than the do/while solution, comes closer to your desired behavior than the current one. Here it is:
cpython/Modules/_datetimemodule.c
Lines 1525 to 1626 in d178065
while ((ch = *pin++) != '\0')
{
    if (ch != '%') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    // else if ((ch = *pin++) == '\0') {
    /* There's a lone trailing %; doesn't make sense. */
    // PyErr_SetString(PyExc_ValueError, "strftime format "
    // "ends with raw %");
    // goto Done;
    // ptoappend = pin - 2;
    // ntoappend = 2;
    // }
    /* A % has been seen and ch is the character after it. */
    else if ((ch = *pin++) == 'z') {
        if (zreplacement == NULL) {
            /* format utcoffset */
            char buf[100];
            PyObject *tzinfo = get_tzinfo_member(object);
            zreplacement = PyBytes_FromStringAndSize("", 0);
            if (zreplacement == NULL) goto Done;
            if (tzinfo != Py_None && tzinfo != NULL) {
                assert(tzinfoarg != NULL);
                if (format_utcoffset(buf,
                                     sizeof(buf),
                                     "",
                                     tzinfo,
                                     tzinfoarg) < 0)
                    goto Done;
                Py_DECREF(zreplacement);
                zreplacement =
                    PyBytes_FromStringAndSize(buf,
                                              strlen(buf));
                if (zreplacement == NULL)
                    goto Done;
            }
        }
        assert(zreplacement != NULL);
        ptoappend = PyBytes_AS_STRING(zreplacement);
        ntoappend = PyBytes_GET_SIZE(zreplacement);
    }
    else if (ch == 'Z') {
        /* format tzname */
        if (Zreplacement == NULL) {
            Zreplacement = make_Zreplacement(object,
                                             tzinfoarg);
            if (Zreplacement == NULL)
                goto Done;
        }
        assert(Zreplacement != NULL);
        assert(PyUnicode_Check(Zreplacement));
        ptoappend = PyUnicode_AsUTF8AndSize(Zreplacement,
                                            &ntoappend);
        if (ptoappend == NULL)
            goto Done;
    }
    else if (ch == 'f') {
        /* format microseconds */
        if (freplacement == NULL) {
            freplacement = make_freplacement(object);
            if (freplacement == NULL)
                goto Done;
        }
        assert(freplacement != NULL);
        assert(PyBytes_Check(freplacement));
        ptoappend = PyBytes_AS_STRING(freplacement);
        ntoappend = PyBytes_GET_SIZE(freplacement);
    }
    else {
        /* percent followed by neither z nor Z */
        ptoappend = pin - 2;
        ntoappend = 2;
    }
    /* Append the ntoappend chars starting at ptoappend to
     * the new format.
     */
    if (ntoappend == 0)
        continue;
    assert(ptoappend != NULL);
    assert(ntoappend > 0);
    while (usednew + ntoappend > totalnew) {
        if (totalnew > (PY_SSIZE_T_MAX >> 1)) { /* overflow */
            PyErr_NoMemory();
            goto Done;
        }
        totalnew <<= 1;
        if (_PyBytes_Resize(&newfmt, totalnew) < 0)
            goto Done;
        pnew = PyBytes_AsString(newfmt) + usednew;
    }
    memcpy(pnew, ptoappend, ntoappend);
    pnew += ntoappend;
    usednew += ntoappend;
    assert(usednew <= totalnew);
    printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt)));
    if (ch == '\0')
        break;
} /* end while() */
As far as the pointer juggling in the conditionals goes, it's a pattern I've seen before in C parsing code. I think ideally you only mutate the pointer in one place, as it quickly becomes hard to reason about.
As I suspected, your code doesn't do what we want. Passing it "%D %" produces the format string b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'.
I obtained this by calling printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt))); after the while loop. If you put the call in the loop itself, you can watch it happen:
b'%D\xcb\xcb\xcb'
b'%D \xcb\xcb'
b'%D \x00\xcb'
b'%D \x00\xfb'
b'%D \x00\xfb\xfb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'
I'm not sure what's causing it to eventually break out of the loop and not spin forever; I assume it's one of the length checks that happens in the loop.
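The overrun can be reproduced away from C with a small Python model of the loop. This is a sketch only: `scan_with_suggested_check` and the `memory` buffer are hypothetical, the bytes placed after the terminator are made up to stand in for uninitialized memory, and the `%z`/`%Z`/`%f` branches are left out.

```python
PERCENT = ord('%')


def scan_with_suggested_check(buf: bytes) -> bytes:
    """Model the loop that folds the NUL test into the '%' condition.

    `buf` stands in for C memory: the format string, its NUL terminator,
    and whatever bytes happen to follow. `pin` is an index, not a pointer.
    """
    out = bytearray()
    pin = 0
    while True:
        ch = buf[pin]              # ch = *pin++
        pin += 1
        if ch == 0:                # loop condition: stop at NUL
            break
        if ch != PERCENT:
            out.append(buf[pin - 1])    # ptoappend = pin - 1; ntoappend = 1
            continue
        ch = buf[pin]              # second ch = *pin++ inside the condition
        pin += 1
        if ch == 0:
            # pin is now one past the NUL, so pin - 1 is the NUL itself:
            # the NUL is copied instead of the '%', and the next loop test
            # reads the byte after the terminator.
            out.append(buf[pin - 1])
            continue
        out += buf[pin - 2:pin]    # '%' plus the directive character
    return bytes(out)


# "%D %", its terminator, two made-up garbage bytes, and a stray zero byte.
memory = b"%D %\x00\xfb\xfb\x00"
print(scan_with_suggested_check(memory))  # b'%D \x00\xfb\xfb'
```

In this model the scan stops only when it happens to meet a zero byte in the trailing garbage, which is one possible explanation for the real loop terminating rather than spinning forever.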
@michaelsaah Good call, sorry, I didn't actually try my version. You are right that I advance but then never check if the next character is a NULL. I think maybe this will do what I was looking for?
while((ch = *pin++) != '\0') {
    if (ch != '%' || *pin == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if ((ch = *pin++) == 'z') {
    }
Edit: No, just realized why this won't work. Hm... If I'm right that the behavior is slightly different, we may need to have more than one check in here, unfortunately.
Did you check the earlier commit I linked to? That might be a blueprint for what you want. The key is the null check and break at the end of the loop.
@michaelsaah You might want to add a NEWS entry given the impact, and it will also help in keeping the PR green: https://devguide.python.org/committing/#what-s-new-and-news-entries
…lude null byte in format string.
@pganssle I think you'll like the latest change. Behavior w/ regard to null bytes now matches existing code, as far as I can tell.
@michaelsaah Sorry for the days of silence. I think the behavior of this is now fixed. I'm not sure how I feel about decrementing the pointer, though. I think the two most obviously viable solutions (barring any that dramatically refactor the code, which I have not considered) are: […]
The first one is nice because it doesn't go through unnecessary iterations of the loop, but the second one keeps the null checking logic as close as possible to the point where the pointer is incremented, which keeps the amount of time between reaching the end of the string and knowing we've reached it to a minimum. I think the only other decent option would be something like this:
while ((ch = *pin++) != '\0') {
    if(ch != '%') {
        ptoappend = pin - 1;
        ntoappend = 1;
    } else if (*(pin + 1) != '\0') {
        if ((ch = *pin++) == 'z') {
            ...
I am increasingly convincing myself that number 2 (your current version) is the right way to go, so I say we go ahead and merge this if no one else objects.
By the way, thank you for your patience with this @michaelsaah, particularly since I know I created some extra work for you with my ill-considered alternative approaches.
Wow we really commented past each other there. No worries about the extra work, you've been a very clear and thorough reviewer, and I'm doing this for fun while looking for a new job anyway. I originally didn't want to decrement the pointer either, but I'm convinced it's safe, since we only do that if we've already seen a '%', which means we won't be decrementing past the beginning of the buffer. After staring at this for hours over the past couple weeks, I'm not convinced that there's any way to make the control flow more intuitive while keeping the null-checking contained. As far as the difference between (1) and (2), I think (2) fits more naturally with the case-checking semantics that are already there, but it's a toss-up. I don't think either is more confusing (or clearer) than the other.
As an aside, I don't see how my solution leads to unnecessary traversals of the loop. As soon as it sees a '%' followed by a '\0', it copies the '%' and breaks on the next while condition check. Do you mean that it breaks on the while condition instead of at the bottom of the loop body?
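The control flow described here (copy the '%', step back onto the terminator, and let the loop condition end the scan) can be modelled in Python. This is a hedged sketch rather than the C code in the PR: `scan_with_backup` is a hypothetical name, an integer index stands in for the `pin` pointer, and the `%z`/`%Z`/`%f` branches are omitted.

```python
PERCENT = ord('%')


def scan_with_backup(fmt: bytes) -> bytes:
    """Copy `fmt` through the scan loop, keeping a lone trailing '%'."""
    buf = fmt + b"\x00"            # NUL-terminated, like the C string
    out = bytearray()
    pin = 0
    while True:
        ch = buf[pin]              # ch = *pin++
        pin += 1
        if ch == 0:                # loop condition: stop at NUL
            break
        if ch != PERCENT:
            out.append(ch)         # ordinary character, copied as-is
            continue
        ch = buf[pin]              # character after the '%'
        pin += 1
        if ch == 0:
            # Back up onto the NUL. A '%' was just consumed, so the index
            # cannot move before the start of the buffer.
            pin -= 1
            out.append(buf[pin - 1])    # copy only the '%'
            continue               # the next loop test sees the NUL and stops
        out += buf[pin - 2:pin]    # '%' plus the directive character
    return bytes(out)


print(scan_with_backup(b"%D %"))   # b'%D %'
```

With this shape the terminator is never copied into the output, and the only extra work for a trailing '%' is one more evaluation of the loop condition.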
CC: @abalkin I think this is ready to be merged.
Thanks @michaelsaah for the PR, and @vstinner for merging it 🌮🎉. I'm working now to backport this PR to: 3.7.
GH-11550 is a backport of this pull request to the 3.7 branch.
…0692) Previously, calling the strftime() method on a datetime object with a trailing '%' in the format string would result in an exception. However, this only occurred when the datetime C module was being used; the Python implementation did not match this behavior. Datetime is now PEP-399 compliant, and will not throw an exception on a trailing '%'. (cherry picked from commit 454b3d4) Co-authored-by: MichaelSaah <[email protected]>
https://bugs.python.org/issue35066