Thanks to visit codestin.com
Credit goes to github.com

Skip to content

bpo-35066: _dateime.datetime.strftime copies trailing '%' #10692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jan 14, 2019

Conversation

michaelsaah
Copy link
Contributor

@michaelsaah michaelsaah commented Nov 23, 2018

@the-knights-who-say-ni
Copy link

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@michaelsaah
Copy link
Contributor Author

Summary (also posted to tracker):

Modules/_datetimemodule.c and Lib/datetime.py do not behave identically. Specifically, the strftime functions do not match when passed a format string
terminated with a '%'. The C function performs an explicit check for this condition, and raises a ValueError on it. The Py version does not perform this check. Both pass the
format string (after doing substitutions for %z, %Z, and %f tags) to the system strftime or wcfstime, depending on platform. These live within the python time module. The
time module wrapper function does not perform this check.

This situation leads to a scenario in which, for example, "%D %" passed to datetime.strftime (with the C extension included) raises a value error. The same string passed to
time.strftime returns "mm/dd/yy %", at least on OSX. Furthermore, if Python is built without the C module, "mm/dd/yy %" is returned when datetime.strftime is called.

To summarise, there are two problems: (1) datetime does not comply with PEP-399, and (2) a higher-order module raises an exception on a case that the (exposed) lower-order
module has no problem with, causing a mismatch in behavior between datetime.strftime and time.strftime.

This PR attempts to fix this problem by removing the case check from the datetime C module. This solves both (1) and (2).

There was much talk on the issue thread about there existing a test case for time.strftime that documented a platform-dependent failure on a dangling '%'. I wish to note
that my patch does not touch the time module at all, it only removes a seemingly unnecessary check in the datetime C module.

@pganssle
Copy link
Member

pganssle commented Nov 26, 2018

@michaelsaah @tirkarthi Can either of you point to the existing test case on this? I cannot find it.

I think that this still needs a new test, because obviously the existing test case does not cover it specifically. Ideally you would want to test that this does not fail for datetime.strftime specifically. If there's a platform-dependent failure in time.strftime, then you can write the test such that the datetime test is skipped if the time test would fail, e.g.:

# Whether datetime.strftime can handle a trailing % is platform-dependent,
# so detect whether we're on one of the platforms where this fails ([bpo-35066](https://bugs.python.org/issue35066))
try:
    time.strftime('%')
    skip_trailing_percent_strftime = False
except ValueError:
    skip_trailing_percent_strftime = True

This is assuming that we don't already know the platforms where the trailing-% fails. If we know which platforms this fails on, then both the time.strftime and datetime.strftime tests should be skipped on the platform with the known failure.

@tirkarthi
Copy link
Member

@pganssle In the tracker I was referring to

def test_strftime_format_check(self):
which was introduced with e5b5895 and has an explicit comment about the test being platform dependent.

@pganssle
Copy link
Member

@tirkarthi Hm. That is a very unfortunately designed test. Even legitimate failures will be swallowed because it treats all ValueErrors as success, even if there is no known platform where this is a failure case.

I think for now we should try the "detect if time.strftime fails, otherwise test datetime.strftime" method, and maybe in the future we can tighten up the failure conditions for time.strftime.

@michaelsaah
Copy link
Contributor Author

@pganssle Thanks Paul. I'm not sure if the platform failure modes are known; I didn't run into documentation of them during my work. I'll work on adding a test case for datetime that is aware of time.strftime's functionality, but it would help me to know what exactly would happen if stdlib call in time.strftime failed, so that I can catch the failure cleanly. Otherwise I guess I'll be spinning up a bunch of VMs and trying to induce failure.

It looks like there's some error handling being done by _Py_BEGIN_SUPPRESS_IPH, but I'm not sure how this propagates (format_time is a macro for strftime/wcfstime depending on platform):

_Py_BEGIN_SUPPRESS_IPH
buflen = format_time(outbuf, i, fmt, &buf);
_Py_END_SUPPRESS_IPH

The macro def'ns are a bit cryptic:

cpython/Include/pyport.h

Lines 763 to 773 in 542497a

#if defined _MSC_VER && _MSC_VER >= 1900
extern _invalid_parameter_handler _Py_silent_invalid_parameter_handler;
#define _Py_BEGIN_SUPPRESS_IPH { _invalid_parameter_handler _Py_old_handler = \
_set_thread_local_invalid_parameter_handler(_Py_silent_invalid_parameter_handler);
#define _Py_END_SUPPRESS_IPH _set_thread_local_invalid_parameter_handler(_Py_old_handler); }
#else
#define _Py_BEGIN_SUPPRESS_IPH
#define _Py_END_SUPPRESS_IPH

Any help appreciated.

@pganssle
Copy link
Member

@michaelsaah I don't think you have to worry about any of that. We know that time.strftime('%') should either raise ValueError or succeed. You can just write a test for datetime.strftime that assumes that datetime_instance.strftime('%') returns '%', and then skip that test conditional on the logic I described in my earlier comment (i.e. skip it on all platforms where time.strftime('%') raises ValueError).

@michaelsaah
Copy link
Contributor Author

@pganssle Ah ok, I was reading the comment in the test-in-question incorrectly. I see the spec now. Thanks.

…atetime.strftime doesn't fail, assuming time.strftime doesn't fail
@michaelsaah
Copy link
Contributor Author

Just added the test. Will probably open another PR to try to cleanup the datetime.strftime tests, they're all over the place.

Interestingly, looks like there's been some confusion about this before:

#make sure that invalid format specifiers are handled correctly
#self.assertRaises(ValueError, t.strftime, "%e")
#self.assertRaises(ValueError, t.strftime, "%")
#self.assertRaises(ValueError, t.strftime, "%#")
#oh well, some systems just ignore those invalid ones.
#at least, exercise them to make sure that no crashes
#are generated
for f in ["%e", "%", "%#"]:
try:
t.strftime(f)
except ValueError:
pass


while ((ch = *pin++) != '\0') {
if (ch != '%') {
do {
Copy link
Member

@pganssle pganssle Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... It's a bit hard to keep track of where null checking happens in this loop.

I guess the reason that trailing % was forbidden in the original version was that they wanted to check for null termination every time the pin pointer is advanced.

I think there's a change in behavior around null-checking here, because before this change, the new format string did not include the null terminator, after the change, the new format string does include the null terminator. I don't know that this is a problem, but I'm wary of it.

Maybe you can change it like this:

while((ch = *pin++) != '\0') {
    if (ch != '%' || (ch = *pin++) == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if (ch == 'z') {
    }

That should keep the logic the same.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an aside, this terse C-style stuff where you have to keep track of null pointers is for the birds. It took me a long time to convince myself that there was a change in behavior, and even still I am not certain about it. It really feels weird to advance the pointer in the conditionals.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looking at it, I don't think your code does what you think it does. Without a null char check at the end of the loop, you'll read the null char and advance the pointer, and then the check in the while loop condition will actually be reading the char after the null char.

Like you said though, this stuff is tricky, so I'm going to test it and report back.

Also, my first attempt at fixing this, while I think a bit uglier than the do/while solution, comes closer to your desired behavior then the current one. Here it is:

while ((ch = *pin++) != '\0')
{
if (ch != '%') {
ptoappend = pin - 1;
ntoappend = 1;
}
// else if ((ch = *pin++) == '\0') {
/* There's a lone trailing %; doesn't make sense. */
// PyErr_SetString(PyExc_ValueError, "strftime format "
// "ends with raw %");
// goto Done;
// ptoappend = pin - 2;
// ntoappend = 2;
// }
/* A % has been seen and ch is the character after it. */
else if ((ch = *pin++) == 'z') {
if (zreplacement == NULL) {
/* format utcoffset */
char buf[100];
PyObject *tzinfo = get_tzinfo_member(object);
zreplacement = PyBytes_FromStringAndSize("", 0);
if (zreplacement == NULL) goto Done;
if (tzinfo != Py_None && tzinfo != NULL) {
assert(tzinfoarg != NULL);
if (format_utcoffset(buf,
sizeof(buf),
"",
tzinfo,
tzinfoarg) < 0)
goto Done;
Py_DECREF(zreplacement);
zreplacement =
PyBytes_FromStringAndSize(buf,
strlen(buf));
if (zreplacement == NULL)
goto Done;
}
}
assert(zreplacement != NULL);
ptoappend = PyBytes_AS_STRING(zreplacement);
ntoappend = PyBytes_GET_SIZE(zreplacement);
}
else if (ch == 'Z') {
/* format tzname */
if (Zreplacement == NULL) {
Zreplacement = make_Zreplacement(object,
tzinfoarg);
if (Zreplacement == NULL)
goto Done;
}
assert(Zreplacement != NULL);
assert(PyUnicode_Check(Zreplacement));
ptoappend = PyUnicode_AsUTF8AndSize(Zreplacement,
&ntoappend);
if (ptoappend == NULL)
goto Done;
}
else if (ch == 'f') {
/* format microseconds */
if (freplacement == NULL) {
freplacement = make_freplacement(object);
if (freplacement == NULL)
goto Done;
}
assert(freplacement != NULL);
assert(PyBytes_Check(freplacement));
ptoappend = PyBytes_AS_STRING(freplacement);
ntoappend = PyBytes_GET_SIZE(freplacement);
}
else {
/* percent followed by neither z nor Z */
ptoappend = pin - 2;
ntoappend = 2;
}
/* Append the ntoappend chars starting at ptoappend to
* the new format.
*/
if (ntoappend == 0)
continue;
assert(ptoappend != NULL);
assert(ntoappend > 0);
while (usednew + ntoappend > totalnew) {
if (totalnew > (PY_SSIZE_T_MAX >> 1)) { /* overflow */
PyErr_NoMemory();
goto Done;
}
totalnew <<= 1;
if (_PyBytes_Resize(&newfmt, totalnew) < 0)
goto Done;
pnew = PyBytes_AsString(newfmt) + usednew;
}
memcpy(pnew, ptoappend, ntoappend);
pnew += ntoappend;
usednew += ntoappend;
assert(usednew <= totalnew);
printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt)));
if (ch == '\0')
break;
} /* end while() */

As far as the pointer juggling in the conditionals goes, it's a pattern I've seen before in C parsing code. I think ideally you only mutate the pointer in one place, as it quickly becomes hard to reason about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I suspected, your code doesn't do what we want. Passing it "%D %" produces the format string b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'.

I obtained this by calling printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt))); after the while loop. If you put the call in the loop itself, you can watch it happen:

b'%D\xcb\xcb\xcb'
b'%D \xcb\xcb'
b'%D \x00\xcb'
b'%D \x00\xfb'
b'%D \x00\xfb\xfb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'

I'm not sure what's causing it to eventually break out of the loop and not spin forever; I assume it's one of the length checks that happens in the loop.

Copy link
Member

@pganssle pganssle Nov 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@michaelsaah Good call, sorry, I didn't actually try my version, you are right that I advance but then never check if the next character is a NULL. I think maybe this will do what I was looking for?

while((ch = *pin++) != '\0') {
    if (ch != '%' || *pin == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if ((ch = *pin++) == 'z') {
    }

Edit: No, just realized why this won't work. Hm... If I'm right that the behavior is slightly different, we may need to have more than one check in here, unfortunately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check the earlier commit I linked to? That might be a blueprint for what you want. The key is the null check and break at the end of the loop.

@tirkarthi
Copy link
Member

@michaelsaah You might want to add a NEWS entry given the impact and it will also help in keeping the PR green : https://devguide.python.org/committing/#what-s-new-and-news-entries

@michaelsaah
Copy link
Contributor Author

@pganssle I think you'll like the latest change. Behavior w/ regard to null bytes now matches existing code, as far as I can tell.

@pganssle
Copy link
Member

pganssle commented Dec 3, 2018

@michaelsaah Sorry for the days of silence.

I think the behavior of this is now fixed. I'm not sure how I feel about decrementing the pointer, though. I think the two most obviously viable solutions (barring any that dramatically refactor the code, which I have not considered):

  1. Checking for a null at the beginning and the end of the loop
  2. Checking for a null at the beginning of the loop, but decrementing the pointer (as you've done in this case) when you find a second null

The first one is nice because it doesn't go through unnecessary iterations of the loop, but the second one keeps the null checking logic as close as possible to the point where the pointer is incremented, which keeps the amount of time between reaching the end of the string and knowing we've reached it to a minimum.

I think the only other decent option would be something like this:

while ((ch = *pin++) != '\0') {
    if(ch != '%') {
        ptoappend = pin - 1;
        ntoappend = 1;
    } else if (*(pin + 1) != '\0') {
        if ((ch = *pin++) == 'z') {
            ...

I am increasingly convincing myself that number 2 (your current version) is the right way to go, so I say we go ahead and merge this if no one else objects.

@pganssle
Copy link
Member

pganssle commented Dec 3, 2018

By the way, thank you for your patience with this @michaelsaah, particularly since I know I created some extra work for you with my ill-considered alternative approaches.

@michaelsaah
Copy link
Contributor Author

Wow we really commented past each other there. No worries about the extra work, you've been a very clear and thorough reviewer, and I'm doing this for fun while looking for a new job anyway.

I originally didn't want to decrement the pointer either, but I'm convinced it's safe, since we only do that if we've already seen a '%', which means we won't be decrementing past the beginning of the buffer.

After staring at this for hours over the past couple weeks, I'm not convinced that there's any way to make the control flow more intuitive while keeping the null-checking contained.

As far as the difference between (1) and (2), I think (2) fits more naturally with the case-checking semantics that are already there, but it's a toss-up. I don't think either is more confusing (or clearer) than the other.

@michaelsaah
Copy link
Contributor Author

As an aside, I don't see how my solution leads to unnecessary traversals of the loop. As soon as it sees a '%' followed by a '\0', it copies the '%' and breaks on the next while condition check. Do you mean that it breaks on the while condition instead of at the bottom of the loop body?

@pganssle
Copy link
Member

CC: @abalkin

I think this is ready to be merged.

@pablogsal pablogsal self-assigned this Jan 9, 2019
@vstinner vstinner merged commit 454b3d4 into python:master Jan 14, 2019
@miss-islington
Copy link
Contributor

Thanks @michaelsaah for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-11550 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 14, 2019
…0692)

Previously, calling the strftime() method on a datetime object with a
trailing '%' in the format string would result in an exception. However,
this only occured when the datetime C module was being used; the python
implementation did not match this behavior. Datetime is now PEP-399
compliant, and will not throw an exception on a trailing '%'.
(cherry picked from commit 454b3d4)

Co-authored-by: MichaelSaah <[email protected]>
@vstinner vstinner changed the title bpo-35066 bpo-35066: _dateime.datetime.strftime copies trailing '%' Jan 14, 2019
miss-islington added a commit that referenced this pull request Jan 14, 2019
Previously, calling the strftime() method on a datetime object with a
trailing '%' in the format string would result in an exception. However,
this only occured when the datetime C module was being used; the python
implementation did not match this behavior. Datetime is now PEP-399
compliant, and will not throw an exception on a trailing '%'.
(cherry picked from commit 454b3d4)

Co-authored-by: MichaelSaah <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants