Custom search command support for multibyte characters in Python 3 #341
Hah, I've been working since yesterday on a change that is nearly identical to this. Nice fix, if I say so myself =D I'll retire my fork, this is really good.
Thanks for fixing this so quickly!
I added a couple comments, but nothing worth blocking on.
FYI, we are waiting on this change to upgrade the Python SDK in all our apps ahead of conf. We'd rather not have to do a monkey patch. Please advise when you can merge and release an official build.
Fixes #288
SearchCommand supports multibyte characters in Python 3
Previously, SearchCommand in Python 3 would read directly from the incoming ifile stream, typically sys.stdin. In Python 2, sys.stdin is a file-like byte stream, whereas in Python 3 it is an io.TextIOWrapper containing an underlying buffer. Because this object reads by character rather than by byte, multibyte characters would cause the command to read too far past the data's boundary. This could lead to corrupt data reads (if early in the stream) or infinite hangs (if at the end of the stream).
We now retrieve the underlying buffer and read from it when in Python 3. The read bytes are then cast to strings for parsing purposes.
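A minimal sketch of the over-read and of the buffer-based fix, not the SDK's actual code: byte_stream_of, the payload, and the trailer are hypothetical names and data chosen for illustration. It relies only on io.TextIOWrapper exposing its binary stream as .buffer.

```python
import io


def byte_stream_of(ifile):
    """Return the binary buffer behind a text stream, or ifile itself if it has none."""
    # io.TextIOWrapper (e.g. sys.stdin in Python 3) exposes .buffer;
    # plain byte streams such as io.BytesIO do not.
    return getattr(ifile, "buffer", ifile)


# Hypothetical input: a 3-byte payload ("\u00e9" is 2 bytes in UTF-8)
# followed by unrelated data that belongs to the next read.
payload = "\u00e91".encode("utf-8")
trailer = b"NEXT"
declared_length = len(payload)  # length in bytes, as a sender would report it

# Reading through the text wrapper counts characters, so it runs past the payload.
text_stream = io.TextIOWrapper(io.BytesIO(payload + trailer), encoding="utf-8")
over_read = text_stream.read(declared_length)

# Reading from the underlying buffer counts bytes and stops at the boundary.
fresh_stream = io.TextIOWrapper(io.BytesIO(payload + trailer), encoding="utf-8")
binary = byte_stream_of(fresh_stream)
exact = binary.read(declared_length).decode("utf-8")
rest = binary.read()
```

Here over_read swallows the first byte of the trailer, while exact contains only the payload and rest still holds the full trailer. If the over-read happens at the end of the stream, there are no extra bytes to consume, which is consistent with the hang described above.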
Tests ensure underlying byte stream
Previously, the tests defined a metadata stream with the A character in it (not to be confused with A). In Python 2, this character caused its containing string to become unicode, which caused StringIO to gain that encoding. As a result, the size of the metadata stream was always incorrectly measured in Unicode characters rather than bytes, and under test the read logic was always handed a Unicode character stream rather than a byte stream.
This has been fixed.
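To illustrate why the fixture's length must be measured in bytes, here is a small sketch. It uses a made-up length-prefixed framing (make_fixture and read_fixture are hypothetical helpers, not the SDK's protocol or test code) and an io.BytesIO so that the reader under test sees a byte stream.

```python
import io

# Hypothetical metadata containing one multibyte character.
metadata = '{"name": "\u00e9"}'
char_len = len(metadata)                  # counts Unicode characters
byte_len = len(metadata.encode("utf-8"))  # counts encoded bytes


def make_fixture(text):
    """Build a byte stream: an ASCII length line, then the UTF-8 body."""
    body = text.encode("utf-8")
    header = ("%d\n" % len(body)).encode("ascii")  # length in bytes
    return io.BytesIO(header + body)


def read_fixture(stream):
    """Read the length line, then exactly that many bytes, and decode them."""
    length = int(stream.readline().decode("ascii"))
    return stream.read(length).decode("utf-8")


stream = make_fixture(metadata)
round_tripped = read_fixture(stream)
leftover = stream.read()
```

With a character count in the header instead, the declared length would be one short for this metadata, and the final byte would be left in the stream for the next read.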
Multibyte test fixture
We now have a new test, test_multibyte_chunked, which contains a multibyte character test fixture.