Custom search command support for multibyte characters in Python 3 #341

Merged 8 commits on Sep 9, 2020

amysutedja

Fixes #288

SearchCommand supports multibyte characters in Python 3

Previously, SearchCommand in Python 3 read directly from the incoming ifile stream, typically sys.stdin. In Python 2, sys.stdin is a file-like byte stream, whereas in Python 3 it is an io.TextIOWrapper around an underlying byte buffer. Because that wrapper's read(n) counts characters rather than bytes, any multibyte character caused the command to read past the data's boundary. This could corrupt subsequent reads (if the character appeared early in the stream) or hang indefinitely (if it appeared at the end of the stream).
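The over-read can be reproduced outside the SDK. The snippet below is a minimal sketch, not the SDK's code: the payload, the "NEXT" marker, and all variable names are made up for illustration, and it assumes the length of the data is declared in bytes.

```python
import io

# Hypothetical payload: a 4-character body that occupies 5 bytes in UTF-8,
# immediately followed by the start of the next piece of data.
body = "caf\u00e9"                      # "café": the final character is 2 bytes
raw = (body + "NEXT").encode("utf-8")
declared_length = len(body.encode("utf-8"))   # length in bytes: 5

# Reading through a text wrapper counts characters, so asking for 5
# consumes one character too many and swallows the "N" of the next data.
text_stream = io.TextIOWrapper(io.BytesIO(raw), encoding="utf-8")
over_read = text_stream.read(declared_length)

# Reading from a byte stream counts bytes, so the boundary is respected.
byte_stream = io.BytesIO(raw)
exact = byte_stream.read(declared_length).decode("utf-8")

print(repr(over_read))   # 'caféN'
print(repr(exact))       # 'café'
```

With a multibyte character near the end of the input, the same miscount would ask for more characters than remain, which is consistent with the hang described above when stdin stays open.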

We now retrieve the underlying buffer and read from it when running under Python 3. The bytes read are then converted to strings for parsing.
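A rough sketch of that approach follows. The helper names binary_stream and read_exact are hypothetical and are not taken from the SDK; the sketch also assumes the data is UTF-8 and that nothing has been read from the text wrapper beforehand (mixing text-level and buffer-level reads on the same stream is not safe).

```python
import io


def binary_stream(ifile):
    """Return the byte stream behind a text wrapper, or ifile itself.

    io.TextIOWrapper (the type of sys.stdin in Python 3) exposes its
    underlying binary stream as .buffer; streams that are already
    binary, such as io.BytesIO, have no such attribute.
    """
    return getattr(ifile, "buffer", ifile)


def read_exact(ifile, size):
    """Read `size` bytes from ifile and decode them for parsing."""
    data = binary_stream(ifile).read(size)
    return data.decode("utf-8")  # assumption: the input is UTF-8


# Usage: the wrapper stands in for sys.stdin under Python 3.
wrapped = io.TextIOWrapper(io.BytesIO("caf\u00e9|rest".encode("utf-8")),
                           encoding="utf-8")
first = read_exact(wrapped, 5)   # 5 bytes -> "café"
rest = read_exact(wrapped, 5)    # next 5 bytes -> "|rest"
```

Falling back to the stream itself keeps the helper usable with a plain byte stream, which is what a Python 2 style sys.stdin or an io.BytesIO test double would provide.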

Tests ensure underlying byte stream

Previously, the tests defined a metadata stream containing a non-ASCII character (one that looks like, but is not, the ASCII letter A). In Python 2, this character caused its containing string to become unicode, which in turn caused StringIO to take on that encoding. As a result, the size of the metadata stream was always measured, incorrectly, in Unicode characters rather than bytes, and under test the read logic was always handed a Unicode character stream rather than a byte stream.

This has been fixed.
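To illustrate the difference between the two measurements, here is a small sketch. The metadata string, the Greek capital alpha used as a stand-in character, and the simplified "length, newline, payload" framing are all assumptions made for the example; they are not the SDK's test data or wire format.

```python
import io

# Hypothetical metadata containing U+0391 (Greek capital alpha), which
# resembles the ASCII letter A but encodes to two bytes in UTF-8.
metadata = '{"note": "\u0391"}'

char_count = len(metadata)                   # counts Unicode characters
byte_count = len(metadata.encode("utf-8"))   # counts encoded bytes

# A byte-oriented fixture: the declared length is the byte count, and the
# stream handed to the reader is binary rather than text.
payload = metadata.encode("utf-8")
stream = io.BytesIO(str(len(payload)).encode("ascii") + b"\n" + payload + b"TRAILER")

declared = int(stream.readline())            # length line in this toy framing
recovered = stream.read(declared).decode("utf-8")
remaining = stream.read()                    # data after the metadata is intact

print(char_count, byte_count)
```

Building the fixture from encoded bytes means a reader that counts characters instead of bytes fails the test instead of passing by accident.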

Multibyte test fixture

We now have a new test, test_multibyte_chunked, which uses a test fixture containing multibyte characters.

@amysutedja requested a review from zenmoto on September 4, 2020 02:50
@ichaer-splunk

Hah, I've been working since yesterday on a change that is nearly identical to this. Nice fix, if I say so myself =D

I'll retire my fork, this is really good.

@aij left a comment

Thanks for fixing this so quickly!

I added a couple comments, but nothing worth blocking on.

@tristan-splunk

FYI we are waiting on this change to upgrade the pythonsdk in all our apps ahead of conf. We'd rather not have to do a monkey patch. Please advise when you can merge and release an official build.

@amysutedja merged commit 181cbfe into master on Sep 9, 2020
@amysutedja deleted the csc-multibyte branch on September 9, 2020 18:10
Successfully merging this pull request may close these issues.

StreamingCommand failed when input contains non-ascii character