Custom search command support for multibyte characters in Python 3 #341

amysutedja · 2020-09-04T01:34:16Z

Fixes #288

`SearchCommand` supports multibyte characters in Python 3

Previously, SearchCommand in Python 3 would read directly from the incoming ifile stream -- typically sys.stdin. In Python 2 sys.stdin is a file-like byte stream, whereas in Python 3 it is an io.TextIOWrapper containing an underlying buffer. Because this object reads by character rather than by byte, multibyte characters would cause the command to read too far past the data's boundary. This could lead to corrupt data reads (if early in the stream) or infinite hangs (if at the end of the stream).

In Python 3, we now retrieve the stream's underlying byte buffer and read from that instead. The bytes read are then decoded to strings for parsing.
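To illustrate the difference, here is a minimal sketch, not the SDK's actual code. `read_exact_bytes` and `make_stdin_like` are hypothetical helpers, UTF-8 is an assumed encoding, and an `io.TextIOWrapper` over `io.BytesIO` stands in for `sys.stdin`.

```python
import io


def read_exact_bytes(stream, size, encoding="utf-8"):
    """Read exactly `size` bytes and decode them (hypothetical helper)."""
    # Text streams such as Python 3's sys.stdin expose raw bytes via `.buffer`;
    # plain byte streams (e.g. io.BytesIO) are used as they are.
    raw = getattr(stream, "buffer", stream)
    return raw.read(size).decode(encoding)


def make_stdin_like(data):
    """Wrap bytes in a TextIOWrapper, standing in for Python 3's sys.stdin."""
    return io.TextIOWrapper(io.BytesIO(data), encoding="utf-8")


payload = "Ａbc".encode("utf-8")  # the fullwidth Ａ takes 3 bytes, so 5 bytes total
wire = payload + b"NEXT"          # "NEXT" stands in for whatever follows the payload

# Character-based read: asks for 5 *characters* and swallows part of "NEXT".
over_read = make_stdin_like(wire).read(len(payload))

# Byte-based read: stops exactly at the payload boundary.
stream = make_stdin_like(wire)
exact = read_exact_bytes(stream, len(payload))
rest = stream.buffer.read()

print(repr(over_read), repr(exact), rest)
```

The character-based read consumes two characters of the following data, which matches the corruption described above; if nothing followed the payload, a real pipe would block waiting for characters that never arrive.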

Tests ensure underlying byte stream

Previously, the tests defined a metadata stream containing the Ａ character (fullwidth, not to be confused with ASCII A). In Python 2, this character made its containing string a unicode string, which in turn made StringIO a Unicode stream. As a result, the metadata length was measured in Unicode characters rather than bytes, and the error went unnoticed because under test the read logic was always handed a Unicode character stream rather than a byte stream.

This has been fixed.
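A rough sketch of why a byte-backed fixture matters, under stated assumptions: the length-prefixed framing below (`frame`, `read_frame`) is a simplified, hypothetical stand-in and not the SDK's wire format or its actual test code.

```python
import io

metadata_text = '{"note": "Ａ"}'  # contains the fullwidth Ａ, not ASCII A
metadata_bytes = metadata_text.encode("utf-8")

char_count = len(metadata_text)   # length in Unicode characters
byte_count = len(metadata_bytes)  # length in bytes, which is what a byte stream carries


def frame(payload, trailer=b""):
    """Build a byte stream: '<length>\\n' + payload + trailer (hypothetical framing)."""
    header = str(len(payload)).encode("ascii") + b"\n"
    return io.BytesIO(header + payload + trailer)


def read_frame(stream):
    """Read one length-prefixed payload from a byte stream."""
    size = int(stream.readline().decode("ascii"))
    return stream.read(size)


fixture = frame(metadata_bytes, trailer=b"TRAILER")
recovered = read_frame(fixture)
leftover = fixture.read()

print(char_count, byte_count, recovered.decode("utf-8"), leftover)
```

Because the fixture is an `io.BytesIO`, a length computed in characters would leave two bytes of the payload unread, so a mismatch between character and byte counts surfaces in the test instead of being masked.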

Multibyte test fixture

A new test, `test_multibyte_chunked`, exercises a test fixture that contains multibyte characters.

ichaer-splunk · 2020-09-04T19:20:10Z

Hah, I've been working since yesterday on a change that is nearly identical to this. Nice fix, if I say so myself =D

I'll retire my fork, this is really good.

aij

Thanks for fixing this so quickly!

I added a couple comments, but nothing worth blocking on.

splunklib/searchcommands/search_command.py

tests/searchcommands/test_search_command.py

tristan-splunk · 2020-09-08T16:44:32Z

FYI we are waiting on this change to upgrade the pythonsdk in all our apps ahead of conf. We'd rather not have to do a monkey patch. Please advise when you can merge and release an official build.

zenmoto and others added 3 commits September 3, 2020 16:36

a test to confirm errors in searchcommands with multibyte data

6a9c77b

Fix for multibyte inputs to custom search commands

0780fc1

Multibyte test cleanup

253784c

amysutedja requested a review from zenmoto September 4, 2020 02:50

aij approved these changes Sep 4, 2020

splunklib/searchcommands/search_command.py (outdated)

tests/searchcommands/test_search_command.py (outdated)

Cleaner naming and semantics for custom search command input streams

8dec167

amysutedja force-pushed the csc-multibyte branch from 14040df to 8dec167 Compare September 4, 2020 20:43

aij approved these changes Sep 4, 2020

amysutedja force-pushed the csc-multibyte branch from 5833b9f to 6dc4cf7 Compare September 4, 2020 21:30

Better type assertion

5d5eb8a

amysutedja force-pushed the csc-multibyte branch from 6dc4cf7 to 5d5eb8a Compare September 4, 2020 21:31

zenmoto approved these changes Sep 4, 2020

zenmoto and others added 3 commits September 8, 2020 14:25

add test for v1 search commands

27f92a5

add test data for previous test

f15390b

Version 1.6.14

37077d6

amysutedja merged commit 181cbfe into master Sep 9, 2020

amysutedja deleted the csc-multibyte branch September 9, 2020 18:10
