QuotedString unquote_results doesn't understand escaped whitespace #474
Ooops, forgot the version info... I've replicated this in pyparsing 3.0.9 (Python 3.7/3.10), 3.1.0a1 (Python 3.10) and 2.4.7 (Python 3.7). |
I'll look into this before the next release. |
Just want to confirm that you are not getting tripped up over the representation of backslashes in the output - that output is a backslash followed by a newline:
    >>> bslash = "\\"
    >>> nl = "\n"
    >>> print(repr(bslash + nl))
    '\\\n'
Here is more detail on the string returned from parsing with QuotedString:
    >>> import pyparsing as pp
    >>> res = pp.QuotedString(quoteChar='"', escChar='\\').parse_string(r'"\\n"')
    >>> res[0]
    '\\\n'
    >>> len(res[0])
    2
You can also have more control over this by passing |
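The point about representation can be checked without pyparsing at all. Here is a minimal sketch (the helper name `describe` is made up for illustration) that puts the two candidate results side by side: backslash plus a real newline, and backslash plus the letter n. Both have length 2, and only the list of characters makes the difference obvious.

```python
def describe(s):
    """Return (repr, length, characters) so escapes can be read unambiguously."""
    return repr(s), len(s), list(s)


backslash_newline = "\\" + "\n"  # backslash followed by a real newline
backslash_n = "\\" + "n"         # backslash followed by the letter n

print(describe(backslash_newline))
print(describe(backslash_n))
```

Comparing `list(s)` rather than the repr avoids having to count doubled backslashes by eye.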
I'm going to add this unit test to the testUnit.py:
    def testQuotedStringUnquotesAndConvertWhitespaceEscapes(self):
        test_string = r'"\\n"'
        for test_parameters in (
            (True, True, ['\\\n'], 2, '\\', '\n'),
            (True, False, ['\\n'], 2, '\\', 'n'),
            (False, False, ['"\\\\n"'], 5, '"', '\\'),
        ):
            unquote_results, convert_ws_escapes, expected_list, expected_len, exp0, exp1 = test_parameters
            with self.subTest(f"Testing with parameters {test_parameters}"):
                qs_expr = pp.QuotedString(
                    quoteChar='"',
                    escChar='\\',
                    unquote_results=unquote_results,
                    convert_whitespace_escapes=convert_ws_escapes
                )
                self.assertParseAndCheckList(
                    qs_expr,
                    test_string,
                    expected_list
                )
                result = qs_expr.parse_string(test_string)
                # display individual characters
                print(list(result[0]))
                self.assertEqual(expected_len, len(result[0]))
                self.assertEqual(exp0, result[0][0])
                self.assertEqual(exp1, result[0][1])
                print()
which currently gives these results:
I'm pretty sure these are the desired results. |
To confirm, I was expecting parsing of the string (True, True, [r'\\n'], 2, '\\', 'n') EDIT: I messed up the backslashes the first time around... |
I've redone the test to make the expected results for each case clearer, and added two other test strings. I've made the input strings as explicit as I could by using f-strings - you can check that they are equivalent to the r-strings in the respective comments. There are no There is no
    def testQuotedStringUnquotesAndConvertWhitespaceEscapes(self):
        #fmt: off
        backslash = chr(92)  # a single backslash
        tab = "\t"
        newline = "\n"
        test_string_0 = f'"{backslash}{backslash}n"'  # r"\\n"
        test_string_1 = f'"{backslash}t{backslash}{backslash}n"'  # r"\t\\n"
        test_string_2 = f'"a{backslash}tb"'  # r"a\tb"
        T, F = True, False  # these make the test cases format nicely
        for test_parameters in (
            # Parameters are the arguments to creating a QuotedString
            # and the expected parsed list of characters:
            # - unquote_results
            # - convert_whitespace_escapes
            # - test string
            # - expected parsed characters (broken out as separate
            #   list items; all those doubled backslashes make it
            #   difficult to interpret the output)
            (T, T, test_string_0, [backslash, newline]),
            (T, F, test_string_0, [backslash, "n"]),
            (F, F, test_string_0, ['"', backslash, backslash, "n", '"']),
            (T, T, test_string_1, [tab, backslash, newline]),
            (T, F, test_string_1, ["t", backslash, "n"]),
            (F, F, test_string_1, ['"', backslash, "t", backslash, backslash, "n", '"']),
            (T, T, test_string_2, ["a", tab, "b"]),
            (T, F, test_string_2, ["a", "t", "b"]),
            (F, F, test_string_2, ['"', "a", backslash, "t", "b", '"']),
        ):
            unquote_results, convert_ws_escapes, test_string, expected_list = test_parameters
            with self.subTest(msg=f"Testing with parameters {test_parameters}"):
                print(f"unquote_results: {unquote_results}"
                      f"\nconvert_whitespace_escapes: {convert_ws_escapes}")
                qs_expr = pp.QuotedString(
                    quoteChar='"',
                    escChar='\\',
                    unquote_results=unquote_results,
                    convert_whitespace_escapes=convert_ws_escapes
                )
                result = qs_expr.parse_string(test_string)
                # do this instead of assertParseAndCheckList to explicitly
                # check and display the separate items in the list
                print("Results:")
                control_chars = {newline: "<NEWLINE>", backslash: "<BACKSLASH>", tab: "<TAB>"}
                print(f"[{', '.join(control_chars.get(c, repr(c)) for c in result[0])}]")
                self.assertEqual(expected_list, list(result[0]))
                print()
        #fmt: on
With these results:
|
That's a good idea making constants, very clear now! For this case:
    (T, T, test_string_0, [backslash, newline]),
I would still expect the correct results to be
Similarly for:
    (T, T, test_string_1, [tab, backslash, newline]),
I would expect the output to be |
Ok, I'm coming around to these changes. Here is the new set of tests:
    def testQuotedStringUnquotesAndConvertWhitespaceEscapes(self):
        #fmt: off
        backslash = chr(92)  # a single backslash
        tab = "\t"
        newline = "\n"
        test_string_0 = f'"{backslash}{backslash}n"'  # r"\\n"
        test_string_1 = f'"{backslash}t{backslash}{backslash}n"'  # r"\t\\n"
        test_string_2 = f'"a{backslash}tb"'  # r"a\tb"
        test_string_3 = f'"{backslash}{backslash}{backslash}n"'  # r"\\\n"
        T, F = True, False  # these make the test cases format nicely
        for test_parameters in (
            # Parameters are the arguments to creating a QuotedString
            # and the expected parsed list of characters:
            # - unquote_results
            # - convert_whitespace_escapes
            # - test string
            # - expected parsed characters (broken out as separate
            #   list items; all those doubled backslashes make it
            #   difficult to interpret the output)
            (T, T, test_string_0, [backslash, "n"]),
            (T, F, test_string_0, [backslash, "n"]),
            (F, F, test_string_0, ['"', backslash, backslash, "n", '"']),
            (T, T, test_string_1, [tab, backslash, "n"]),
            (T, F, test_string_1, ["t", backslash, "n"]),
            (F, F, test_string_1, ['"', backslash, "t", backslash, backslash, "n", '"']),
            (T, T, test_string_2, ["a", tab, "b"]),
            (T, F, test_string_2, ["a", "t", "b"]),
            (F, F, test_string_2, ['"', "a", backslash, "t", "b", '"']),
            (T, T, test_string_3, [backslash, newline]),
            (T, F, test_string_3, [backslash, "n"]),
            (F, F, test_string_3, ['"', backslash, backslash, backslash, "n", '"']),
        ):
with these results
This is a slightly breaking change, but I feel that this logic is more intuitive - instead of going through and converting whitespace markers first, and then going back and processing escapes, the code now just works left to right through the quoted string contents, using a little state machine to process backslashes and whatever following character there might be. |
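To illustrate the left-to-right approach described above, here is a minimal sketch of such a state machine. It is not pyparsing's actual implementation: the function name `unquote_left_to_right`, the `WS_ESCAPES` table, and the handling of a trailing lone escape character are all assumptions made for this example. It takes the contents of the quoted string with the surrounding quotes already removed.

```python
# Hypothetical mapping of escaped letters to whitespace characters.
WS_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}


def unquote_left_to_right(body, esc_char="\\", convert_ws_escapes=True):
    """Process quoted-string contents in a single left-to-right pass."""
    out = []
    escaped = False  # True when the previous character was an unconsumed escape
    for ch in body:
        if escaped:
            # The escape applies to exactly this character and is then used up.
            if convert_ws_escapes and ch in WS_ESCAPES:
                out.append(WS_ESCAPES[ch])
            else:
                out.append(ch)
            escaped = False
        elif ch == esc_char:
            escaped = True
        else:
            out.append(ch)
    if escaped:
        # Trailing lone escape: kept literally (an arbitrary choice for this sketch).
        out.append(esc_char)
    return "".join(out)


backslash = chr(92)
print(list(unquote_left_to_right(backslash * 2 + "n")))  # body of test_string_0
print(list(unquote_left_to_right(backslash * 3 + "n")))  # body of test_string_3
```

Because each escape character is consumed together with the character that follows it, an escaped backslash can no longer combine with a later `n` to form a newline, which is the behaviour the updated test table expects.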
Nice! As far as I can see that all looks like what I'd expect :) |
It seems that when using a QuotedString with unquote_results=True (the default), it will incorrectly expand escaped whitespace characters.
For example:
Actual:
['\\\n']
Expected:
['\\n']
It works fine if I pass unquote_results=False (with the obvious downside of not unquoting the results...):
gives
['"\\\\n"']
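One way to see how the reported output can arise is to model the order of operations described earlier in the thread: whitespace markers are converted first, and escapes are processed afterwards. The sketch below is a simplified model written for this discussion, not the library's code; the function name `unquote_two_pass` and the regex used for the second pass are assumptions.

```python
import re


def unquote_two_pass(body, esc_char="\\"):
    """Simplified model: convert whitespace markers first, then strip escapes."""
    # Pass 1: turn markers such as backslash-t and backslash-n into real whitespace.
    for marker, char in ((r"\t", "\t"), (r"\n", "\n")):
        body = body.replace(marker, char)
    # Pass 2: drop the escape character in front of any remaining character.
    # "." does not match a newline by default, so a backslash that now sits
    # before a newline created in pass 1 is left in place.
    return re.sub(re.escape(esc_char) + "(.)", r"\g<1>", body)


body = "\\" + "\\" + "n"  # the contents of r'"\\n"' without the quotes
result = unquote_two_pass(body)
print(list(result))
```

In this model the second backslash and the `n` are matched as a newline marker before the first backslash has had a chance to escape the second one, so the result is a backslash followed by a newline rather than a backslash followed by `n`.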