Better support for string formatting #7418

ilevkivskyi · 2019-08-29T18:17:09Z

This PR does three related things:

- Fixes some corner cases in % formatting
- Tightens bytes vs str interactions (including those that are technically not errors)
- Adds type checking for str.format() calls

This also fixes a few issues discovered by this PR during testing (notably a bug in f-string transformation), and adds/extends a bunch of docstrings and comments in existing code. The implementation is mostly straightforward; there are a few hacky things, but IMO nothing unreasonable.

Here are a few comments:

It was hard to keep the approach to str.format() calls purely regexp-based (as for % formatting), mostly because the former can be both nested and repeated (and we must support nested formatting because this is how we support f-strings). CPython itself uses a custom parser, but it is huge, so I decided on a mixed approach. This way we can keep the code simple while still maintaining practically one-to-one compatibility with runtime behavior (the error messages are sometimes different, however).
This causes a few hundred errors internally; I am not sure what to do with these. The most popular ones are:
- Unicode upcast ('%s' % u'...' results in unicode, not str, on Python 2)
- Using '%s' % some_bytes and/or '{:s}'.format(some_bytes) on Python 3
- Unused arguments in str.format() call (probably because at runtime they are silently ignored)
I added a new error code for string interpolation/formatting and used it everywhere in old and new code. Potentially we might split out the str vs bytes errors into a separate error code, because technically they are not type errors, just suspicious/dangerous code.
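
For reference, the Python 3 runtime behavior behind the most common errors listed above can be reproduced with a minimal sketch like the following. The name `some_bytes` is illustrative, and the snippet shows what CPython does at runtime, not what mypy reports:

```python
some_bytes = b'abc'

# '%s' with bytes does not decode; it embeds the repr of the bytes object.
percent_result = '%s' % some_bytes

# A bare '{}' behaves the same way as '%s' here.
format_result = '{}'.format(some_bytes)

# '{:s}' is rejected at runtime: bytes does not accept a non-empty format spec.
try:
    '{:s}'.format(some_bytes)
    s_spec_error = None
except TypeError as exc:
    s_spec_error = exc

# Extra positional arguments are silently ignored at runtime.
unused_result = '{}'.format(1, 2)
```

In the first two cases the result is the text `b'abc'` (including the `b` prefix and quotes), which is rarely what the author intended.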

msullivan

I've about half finished the review and will do the rest tomorrow morning, but it looks good so far and the overall design seems solid.

This is a great feature, especially since this is how we represent f-strings.

msullivan · 2019-08-29T20:56:19Z

mypy/checkstrformat.py

+    return False
+
+
+def custom_special_method(typ: Type, name: str) -> bool:


This is a nice generalization of the custom equality function but should probably live somewhere other than checkstrformat. checker might be a better place than checkexpr where it was before?

This would create a runtime import cycle; currently 'checkstrformat' only imports other modules under if TYPE_CHECKING: .... Maybe we should make a new module like checker_utils.py?
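
The `if TYPE_CHECKING:` pattern mentioned in the reply can be sketched as follows. This is a generic illustration, not code from mypy: the `decimal` module stands in for a module that would otherwise create a cycle, and `describe` is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; this import is never executed at runtime,
    # so it cannot take part in a runtime import cycle.
    from decimal import Decimal


def describe(value: 'Decimal') -> str:
    # The annotation is a string, so the name is not needed at runtime.
    return 'value={}'.format(value)
```

A function that must actually call into the other module at runtime cannot rely on this trick, which is why a separate utility module is suggested above.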

msullivan · 2019-08-29T21:05:45Z

mypy/checkstrformat.py

+StringFormatterChecker.check_str_interpolation() for printf-style % interpolation.
+
+Note that although at runtime format strings are parsed using custom parsers,
+here we use a regerx-based approach. This way we 99% match runtime behaviour while keeping


msullivan · 2019-08-29T21:06:56Z

mypy/fastparse.py

-                                     [ARG_POS],
-                                     [None])
+                                     [ARG_POS, ARG_POS],
+                                     [None, None])


I am somewhat confused how this ever worked but oh well.
I guess they always get zipped and zip ends early?
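
If that guess is right, the effect would look like the sketch below. The lists are hypothetical stand-ins for the arguments, kinds, and names of the call node, not the real mypy objects:

```python
# Hypothetical stand-ins for the values passed before the fix.
args = ['left', 'right']   # two actual arguments
arg_kinds = ['ARG_POS']    # only one kind supplied
arg_names = [None]         # only one name supplied

# zip stops at the shortest input, so the second argument is dropped silently.
paired = list(zip(args, arg_kinds, arg_names))
```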

msullivan · 2019-08-29T21:38:57Z

mypy/checkstrformat.py

+    See After https://docs.python.org/3/library/string.html#formatspec for
+    specifications. The regexps are intentionally wider, to report better errors,
+    instead of just not matching.
+    """


I agree with the regex/overmatching approach
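
A minimal sketch of the overmatching idea, using a deliberately simplified and hypothetical pattern (not the regexps from this PR): the pattern accepts any character in the type position so that the checker can name the offending character instead of only failing to match.

```python
import re

# Hypothetical, simplified spec: optional width followed by an optional type.
# The type group is intentionally wider than the set of valid type characters.
WIDE_SPEC = re.compile(r'(?P<width>\d+)?(?P<type>.)?')
VALID_TYPES = set('bcdeEfFgGnosxX%')


def check_spec(spec: str) -> str:
    match = WIDE_SPEC.fullmatch(spec)
    if match is None:
        # Nothing useful can be extracted, so the message stays generic.
        return 'invalid format specification'
    typ = match.group('type')
    if typ is not None and typ not in VALID_TYPES:
        # The wide group matched, so the message can be specific.
        return 'unknown format type "{}"'.format(typ)
    return 'ok'
```

With a strict pattern, `10q` would simply not match; with the wide one, the checker can report which character is wrong.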

msullivan · 2019-08-29T21:56:01Z

mypy/checkstrformat.py

+                    self.msg.fail('Formatting nesting must be at most two levels deep',
+                                  ctx, code=codes.STRING_FORMATTING)
+                    return None
+                sub_conv_specs = self.parse_format_value(conv_spec.format_spec[1:], ctx=ctx,


What is the [1:] for?

The original format spec includes the starting colon, so I remove it everywhere, but here it is indeed unnecessary.
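
For context, the nesting limit in the error message above mirrors what CPython does at runtime. A short sketch (variable and function names are illustrative):

```python
value, width = 'a', 5

# One level of nesting: the inner {} supplies the width of the outer field.
nested = '{:>{}}'.format(value, width)

# The equivalent f-string, which is why nested specs matter for f-strings.
same_as_fstring = f'{value:>{width}}'


def too_deep() -> bool:
    """Return True if a field nested inside a nested spec is rejected."""
    try:
        '{:{:{}}}'.format(1, 2, 3)
    except ValueError:
        return True
    return False
```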

msullivan

Looks great.

msullivan · 2019-08-30T19:16:02Z

mypy/checkstrformat.py

+                                      expected_type: Type) -> None:
+        # TODO: try refactoring to combine this logic with % formatting.
+        if spec.type == 'c':
+            if isinstance(repl, (StrExpr, BytesExpr)) and len(cast(StrExpr, repl).value) != 1:


This cast is going to crash with mypyc if it is a BytesExpr. (Why is the cast needed?)
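
One way to avoid the cast, sketched with hypothetical minimal stand-ins for the `StrExpr` and `BytesExpr` node classes (the real classes live in mypy and are not reproduced here). It assumes both classes expose a `value` attribute, so the `isinstance` check alone narrows the type enough:

```python
from typing import Union


class StrExpr:
    """Stand-in for the string literal node."""
    def __init__(self, value: str) -> None:
        self.value = value


class BytesExpr:
    """Stand-in for the bytes literal node."""
    def __init__(self, value: str) -> None:
        self.value = value


def is_single_char(repl: object) -> bool:
    # isinstance narrows repl to Union[StrExpr, BytesExpr]; both define
    # .value, so no cast to StrExpr is needed.
    if isinstance(repl, (StrExpr, BytesExpr)):
        narrowed = repl  # type: Union[StrExpr, BytesExpr]
        return len(narrowed.value) == 1
    return False
```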

msullivan · 2019-08-30T19:17:54Z

mypy/checkstrformat.py

+                                  " use !r if this is a desired behavior", call,
+                                  code=codes.STRING_FORMATTING)
+        if spec.flags:
+            numeric_types = UnionType([self.named_type('builtins.int'),


How does this interact with things like decimal?

I just tried, and it is complicated: Decimal supports some flags but doesn't actually support {:d}, yet it still supports %d. I will try to write some more precise rules.
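
The runtime behavior described in the reply can be checked with a short sketch like this one (names are illustrative):

```python
from decimal import Decimal

d = Decimal('1')

# A sign flag is accepted by Decimal's own __format__.
with_flag = '{:+}'.format(d)

# Old-style %d works because the value is converted to an integer first.
percent_d = '%d' % d


def rejects_d_spec() -> bool:
    """Return True if Decimal refuses the 'd' type in str.format()."""
    try:
        '{:d}'.format(d)
    except ValueError:
        return True
    return False
```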

msullivan · 2019-08-30T20:00:21Z

mypy/checkstrformat.py

+
+    def validate_and_transform_accessors(self, temp_ast: Expression, original_repl: Expression,
+                                         spec: ConversionSpecifier, ctx: Context) -> bool:
+        """Validate and transform (in-place) format field accessors.


Explain what transforming is here?
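
As background for the question above: a format field accessor is the attribute or index part of a replacement field. The sketch below only illustrates the runtime semantics of accessors (the `Point` class is hypothetical) and makes no claim about what transformation the method performs:

```python
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


# Attribute accessors are looked up on the argument.
attr_access = '{0.x}, {0.y}'.format(Point(1, 2))

# An index accessor that is not all digits is used as a string key, unquoted.
key_access = '{0[name]}'.format({'name': 'mypy'})

# An all-digit index accessor is converted to an int.
index_access = '{0[1]}'.format(['a', 'b'])
```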

Ivan Levkivskyi and others added 24 commits August 26, 2019 14:10

Add couple corner cases to old style string interpolation

9831e46

The corner cases only apply to str

1efb629

Write regexps for format strings

5085dbb

Couple more special cases and refactor

31945dd

Better mapping types for % formatting

bcadd31

Fix self-check; extend a docstring

fd2551c

Some infrastructure; add error codes

c144e73

Start implementing the core logic

33e0a38

Fix error message

fc58ca5

Add few more checks

f484f42

Fix escaping

83801b1

Start adding tests

9d1d672

Fix lint

ea8e852

Start switching to the new mixed logic

7f1f7c1

Fully switch to mixed logic

5388c5c

Minor issues

7ccc46b

Minor fixes

36e1379

Add some tests

79bd948

More tests; minor fixes

a5b325c

Last bunch of tests

d38c214

Fix lint and self-check; a bit more docs

3462edd

Merge remote-tracking branch 'upstream/master' into strict-formatting

7466da9

Fix some special cases

57ae401

Don't touch the dict.pyi fixture

950aaf8

ilevkivskyi requested review from msullivan and JukkaL August 29, 2019 18:17

Ivan Levkivskyi added 4 commits August 29, 2019 20:27

Remove redundant format argument

be65ae5

Replace None with '' for optional groups where it doesn't really matter

7a2158e

Work around typeshed issue

8000a81

One forgotten or ''

85c639c

Remove fixed TODO item

1547258

msullivan reviewed Aug 29, 2019


ilevkivskyi mentioned this pull request Aug 30, 2019

Add an error warning for calling str(x) where x can be bytes #7432

Closed

msullivan approved these changes Aug 30, 2019


Ivan Levkivskyi added 4 commits August 30, 2019 23:09

Address some part of CR

5ca256a

Address the rest of CR; tighten numeric types logic

40f6d13

Merge remote-tracking branch 'upstream/master' into strict-formatting

f20a90f

Merge remote-tracking branch 'upstream/master' into strict-formatting

47de960

ilevkivskyi mentioned this pull request Sep 5, 2019

Generating error if callable object is formatted as a string? #5213

Open

Get rid of mypyc warning

be5f75c

ilevkivskyi mentioned this pull request Sep 6, 2019

Use empty outer context when checking str.format() calls #7479

Open

Ivan Levkivskyi and others added 5 commits September 10, 2019 17:00

Merge remote-tracking branch 'upstream/master' into strict-formatting

65c038b

Merge remote-tracking branch 'upstream/master' into strict-formatting

2138143

Fix merge

6a28699

Merge branch 'master' into strict-formatting

3156e72

Fix lint

8463211

ilevkivskyi merged commit 5b3346f into python:master Sep 24, 2019

ilevkivskyi deleted the strict-formatting branch September 24, 2019 15:27
