Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Better support for string formatting #7418

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Sep 24, 2019

Conversation

ilevkivskyi
Copy link
Member

@ilevkivskyi ilevkivskyi commented Aug 29, 2019

Fixes #1444
Fixes #2639
Fixes #7114

This PR does three related things:

  • Fixes some corner cases in % formatting
  • Tightens bytes vs str interactions (including those that are technically not errors)
  • Adds type checking for str.format() calls

This also fixes few issues discovered by this PR during testing (notably a bug in f-string transformation), and adds/extends a bunch of docstring and comments to existing code. The implementation is mostly straightforward, there are few hacky things, but IMO nothing unreasonable.

Here are few comments:

  • It was hard to keep the approach to str.format() calls purely regexp-based (as for % formatting), mostly because the former can be both nested and repeated (And we must support nested formatting because this is how we support f-strings). CPython itself uses a custom parser but it is huge, so I decided to have a mixed approach. This way we can keep code simple while still maintaining practically one-to-one compatibility with runtime behavior (the error messages are sometimes different however).
  • This causes few hundreds of errors internally, I am not sure what to do with these. Most popular ones are:
    • Unicode upcast ('%s' % u'...' results in unicode, not str, on Python 2)
    • Using '%s' % some_bytes and/or '{:s}'.format(some_bytes) on Python 3
    • Unused arguments in str.format() call (probably because at runtime they are silently ignored)
  • I added new error code for string interpolation/formatting and used it everywhere in old and new code. Potentially we might split out the str vs bytes errors into a separate error code, because technically they are not type errors, just suspicious/dangerous code.

Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've about half finished the review and will do the rest tomorrow morning, but it looks good so far and the overall design seems solid.

This is a great feature, especially since this is how we represent F-strings.

return False


def custom_special_method(typ: Type, name: str) -> bool:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice generalization of the custom equality function but should probably live somewhere other than checkstrformat. checker might be a better place than checkexpr where it was before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make it a runtime import cycle, currently 'checkstrformat' only imports others under if TYPE_CHECKING: .... Maybe we should make a new module like checker_utils.py?

StringFormatterChecker.check_str_interpolation() for printf-style % interpolation.

Note that although at runtime format strings are parsed using custom parsers,
here we use a regerx-based approach. This way we 99% match runtime behaviour while keeping
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regex

[ARG_POS],
[None])
[ARG_POS, ARG_POS],
[None, None])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat confused how this every worked but oh well.
I guess they always get zipped and zip ends early?

See After https://docs.python.org/3/library/string.html#formatspec for
specifications. The regexps are intentionally wider, to report better errors,
instead of just not matching.
"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the regex/overmatching approach

self.msg.fail('Formatting nesting must be at most two levels deep',
ctx, code=codes.STRING_FORMATTING)
return None
sub_conv_specs = self.parse_format_value(conv_spec.format_spec[1:], ctx=ctx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the [1:] for?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original format spec includes the starting colon, so I remove it everywhere, but here it is indeed unnecessary.

Copy link
Collaborator

@msullivan msullivan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great.

expected_type: Type) -> None:
# TODO: try refactoring to combine this logic with % formatting.
if spec.type == 'c':
if isinstance(repl, (StrExpr, BytesExpr)) and len(cast(StrExpr, repl).value) != 1:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cast is going to crash with mypyc if it is a BytesExpr. (Why is the cast needed?)

" use !r if this is a desired behavior", call,
code=codes.STRING_FORMATTING)
if spec.flags:
numeric_types = UnionType([self.named_type('builtins.int'),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this interact with things like decimal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried and it is complicated, Decimal supports some flags, but doesn't actually supports {:d}, but still supports %d. I will try to write some more precise rules.


def validate_and_transform_accessors(self, temp_ast: Expression, original_repl: Expression,
spec: ConversionSpecifier, ctx: Context) -> bool:
"""Validate and transform (in-place) format field accessors.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Explain what transforming is here?

@ilevkivskyi ilevkivskyi merged commit 5b3346f into python:master Sep 24, 2019
@ilevkivskyi ilevkivskyi deleted the strict-formatting branch September 24, 2019 15:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'%d' % SupportsInt fails Mypy complains about os.environ in string formatting Type check str format calls
2 participants