Better support for string formatting #7418
I've about half finished the review and will do the rest tomorrow morning, but it looks good so far and the overall design seems solid.
This is a great feature, especially since this is how we represent F-strings.
mypy/checkstrformat.py
Outdated
return False


def custom_special_method(typ: Type, name: str) -> bool:
This is a nice generalization of the custom equality function, but it should probably live somewhere other than `checkstrformat`. `checker` might be a better place than `checkexpr`, where it was before?
This would create a runtime import cycle: currently `checkstrformat` only imports other modules under `if TYPE_CHECKING: ...`. Maybe we should make a new module like `checker_utils.py`?
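For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of an import guarded by `TYPE_CHECKING`. It assumes nothing about mypy's real module layout; `decimal` is only a stand-in for a module that would otherwise import us back, and `describe` is a hypothetical function.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime, so this
    # import cannot take part in a runtime import cycle.
    from decimal import Decimal  # stand-in for a module that imports us back


def describe(value: "Decimal") -> str:
    # The annotation is a string, so the name is not needed at runtime.
    return f"value={value}"
```

A helper that needs the other module at runtime (not just in annotations) cannot use this guard, which is presumably why a separate utility module is suggested.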
mypy/checkstrformat.py
Outdated
StringFormatterChecker.check_str_interpolation() for printf-style % interpolation.

Note that although at runtime format strings are parsed using custom parsers,
here we use a regerx-based approach. This way we 99% match runtime behaviour while keeping
regex
[ARG_POS],
[None])
[ARG_POS, ARG_POS],
[None, None])
I am somewhat confused about how this ever worked, but oh well.
I guess they always get zipped and zip ends early?
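The guess above relies on a documented property of `zip()`: it stops at the shortest input. Whether mypy actually zips these lists is the reviewer's assumption, and the lists below are hypothetical stand-ins, but the truncation itself can be sketched as:

```python
# Hypothetical stand-ins for the lists passed when building a callable type.
arg_types = ["int", "str"]
arg_kinds = ["ARG_POS"]  # one entry short, as in the old code
arg_names = [None]       # one entry short, as in the old code

# zip() stops at the shortest input, so the second argument is silently
# dropped instead of raising an error.
zipped = list(zip(arg_types, arg_kinds, arg_names))
print(zipped)  # [('int', 'ARG_POS', None)]
```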
See After https://docs.python.org/3/library/string.html#formatspec for
specifications. The regexps are intentionally wider, to report better errors,
instead of just not matching.
"""
I agree with the regex/overmatching approach
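To illustrate the overmatching idea, here is a rough sketch with a deliberately simplified pattern. `SPEC`, `KNOWN_TYPES`, and `check_spec` are hypothetical names, the pattern covers only width, precision, and type, and none of this is the regex used in the PR. The point is that accepting any trailing character lets the checker name the offending character rather than fail to match.

```python
import re
from typing import Optional

# Hypothetical, simplified pattern: ANY single trailing character is accepted
# as the conversion type, so unsupported ones can be reported explicitly.
SPEC = re.compile(r"(?P<width>\d+)?(?:\.(?P<precision>\d+))?(?P<type>.)?$")
KNOWN_TYPES = set("bcdeEfFgGnosxX%")


def check_spec(spec: str) -> Optional[str]:
    """Return an error message for a bad spec, or None if it looks fine."""
    match = SPEC.match(spec)
    if match is None:
        return f"Invalid format specification {spec!r}"
    conv = match.group("type")
    if conv is not None and conv not in KNOWN_TYPES:
        # A narrow pattern would simply not match here; the wide one lets us
        # point at the exact character.
        return f"Unsupported format character {conv!r}"
    return None
```

With this sketch, `check_spec("10.2f")` returns `None`, while `check_spec("10q")` reports the unsupported character `'q'`.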
mypy/checkstrformat.py
Outdated
self.msg.fail('Formatting nesting must be at most two levels deep',
ctx, code=codes.STRING_FORMATTING)
return None
sub_conv_specs = self.parse_format_value(conv_spec.format_spec[1:], ctx=ctx,
What is the `[1:]` for?
The original format spec includes the starting colon, so I remove it everywhere, but here it is indeed unnecessary.
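For context, the runtime behavior that the nesting check mirrors can be sketched as follows. The variable names are illustrative only; the snippet shows one level of nesting in `str.format()` and in an f-string, what slicing off the leading colon does, and that a spec nested inside a nested spec is rejected at runtime.

```python
# One level of nesting: the width comes from another replacement field.
padded = "{:{}}".format("x", 5)

# f-strings allow the same nesting.
width = 5
padded_f = f"{'x':{width}}"

# A format spec that still carries its leading colon; [1:] drops the colon.
raw_spec = ":{}"
stripped = raw_spec[1:]

# A format spec nested inside a nested format spec is rejected at runtime.
try:
    "{:{:{}}}".format(1, 2, 3)
    too_deep_ok = True
except ValueError:
    too_deep_ok = False
```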
Looks great.
mypy/checkstrformat.py
Outdated
expected_type: Type) -> None:
# TODO: try refactoring to combine this logic with % formatting.
if spec.type == 'c':
if isinstance(repl, (StrExpr, BytesExpr)) and len(cast(StrExpr, repl).value) != 1:
This cast is going to crash with mypyc if it is a `BytesExpr`. (Why is the cast needed?)
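One possible way to avoid the cast, sketched with hypothetical stand-in classes rather than mypy's real node classes: `isinstance()` with a tuple already narrows the value to either class, and if both define `.value`, the attribute can be read directly.

```python
from dataclasses import dataclass


@dataclass
class StrExpr:  # hypothetical stand-in, not mypy's real node class
    value: str


@dataclass
class BytesExpr:  # hypothetical stand-in, not mypy's real node class
    value: str


def is_single_char(repl: object) -> bool:
    # isinstance() narrows repl to StrExpr or BytesExpr, and both define
    # .value, so no cast is needed.
    if isinstance(repl, (StrExpr, BytesExpr)):
        return len(repl.value) == 1
    return False
```

In the interpreter `typing.cast` does nothing at runtime, so a wrong cast goes unnoticed there; the reviewer's point is that compiled code may enforce it.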
" use !r if this is a desired behavior", call,
code=codes.STRING_FORMATTING)
if spec.flags:
numeric_types = UnionType([self.named_type('builtins.int'),
How does this interact with things like decimal?
I just tried and it is complicated: `Decimal` supports some flags, but doesn't actually support `{:d}`, yet still supports `%d`. I will try to write some more precise rules.
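The asymmetry described above can be reproduced at runtime with a short sketch (variable names are illustrative): a sign flag with a float-style conversion works on `Decimal`, `%d` works, but `{:d}` is rejected.

```python
from decimal import Decimal

d = Decimal("1.5")

# A sign flag combined with a float-style conversion is accepted.
with_sign = "{:+.2f}".format(d)

# printf-style %d converts the value to an integer first.
percent_d = "%d" % Decimal("3")

# The new-style 'd' conversion is rejected by Decimal.
try:
    "{:d}".format(d)
    new_style_d_ok = True
except ValueError:
    new_style_d_ok = False
```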
def validate_and_transform_accessors(self, temp_ast: Expression, original_repl: Expression,
spec: ConversionSpecifier, ctx: Context) -> bool:
"""Validate and transform (in-place) format field accessors.
Explain what transforming is here?
Fixes #1444
Fixes #2639
Fixes #7114
This PR does three related things:
- [...] `%` formatting
- [...] `bytes` vs `str` interactions (including those that are technically not errors)
- [...] `str.format()` calls

This also fixes a few issues discovered by this PR during testing (notably a bug in f-string transformation), and adds/extends a bunch of docstrings and comments to existing code. The implementation is mostly straightforward; there are a few hacky things, but IMO nothing unreasonable.

Here are a few comments:
- [...] `str.format()` calls purely regexp-based (as for `%` formatting), mostly because the former can be both nested and repeated (and we must support nested formatting because this is how we support f-strings). CPython itself uses a custom parser, but it is huge, so I decided to have a mixed approach. This way we can keep the code simple while still maintaining practically one-to-one compatibility with runtime behavior (the error messages are sometimes different, however).
- [...] `'%s' % u'...'` results in `unicode`, not `str`, on Python 2)
- [...] `'%s' % some_bytes` and/or `'{:s}'.format(some_bytes)` on Python 3
- [...] `str.format()` call (probably because at runtime they are silently ignored)
- [...] `str` vs `bytes` errors into a separate error code, because technically they are not type errors, just suspicious/dangerous code.
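For reference, the Python 3 runtime behavior behind the `bytes` vs `str` and `str.format()` comments above can be sketched as follows. This only shows what the interpreter does, not what mypy reports, and the variable names are illustrative.

```python
data = b"abc"

# Both of these succeed but embed the repr of the bytes object, which is
# rarely what was intended.
percent_s = "%s" % data
plain_format = "{}".format(data)

# An explicit 's' conversion is rejected by bytes at runtime.
try:
    "{:s}".format(data)
    explicit_s_ok = True
except TypeError:
    explicit_s_ok = False

# Extra positional arguments are silently ignored by str.format().
unused = "{}".format(1, 2)
```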