py/compile: Implement PEP 572, assignment expressions. #4908

Merged
merged 5 commits into from
Jun 16, 2020

Conversation

dpgeorge
Member

@dpgeorge dpgeorge commented Jul 8, 2019

This PR implements PEP 572, assignment expressions.

Note: this is here primarily for proof of concept and discussion. There's no intention to merge it at this point. See #4899 for the discussion.

The patch here may not be 100% compliant with PEP 572, I didn't check everything, but the basic cases work:

(x:=4)
if x:=2: print(True)
min(4, x:=5)

@nickovs
Contributor

nickovs commented Jul 8, 2019

That code is remarkably compact and simple! There does seem to be an issue, though, with the scoping of the target when used in a list comprehension, which is explicitly called out as a special case in the PEP. The spec states:

There is one special case: an assignment expression occurring in a list, set or dict comprehension or in a generator expression (below collectively referred to as "comprehensions") binds the target in the containing scope, honoring a nonlocal or global declaration for the target in that scope, if one exists.

This is necessary to make 'witness' variables work. As an example of the current failure:

>>> any((hit := i) % 5 == 3 and (hit % 2) == 0 for i in range(6))
False
>>> any((hit := i) % 5 == 3 and (hit % 2) == 0 for i in range(10))
True
>>> hit
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'hit' isn't defined
>>>

The hit variable can be accessed from inside the comprehension but not from outside. So, it looks like there needs to be some special code to handle the special case...

Otherwise this seems to work as expected for everything else that I've tried.
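
For reference, the scoping the PEP asks for can be sketched in plain Python (3.8 or later). This is only an illustration of the expected behaviour, not part of the patch; the function name `find_witness` is made up for the example.

```python
def find_witness(limit):
    """Return (found, witness) for the first i < limit with i % 5 == 3 and i even."""
    found = any((hit := i) % 5 == 3 and hit % 2 == 0 for i in range(limit))
    # Under PEP 572, `hit` is bound in this function's scope rather than in
    # the generator expression's own scope, so it is still readable here.
    return found, (hit if found else None)


print(find_witness(6))   # (False, None)
print(find_witness(10))  # (True, 8)
```

The witness is only read when `any()` succeeded, because for an empty range `hit` would never have been bound.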

@nickovs
Contributor

nickovs commented Jul 9, 2019

While I don't claim any expertise in the internals of the compiler, I wrote a fix that seems to solve the problem that I identified above without breaking anything else.

@dpgeorge
Member Author

dpgeorge commented Jul 9, 2019

While I don't claim any expertise in the internals of the compiler, I wrote a fix that seems to solve the problem that I identified above without breaking anything else.

I was also just working on it. My version seems functionally equivalent to yours (use parent's scope). See commit in PR.

I also added some tests.

@nickovs
Contributor

nickovs commented Jul 9, 2019

Your version looks a lot cleaner than mine!

@nickovs
Contributor

nickovs commented Jul 9, 2019

It might be worth adding each of the edge-case examples from the PEP to the tests to ensure compatibility with CPython.

@nickovs
Contributor

nickovs commented Jul 9, 2019

I just spotted that your current patch does not disallow assignment to the target of a for in a comprehension, which is explicitly disallowed by the PEP:

However, an assignment expression target name cannot be the same as a for-target name appearing in any comprehension containing the assignment expression.

Thus the following should all fail:

[i := i+1 for i in range(5)]
[[(j := j) for i in range(5)] for j in range(5)]
[i := 0 for i, j in stuff]

It would appear that this can easily be fixed by updating your test for being inside a comprehension to check if the target is local to the scope that we are trying to avoid writing to when inside a comprehension. Thus my patched code reads:

    if (SCOPE_LIST_COMP <= comp->scope_cur->kind && comp->scope_cur->kind <= SCOPE_GEN_EXPR) {
        // Use parent's scope for assigned value so it can "escape"
        if (scope_find(old_scope, arg)) {
            compile_syntax_error(comp, pn_name, "Can't assign to a loop variable");
            return;
        }
        comp->scope_cur = comp->scope_cur->parent;
    }

@nickovs
Contributor

nickovs commented Jul 9, 2019

Actually my fix seems to break some other cases. For instance it flags:

[((m := k+1), k * m) for k in range(5)]

as attempting to assign to a loop variable when in fact it is perfectly legal, so the check might need to be a little more complex.
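
One way to picture the more precise check is to reject a := target only when it matches a for-target of an enclosing comprehension, rather than any name local to the comprehension scope. The sketch below models that rule with CPython's `ast` module (Python 3.8+). It is illustrative only: the helper name is hypothetical, it is not how MicroPython's compiler is structured, and it ignores scope boundaries such as nested lambdas.

```python
import ast

COMPREHENSIONS = (ast.ListComp, ast.SetComp, ast.DictComp, ast.GeneratorExp)


def rebinds_iteration_variable(source):
    """Return True if a := target names a for-target of an enclosing comprehension."""
    tree = ast.parse(source, mode="eval")

    def visit(node, enclosing_targets):
        if isinstance(node, COMPREHENSIONS):
            # Collect every name bound by this comprehension's for clauses.
            targets = set(enclosing_targets)
            for generator in node.generators:
                for sub in ast.walk(generator.target):
                    if isinstance(sub, ast.Name):
                        targets.add(sub.id)
            enclosing_targets = targets
        elif isinstance(node, ast.NamedExpr) and node.target.id in enclosing_targets:
            return True
        return any(visit(child, enclosing_targets)
                   for child in ast.iter_child_nodes(node))

    return visit(tree, frozenset())


print(rebinds_iteration_variable("[i := i+1 for i in range(5)]"))             # True
print(rebinds_iteration_variable("[((m := k+1), k * m) for k in range(5)]"))  # False
```

Here `m` is local to the comprehension but is not an iteration variable, so it is not flagged, which is the distinction the simpler `scope_find` check misses.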

I guess there is a question about the extent to which you want to catch all invalid cases as opposed to just behaving in a poorly defined manner.

@dpgeorge
Member Author

Thus the following should all fail:

Actually, the first and third of these are fine. And the second is fine if j is already a global (defined in the outer scope) (all according to CPython).

I pushed some new tests along with the .exp expected output, which was obtained from the latest master CPython (Python 3.9.0a0). It may be that the PEP itself is not 100% accurate, or maybe the wording is not precise enough.

@pfalcon
Contributor

pfalcon commented Jul 10, 2019

I'd suggest looking into making this feature optional (no matter how "small" its implementation seems).

I also find it strange that such features are being worked on, whereas there's no complete/consistent support for more established Python features, e.g. variable annotations a : int = 1.

@stinos
Contributor

stinos commented Jul 10, 2019

more established Python features, e.g. variable annotations

It's also a matter of usefulness, and of all the things not implemented in MicroPython I'm not sure how to rate annotations on that front. Annotations or not, Python remains dynamically typed, so in the end it's extra text which doesn't do much and might as well end up conflicting with what actually goes on. Don't get me wrong, I see the use in annotating types, but something like assignment expressions seems a bit more useful (less text, and it actually does something) and I don't think I'm alone in that. Purely anecdotal, but I've probably already seen more Python code using it than I've seen annotated code.

@nickovs
Contributor

nickovs commented Jul 10, 2019

@pfalcon I can't speak for Damien but my motivation for working on this rather than annotations is much as @stinos suggests; this seems like a way to reduce source and byte code size. Annotation on the other hand, while useful, necessarily increases the size of both. That seems at odds with being 'micro'. For my part I also support this feature because in the vast majority of cases it improves code readability (although just like any language feature it can be contrived to be used to obfuscate). IMHO making code more understandable is at least as likely to reduce bugs as annotating the types, especially in the absence of a dynamic type checker.

py/compile.c Outdated
@@ -2115,7 +2115,13 @@ STATIC void compile_namedexpr_helper(compiler_t *comp, mp_parse_node_t pn_name,
    qstr arg = MP_PARSE_NODE_LEAF_ARG(pn_name);
    compile_node(comp, pn_expr);
    EMIT(dup_top);
    scope_t *old_scope = comp->scope_cur;
    if (SCOPE_LIST_COMP <= comp->scope_cur->kind && comp->scope_cur->kind <= SCOPE_GEN_EXPR) {
Contributor

The correctness of this condition is dependent on the exact ordering of some enum values in another file, which makes me fear that someone will break it down the line. Maybe it's a stylistic thing but I think it would make sense to define a macro called something like SCOPE_IS_COMP_LIKE, next to where you define SCOPE_IS_FUNC_LIKE and the enum itself in scope.h, that encompasses the correct condition and then use that macro here. That way if one day someone changes the values in the enum they will know to update those adjacent macros.
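
The fragility described here can be modelled outside C. The Python sketch below uses a hypothetical enum ordering (not copied from scope.h) to show why keeping the range test in one named helper, defined next to the enum, is safer than repeating the comparison at each call site.

```python
from enum import IntEnum


class ScopeKind(IntEnum):
    # Hypothetical ordering for illustration only. The comprehension-like
    # kinds must stay contiguous for the range test below to be valid.
    MODULE = 0
    CLASS = 1
    FUNCTION = 2
    LIST_COMP = 3
    DICT_COMP = 4
    SET_COMP = 5
    GEN_EXPR = 6


def scope_is_comp_like(kind):
    """The one place that encodes the 'LIST_COMP..GEN_EXPR is contiguous' assumption."""
    return ScopeKind.LIST_COMP <= kind <= ScopeKind.GEN_EXPR


print([kind.name for kind in ScopeKind if scope_is_comp_like(kind)])
# ['LIST_COMP', 'DICT_COMP', 'SET_COMP', 'GEN_EXPR']
```

Anyone reordering the enum then has a single adjacent helper to update, which is the role the proposed SCOPE_IS_COMP_LIKE macro would play.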

Member Author

I think it would make sense to define a macro called something like SCOPE_IS_COMP_LIKE, next to where you define SCOPE_IS_FUNC_LIKE

Yes I completely agree.

Member Author

Now done.

@robert-hh
Contributor

While assignment expressions look sleek and elegant in code (and I used them quite often in C), under the hood I do not see that they save that much in actual code. For the uses in loops and conditions, the expression still has to be evaluated and the value assigned, and a loop still has to be entered and eventually left.
And using it makes the code's intention a little less obvious, because the assignment does not stand by itself. That may be one of the reasons why it is forbidden in safety-critical code.

@nickovs
Contributor

nickovs commented Jul 10, 2019

Thus the following should all fail:

[i := i+1 for i in range(5)]
[[(j := j) for i in range(5)] for j in range(5)]
[i := 0 for i, j in stuff]

Actually, the first and third of these are fine. And the second is fine if j is already a global (defined in the outer scope) (all according to CPython).

While it may well be the case that these all work on the current CPython implementation, the spec specifically calls out these three examples as being invalid. In each case the semantics of the for statement insist that the targets of the for are local to the comprehension, while the semantics of the assignment expression insist that the scope of the target is not local to the comprehension. If these cases 'work' in CPython then that's fine; we can just be 'bug compatible' with the canonical implementation!

@dpgeorge
Member Author

under the hood I do not see that they save that much in actual code

From a bytecode perspective the savings come in not needing to re-lookup the just-bound variable when it's used in the subsequent expression. For globals this can be mildly significant, saving 2 bytes in the bytecode and a dict lookup. For nonlocals also a small saving. For locals it would likely be on par. And for use in list comprehensions it might save a bit more.
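
The "no re-lookup" effect can be observed on CPython with the `dis` module. This is a hedged illustration: CPython's bytecode is not MicroPython's, so the byte counts quoted above do not carry over, and the function names here are made up. The sketch only counts how often the global is loaded again after being stored.

```python
import dis


def source():
    return 42


def plain():
    global value
    value = source()
    if value:          # the global is looked up again for the test
        return True
    return False


def walrus():
    global value
    if value := source():   # the tested value is already on the stack
        return True
    return False


def global_loads(func, name):
    """Count LOAD_GLOBAL instructions in func that refer to the given name."""
    return sum(1 for ins in dis.get_instructions(func)
               if ins.opname == "LOAD_GLOBAL" and ins.argval == name)


print(global_loads(plain, "value"), global_loads(walrus, "value"))
```

On CPython the first count is expected to be one higher than the second, since the walrus form duplicates the value on the stack instead of reading the global back.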

@dpgeorge
Member Author

If these cases 'work' in CPython then that's fine; we can just be 'bug compatible' with the canonical implementation!

It might be worth raising this with some CPython devs.

@pfalcon
Contributor

pfalcon commented Jul 11, 2019

@stinos

and of all things not implemented in MicroPython I'm not sure how to rate annotations on that front.

I'd say it's very simple to rate annotation implementation in MicroPython: they're supported for function arguments and not supported for variables/class members. That's somewhat inconsistent and incomplete, which is what I'm pointing out.

my motivation for working on this rather than annotations is much as @stinos suggests; this seems like a way to reduce source and byte code size.

I'm afraid this convoluted syntax which wasn't in Python for 20+ years, can improve binary efficiency only so much. Any decent optimizer would do much more optimization even without that convoluted syntax. But for decent optimization, more information about type constraints would be needed, and not surprisingly, we're talking about consistent implementation of type annotations.

As for "reducing source size", I don't even know what to say. APL is still out there. And I wonder, if you missed TinyPy. That thing promised "pretty full Python implementation in 64K", and after looking at the scarcity of whitespace in its sources, with codes like if a>b:c=foo(a,1)+b+c;print a;fun(f,g), you'd get an idea that its author means "64K of source".

In the meantime, Python programmers won't be able to read ":=" syntax at all (for a while), because again, it wasn't Python syntax for decades.

Annotation on the other hand, while useful, necessarily increases the size of both.

I'm happy to report that MicroPython just ignores annotations, so they don't affect bytecode size.

especially in the absence of a dynamic type checker.

Python (the language, the community) does have various type checkers, compilers, etc. Each particular implementation doesn't need to invent everything on its own.

@stinos
Contributor

stinos commented Jul 11, 2019

and of all things not implemented in MicroPython I'm not sure how to rate annotations on that front.

I'd say it's very simple to rate annotation implementation in MicroPython: they're supported for function arguments and not supported for variables/class members. That's somewhat inconsistent and incomplete, which is what I'm pointing out.

Sorry, reading my sentence again I see it's not clear, possibly not even grammatically correct. What I meant with 'on that front' was something like 'there's a bunch of language features not implemented in MicroPython, and if I had to rank unimplemented features by usefulness, I'm not sure how to rate annotations. Probably low, and definitely lower than assignment expressions'.

But yes, an incomplete implementation of annotations is far from ideal as it makes the feature much less usable.

@dpgeorge
Member Author

But yes, an incomplete implementation of annotations is far from ideal as it makes the feature much less usable.

Although discussing this is a bit off-topic for the PR here, I just want to point out that function annotations (aka PEP 3107 https://www.python.org/dev/peps/pep-3107/) were added in Python 3.0 and were in MicroPython from the very first commit 429d719, while variable annotations (aka PEP 526 https://www.python.org/dev/peps/pep-0526/) are a Python 3.6 addition, so that's why they are not in MicroPython yet.

@stinos
Contributor

stinos commented Jul 12, 2019

Ah, I should have checked this in more detail. I've never used annotations with MicroPython but from Paul's comment it sounded like the original PEP 3107 implementation was incomplete.

@dpgeorge
Member Author

dpgeorge commented Aug 5, 2019

While it may well be the case that these all work on the current CPython implementation, the spec specifically calls out these three examples as being invalid.
...
It might be worth raising this with some CPython devs.

I talked to Nick Coghlan about it and he agreed that the PEP is correct and CPython should be fixed. See https://bugs.python.org/issue37757 and python/cpython#15131

@ncoghlan

ncoghlan commented Aug 5, 2019

I'll also note that we think it's important for CPython to rule out those cases to keep reference implementation quirks from leaking into the language specification implied by the test suite. For other implementations like MicroPython "These formally disallowed constructs have the following behaviour rather than throwing an exception" could be a reasonable option if actually enforcing the constraints complicates your compiler more than you would like.

@nickovs
Contributor

nickovs commented Aug 5, 2019

@ncoghlan Thanks for that clarification.

I suspect that detecting and complaining about attempts to have the target of an assignment expression be the same as the target of an enclosing for loop might take more code than the whole of the rest of the current MicroPython implementation. I'll leave it to the maintainers to decide, but in this case I'd be inclined to save the bytes and include a note about allowing this ambiguity in the 'differences' documentation.

@dpgeorge dpgeorge added the py-core Relates to py/ directory in source label Jun 8, 2020
@nickovs
Contributor

nickovs commented Jun 10, 2020

@dpgeorge I see that you added a py-core tag to this. Does this mean you might be merging this soon? The uptake from 3.7 to 3.8 seems to have happened faster than from 3.6 to 3.7 and, at least anecdotally, it seems that assignment expressions were one of the drivers for this.

@dpgeorge
Member Author

I see that you added a py-core tag to this. Does this mean you might be merging this soon?

The main reason to add that tag was just to categorise this PR, eg for easier searching.

Regarding merging this PR: if we are going to eventually have support for := (which I'd like to see) then I don't see any harm in adding it sooner rather than later, and considering the code is already written (here in this PR), we could just go ahead and add it now. If we waited to merge 3.8 features only after 3.7 was complete then likely we'd never get to 3.8 (or even 3.6 for that matter...).

@dpgeorge dpgeorge force-pushed the py-pep572-assign-expr branch 2 times, most recently from c3560e0 to 9706995 Compare June 11, 2020 01:32
@dpgeorge
Member Author

I've rebased this on latest master and updated some of the tests. Some of the cases which CPython raises a SyntaxError for are currently allowed in uPy in this PR. Need to see if/how to deal with those cases (see assign_expr_syntaxerror.py).

@dpgeorge
Member Author

Some of the cases which CPython raises a SyntaxError for are currently allowed in uPy in this PR. Need to see if/how to deal with those cases (see assign_expr_syntaxerror.py).

IMO it's not worth forbidding these in uPy; it costs a lot of code to do so. And the result of the expression makes sense, so it would be unlikely to lead to bugs even if it were used. Eg:

>>> [i := -1 for i in range(4)]
[-1, -1, -1, -1]

This would give a SyntaxError in CPython.
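
A quick way to probe how a given implementation treats these constructs is to hand the source to the built-in `compile()` and catch `SyntaxError`. The helper name below is hypothetical, and the outcome depends on the interpreter running it: CPython 3.8 or later is expected to reject the first expression, while MicroPython with this PR would accept it (assuming the build provides `compile()`).

```python
def accepted_by_compiler(source):
    """Return True if this interpreter compiles the expression without a SyntaxError."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Rebinding a comprehension iteration variable: disallowed by PEP 572.
print(accepted_by_compiler("[i := -1 for i in range(4)]"))
# Binding a fresh name inside a comprehension: allowed everywhere.
print(accepted_by_compiler("[((m := k + 1), k * m) for k in range(5)]"))
```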

@nickovs
Contributor

nickovs commented Jun 11, 2020

I agree that it's not worth adding code just to prevent this. The semantics of Python's for loops are such that while modifying the loop variable may make your code hard to read, it can't interfere with the progress of the loop (unlike in C).

@stinos
Contributor

stinos commented Jun 11, 2020

This would give a SyntaxError in CPython.

Counter-intuitive for me actually, but as long as these differences are documented it doesn't matter I think.

it can't interfere with the progress of the loop (unlike in C).

While for typical use of range this is the case, there's no inherent reason why the loop variable wouldn't be able to interfere with iteration. E.g. this loops forever:

class Foo:
  def __init__(self):
    self.i = 0

  def __iter__(self):
    return self

  def __next__(self):
    if self.i == 2:
      raise StopIteration()
    self.i += 1
    return self

for i in Foo():
  i.i -= 1

@dpgeorge
Member Author

I pushed a cpydiff test for this difference in behaviour.

@dpgeorge
Member Author

I tried to optimise code size as best I could. The resulting code-size-diff for this PR is:

   bare-arm:  +128 +0.194% 
minimal x86:  +312 +0.213% 
   unix x64:  +328 +0.065% 
unix nanbox:  +328 +0.073% 
      stm32:  +120 +0.031% PYBV10
     cc3200:   +72 +0.039% 
    esp8266:  +180 +0.026% GENERIC
      esp32:  +168 +0.013% GENERIC[incl +16(data)]
        nrf:  +116 +0.081% pca10040
       samd:  +120 +0.118% ADAFRUIT_ITSYBITSY_M4_EXPRESS

That's a moderate increase.

So the remaining decision is whether to make this feature optional, ie controlled by something like MICROPY_PY_ASSIGN_EXPR. And if so what the default is and what ports have it enabled. Following MICROPY_PY_ASYNC_AWAIT the default could be enabled, but then disabled for very minimal ports.

comp->scope_cur = old_scope;
}

STATIC void compile_namedexpr(compiler_t *comp, mp_parse_node_struct_t *pns) {
Contributor

Unused?

Member Author

it is used, it's part of the big table referenced by grammar.h

Contributor

Oops, text editor search failure

@stinos
Contributor

stinos commented Jun 11, 2020

Pretty amazing such a small amount of changes are required to make this work.

Since there's already so much configurable I'd say this one should be as well. If only to accommodate people who really dislike it :)
And then enabled by default on the non-minimal ports seems sane.

@nickovs
Contributor

nickovs commented Jun 11, 2020

it can't interfere with the progress of the loop (unlike in C).

While for typical use of range this is the case, there's no inherent reason why the loop variable wouldn't be able to interfere with iteration. E.g. this loops forever:
...

That example is modifying what i points to, not modifying the value of the variable i itself. In general in Python if you have for i in <something>: and you modify the value of the variable i inside the loop, the for construct won't care.
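
The distinction can be shown with a small sketch (the function name is made up for the example): rebinding the loop name inside the body has no effect on which items the for statement hands out next, because the iterator keeps its own state.

```python
def visited_items(n):
    """Iterate over range(n) while rebinding the loop name in the body."""
    seen = []
    for i in range(n):
        seen.append(i)
        i = -1  # rebinds the name only; the range iterator is unaffected
    return seen


print(visited_items(4))  # [0, 1, 2, 3]
```

Mutating the object that `i` refers to, as in the Foo example, is a different operation and can change what the iterator does.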

@nickovs
Contributor

nickovs commented Jun 11, 2020

So the remaining decision is whether to make this feature optional, ie controlled by something like MICROPY_PY_ASSIGN_EXPR. And if so what the default is and what ports have it enabled. Following MICROPY_PY_ASYNC_AWAIT the default could be enabled, but then disabled for very minimal ports.

I support having it on by default and disabled only where size is critical.

That said, I do feel that there are going to be scenarios where idiomatic code written using assignment expressions will result in smaller bytecode than without (particularly things like [y for x in z if (y := foo(x))] where an extra function call is avoided). If I find enough cases in the standard libraries where assignment expressions reduce the bytecode size I might come back and argue for it being on all the time.
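
The avoided call can be made visible by counting invocations. In the sketch below all names are hypothetical, and the walrus is parenthesised because CPython requires that in a comprehension's if clause. It measures calls, not bytecode size.

```python
def count_calls(build):
    """Run build(foo, data) and report its result with the number of foo calls."""
    calls = 0

    def foo(x):
        nonlocal calls
        calls += 1
        return x * 2 if x % 2 else 0  # falsy for even x

    result = build(foo, range(6))
    return result, calls


def build_plain(foo, z):
    # foo runs once for the filter and again for each kept element.
    return [foo(x) for x in z if foo(x)]


def build_walrus(foo, z):
    # foo runs once per element; the result is reused via y.
    return [y for x in z if (y := foo(x))]


print(count_calls(build_plain))   # ([2, 6, 10], 9)
print(count_calls(build_walrus))  # ([2, 6, 10], 6)
```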

@dpgeorge
Member Author

That said, I do feel that there are going to be scenarios where idiomatic code written using assignment expressions will result in smaller bytecode than without

Yes... so one could look at the total code-size change, including changes to all (frozen) Python code that uses this new syntax, which could potentially give a net overall decrease. It will be interesting to see such results.

@stinos
Contributor

stinos commented Jun 11, 2020

That example is modifying what i points to, not modifying the value of the variable i itself.

Aha, I misunderstood; typical Python name vs object lingo problem. I interpreted 'modifying the variable' as 'modifying the object' but you meant 'assigning/binding the name to another object', or whatever the correct Python lingo is ('points to' is pretty clear, but that's probably not official, same with 'variable').

@dpgeorge
Member Author

Ok, I pushed a few commits to make this feature optional, via MICROPY_PY_ASSIGN_EXPR. It's enabled by default, disabled on bare-arm and minimal ports.

@dpgeorge
Member Author

The only remaining Travis CI failure now is due to a small increase (+16 bytes) in the bare-arm and minimal builds, because the new feature is not fully disabled when MICROPY_PY_ASSIGN_EXPR=0: there is a minor remnant in the grammar and a new lexer token := which would be messy to make fully configurable.

Apart from squashing most of the commits, this PR is ready to merge.

@dpgeorge dpgeorge force-pushed the py-pep572-assign-expr branch from 0662b20 to 9a146c9 Compare June 16, 2020 11:43
dpgeorge added 4 commits June 16, 2020 22:02
The syntax matches CPython and the semantics are equivalent except that,
unlike CPython, MicroPython allows using := to assign to comprehension
iteration variables, because disallowing this would take a lot of code to
check for it.

The new compile-time option MICROPY_PY_ASSIGN_EXPR selects this feature and
is enabled by default, following MICROPY_PY_ASYNC_AWAIT.
@dpgeorge dpgeorge force-pushed the py-pep572-assign-expr branch from 9a146c9 to a3c89cf Compare June 16, 2020 12:07
@dpgeorge dpgeorge merged commit a3c89cf into micropython:master Jun 16, 2020
@dpgeorge dpgeorge deleted the py-pep572-assign-expr branch June 16, 2020 12:40
@dpgeorge
Member Author

Merged!

tannewt pushed a commit to tannewt/circuitpython that referenced this pull request Jun 25, 2021
tannewt added a commit to tannewt/circuitpython that referenced this pull request Jun 25, 2021
@dlech dlech mentioned this pull request Oct 14, 2021
35 tasks
Labels
py-core Relates to py/ directory in source
6 participants