lysnikolaou
Member

@lysnikolaou lysnikolaou commented Mar 24, 2020

Closes #14.

@pablogsal

pablogsal commented Mar 24, 2020

I am a bit uneasy about removing the tests for the parser or the generator itself (not the ones for the AST of the grammar). If we modify the generator in the future, once it is merged into CPython, we will certainly want the tests there, no?

@gvanrossum

Agreed, I'd rather keep the tests, even if they are redundant.

@lysnikolaou
Member Author

Those rely on a compiled extension though and for that we either need to have all the C code in Tools/peg_generator or we can refactor compile_extension to just rely on Parser/pegen/*. Would you be okay with that?

@pablogsal

pablogsal commented Mar 24, 2020

See for example also the situation in #24: without working tests against the Python version in master, it is not possible to test whether the change works.

Edit: I was not aware of Lib/test/test_peg_parser.py

@lysnikolaou
Member Author

> See for example also the situation in #24: without working tests against the Python version in master, it is not possible to test whether the change works.

Well, grammar tests can always go into Lib/test/test_peg_parser.py. There's still the problem with the generator tests though, which I agree is important.

@pablogsal

pablogsal commented Mar 24, 2020

> There's still the problem with the generator tests though, which I agree is important.

We should also make sure that those tests run with the compiled Python, as they currently run with Python 3.8.

@lysnikolaou
Member Author

lysnikolaou commented Mar 24, 2020

> > There's still the problem with the generator tests though, which I agree is important.

> We should also make sure that those tests run with the compiled Python, as they currently run with Python 3.8.

I have been struggling with this for a bit and then I remembered that it's not possible due to we-like-parsers/pegen_experiments#85.

Any ideas on how to overcome this?

@pablogsal

pablogsal commented Mar 24, 2020

> Any ideas on how to overcome this?

We can include the private headers and link the compilation units for the functions that we use (like PyAST_mod2obj and others). We need to modify the extension compilation to also compile and link against the relevant .c files and include the headers in Include/internal/.

@gvanrossum

When I first created the peg_parser module as a directory I solved this -- you have to add a few extra -I flags. Check the history in this repo of Modules/Setup.

@lysnikolaou lysnikolaou force-pushed the remove-peg-parser-tools branch from b5df192 to 374198f Compare March 25, 2020 00:22
@lysnikolaou
Member Author

lysnikolaou commented Mar 25, 2020

I started from scratch on this and deleted only the unnecessary duplicates, keeping everything else in place.

Pegen tests are failing because GitHub Actions does not support 3.9 yet.

But there is a problem with the generated extension module. If you build it and call parse_string(..., mode=1) an ast.Module object is returned instead of an ast.AST object. No idea why.

In any case, I'm calling it a day. I'm going to further inspect this in the morning. Good night (evening in some timezones?) everybody!

@pablogsal

pablogsal commented Mar 25, 2020

> Pegen tests are failing because GitHub Actions does not support 3.9 yet.

You need to use the compiled CPython for this. This is similar to how we test the rest of the code in Tools, like Argument Clinic (Lib/test/test_clinic.py). We probably need to integrate those tests into the normal test target.

> In any case, I'm calling it a day. I'm going to further inspect this in the morning. Good night (evening in some timezones?) everybody!

Good night from London :)

@lysnikolaou
Member Author

Were there any changes in the return value of PyAST_mod2obj in 3.9? For some reason I keep getting TypeError: expected AST, got 'Module' when calling ast.dump(..) with the return value of parse_string.

@gvanrossum

gvanrossum commented Mar 25, 2020 via email

@lysnikolaou
Member Author

Everything is now ready apart from the AST bug!

Since all of the tests are now in the stdlib, you can see the failures on GitHub Actions. Any pointers as to why this could be happening would be really helpful.

@pablogsal

> Everything is now ready apart from the AST bug!

Can you give an example of how it fails? I just checked the PR, and what I understood should fail does not:

>>> ast.dump(peg_parser.parse_string("1+1"))
'Module(body=[Expr(value=BinOp(left=Constant(value=1), op=Add(), right=Constant(value=1)))], type_ignores=[])'

@lysnikolaou
Member Author

The CPython extension module under Modules is ok. The bug is in the one in Tools/peg_generator.

@pablogsal

pablogsal commented Mar 26, 2020

> The CPython extension module under Modules is ok. The bug is in the one in Tools/peg_generator.

Oh, this is a tricky thing. The error is because there are two AST module classes that seem the same, so when you check for isinstance it fails:

>>> x = parse.parse_string("1+1", mode=1)
>>> y = ast.parse("1+1")
>>> id(x.__class__)
94743625738432
>>> id(y.__class__)
94743625122736
>>> x.__class__
<class 'ast.Module'>
>>> y.__class__
<class 'ast.Module'>

The reason seems to be that we have a copy of the symbols in the extension module:

❯ nm parse.cpython-39d-x86_64-linux-gnu.so | grep astmodule
0000000000063940 d _astmodule

~/github/pegen-cpython/Tools/peg_generator/peg_parser remove-peg-parser-tools*
❯ nm ../../../python | grep astmodule
00000000003dda80 d _astmodule

Notice how both are "d", which means that they are local symbols in the data segment, one in each binary. I would need to investigate a bit more to confirm this theory, but this seems to be the reason. There is a reason my GitHub profile says "I hate symbols but I love linkers." 😛
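
To illustrate the point about two look-alike classes, here is a minimal pure-Python sketch. It does not involve the extension module at all: `make_module_class` is a hypothetical helper that stands in for linking a second copy of Python/Python-ast.o, and each call produces a distinct class object that still prints as `ast.Module`.

```python
def make_module_class():
    # Hypothetical stand-in for "one linked copy of the AST types":
    # every call creates a brand-new class that reports itself as ast.Module.
    return type("Module", (), {"__module__": "ast"})


interpreter_copy = make_module_class()
extension_copy = make_module_class()

# An instance created from the "extension" copy of the class.
node = extension_copy()

# Both classes print identically ...
same_repr = repr(interpreter_copy) == repr(extension_copy)
# ... but they are different objects, so the isinstance check fails.
same_class = interpreter_copy is extension_copy
passes_check = isinstance(node, interpreter_copy)

print(same_repr, same_class, passes_check)  # prints: True False False
```

This is only an analogy for the behaviour seen in the REPL session above; the real duplication happens at link time, not through `type()`.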

@lysnikolaou
Member Author

Wow! I could never have found this by myself.

The question now is, what do we do about this? If I understand correctly that means that we shouldn't compile Python/Python-ast.c with the extension. But we need to do that, because it all belongs to the internal C API, right? I must be missing something.

@pablogsal

pablogsal commented Mar 26, 2020

A quick hack is doing this:

>>> x = parse.parse_string("1+1", mode=1)
>>> ast.parse(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/pablogsal/github/pegen-cpython/Lib/ast.py", line 50, in parse
    return compile(source, filename, mode, flags,
TypeError: compile() arg 1 must be a string, bytes or AST object
>>> y = ast.Module(x.body)
>>> ast.dump(y)
'Module(body=[<ast.Expr object at 0x7fe767850230>])'

We also need to add the other Module properties, but unless there is a bigger downside I would be OK with such a hack plus a comment explaining the problem.

@lysnikolaou
Member Author

Well, this hack isn't enough though. We need to compare the full string produced by calling ast.dump(...), in order to be sure that everything is equal, right? We could implement a Visitor that checks for equality, but that seems like a lot of unnecessary work, in case there is some other way to fix this.

@pablogsal

pablogsal commented Mar 26, 2020

> Well, this hack isn't enough though. We need to compare the full string produced by calling ast.dump(...), in order to be sure that everything is equal, right? We could implement a Visitor that checks for equality, but that seems like a lot of unnecessary work, in case there is some other way to fix this.

Does it complain about anything other than the top-level Module? For instance, the ast.Expr inside it does not seem to cause any problem.

> in case there is some other way to fix this.

We can copy-paste the ast.dump code to the test suite and modify it to avoid that check or allow the copy of the class. Basically rolling our own ast.dump reusing the original as much as possible.


In general, the problem is that once the Python interpreter is compiled, there is no way to get access to the PyAST_mod2obj function, because it lives inside the interpreter binary or the libpython.so shared object and is not exported. Linking against the original object file creates a copy, because neither the function nor the static type structs it uses are exported.
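
A rough sketch of the "roll our own ast.dump" idea, under stated assumptions: `simple_ast_dump` is a hypothetical, much-simplified helper and not the code that ended up in the test suite. It treats any object exposing `_fields` as an AST node instead of checking `isinstance(node, ast.AST)`, so it would not care which copy of the class an object came from. It ignores the `annotate_fields`, `include_attributes` and `indent` options of the real function.

```python
import ast


def simple_ast_dump(node):
    """Hypothetical, simplified stand-in for ast.dump that duck-types nodes."""
    if hasattr(node, "_fields"):
        # Anything exposing _fields is treated as an AST node,
        # regardless of which copy of the class it was created from.
        args = ", ".join(
            f"{name}={simple_ast_dump(getattr(node, name, None))}"
            for name in node._fields
        )
        return f"{type(node).__name__}({args})"
    if isinstance(node, list):
        return "[" + ", ".join(simple_ast_dump(item) for item in node) + "]"
    # Leaves such as numbers, strings and None.
    return repr(node)


# Demonstrated on the builtin parser only; the extension module is not needed here.
tree = ast.parse("1+1")
dumped = simple_ast_dump(tree)
print(dumped)
```

Because the output format is not identical to the builtin `ast.dump`, both trees being compared would have to go through the same helper.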

@lysnikolaou
Member Author

> Does it complain about anything other than the top-level Module? For instance, the ast.Expr inside it does not seem to cause any problem.

Nope, only for the top Module. But due to that complaint, ast.dump does not expand its child nodes.

@pablogsal

> ast.dump does not expand its child nodes.

Ohhhh, gotcha!

@lysnikolaou lysnikolaou requested a review from gvanrossum March 26, 2020 22:19
@lysnikolaou
Member Author

Note that most of the approx. 1440 lines of code added in this PR are in Lib/test/test_peg_generator.py, which is just a copy-paste of test_c_parser.py, test_first_sets.py and test_pegen.py (converted to unittest, of course). So even though this seems huge, it's not as bad as it looks.

@pablogsal

> 1440 lines of code added in this PR are in Lib/test/test_peg_generator.py, which is just a copy-paste of test_c_parser.py, test_first_sets.py and test_pegen.py (converted to unittest, of course).

Would it be possible to have them in a folder, split into separate files? I would prefer not to mix them into a single gigantic file. We already have some other tests in this format (check the folders in Lib/test).

> I know that's a serious hack, but I can't think of another way of overcoming this. Thoughts?

I would not mind much now that we have a comment given that we have more urgent things to care about. Maybe leave a todo for the future so we don't forget and open an issue if we all agree that we should do something better in the future.

@lysnikolaou
Member Author

> Would it be possible to have them in a folder, split into separate files? I would prefer not to mix them into a single gigantic file. We already have some other tests in this format (check the folders in Lib/test).

Done!

@pablogsal pablogsal left a comment

LGTM

Great work and thanks for the perseverance :)

@lysnikolaou
Member Author

> Great work and thanks for the perseverance :)

Thank you for the help!

@gvanrossum

The tests (make test in Tools/peg_generator/) still fail on my Mac with compilation errors. I don't understand what the problem is with the duplicate AST module (and since I can't compile it's hard to investigate).

@lysnikolaou
Member Author

> The tests (make test in Tools/peg_generator/) still fail on my Mac with compilation errors. I don't understand what the problem is with the duplicate AST module (and since I can't compile it's hard to investigate).

What exactly are the compilation errors about, when you run make build?

@gvanrossum

Full output of make build inside Tools/peg_generator/:

python3.9 -m pegen -q -c data/python.gram -o peg_parser/parse.c --compile-extension
/Users/guido/cpygen/Parser/pegen/parse_string.c:290:26: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
            if (slice->v.Slice.lower) {
                ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:291:45: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
                shift_expr(parent, slice->v.Slice.lower, lineno, col_offset);
                                   ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:293:26: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
            if (slice->v.Slice.upper) {
                ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:294:45: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
                shift_expr(parent, slice->v.Slice.upper, lineno, col_offset);
                                   ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:296:26: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
            if (slice->v.Slice.step) {
                ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:297:45: error: no member named 'Slice' in 'union _expr::(anonymous at
      /usr/local/include/python3.9/Python-ast.h:243:5)'
                shift_expr(parent, slice->v.Slice.step, lineno, col_offset);
                                   ~~~~~~~~ ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:440:46: warning: incompatible pointer types passing 'slice_ty' (aka 'struct _slice *') to
      parameter of type 'expr_ty' (aka 'struct _expr *') [-Wincompatible-pointer-types]
            fstring_shift_slice_locations(n, n->v.Subscript.slice, lineno, col_offset);
                                             ^~~~~~~~~~~~~~~~~~~~
/Users/guido/cpygen/Parser/pegen/parse_string.c:287:67: note: passing argument to parameter 'slice' here
static void fstring_shift_slice_locations(expr_ty parent, expr_ty slice, int lineno, int col_offset) {
                                                                  ^
/Users/guido/cpygen/Parser/pegen/parse_string.c:441:27: warning: incompatible pointer types passing 'slice_ty' (aka 'struct _slice *') to
      parameter of type 'expr_ty' (aka 'struct _expr *') [-Wincompatible-pointer-types]
            shift_expr(n, n->v.Subscript.slice, lineno, col_offset);
                          ^~~~~~~~~~~~~~~~~~~~
/Users/guido/cpygen/Parser/pegen/parse_string.c:263:55: note: passing argument to parameter 'n' here
static inline void shift_expr(expr_ty parent, expr_ty n, int line, int col) {
                                                      ^
2 warnings and 6 errors generated.
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/distutils/unixccompiler.py", line 117, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] +
  File "/usr/local/lib/python3.9/distutils/ccompiler.py", line 910, in spawn
    spawn(cmd, dry_run=self.dry_run)
  File "/usr/local/lib/python3.9/distutils/spawn.py", line 36, in spawn
    _spawn_posix(cmd, search_path, dry_run=dry_run)
  File "/usr/local/lib/python3.9/distutils/spawn.py", line 157, in _spawn_posix
    raise DistutilsExecError(
distutils.errors.DistutilsExecError: command 'gcc' failed with exit status 1

During handling of the above exception, another exception occurred:

distutils.errors.CompileError: command 'gcc' failed with exit status 1
For full traceback, use -v
make: *** [peg_parser/parse.c] Error 1

FWIW, there are also some warnings when running `./python.exe -m test test_peg_generator` in the repo root:

0:00:00 load avg: 2.39 [1/1] test_peg_generator
/var/folders/f2/mzkc8c9x3h5_7vwkb86k770c0000gn/T/tmpmpy7easg/parse.c:30:26: warning: incompatible pointer types passing
      'expr_ty (Parser *)' (aka 'struct _expr *(Parser *)') to parameter of type 'void *(*)(Parser *)' [-Wincompatible-pointer-types]
            lookahead(1, name_token, p)
                         ^~~~~~~~~~
/Users/guido/cpygen/Parser/pegen/pegen.h:96:27: note: passing argument to parameter 'func' here
int lookahead(int, void *(func)(Parser *), Parser *);
                          ^
1 warning generated.
/var/folders/f2/mzkc8c9x3h5_7vwkb86k770c0000gn/T/tmpt8kkleh6/parse.c:30:26: warning: incompatible pointer types passing
      'expr_ty (Parser *)' (aka 'struct _expr *(Parser *)') to parameter of type 'void *(*)(Parser *)' [-Wincompatible-pointer-types]
            lookahead(0, name_token, p)
                         ^~~~~~~~~~
/Users/guido/cpygen/Parser/pegen/pegen.h:96:27: note: passing argument to parameter 'func' here
int lookahead(int, void *(func)(Parser *), Parser *);
                          ^
1 warning generated.

@pablogsal

pablogsal commented Mar 26, 2020

> I don't understand what the problem is with the duplicate AST module (and since I can't compile it's hard to investigate).

The problem is that both libpython.so (or the static binary) and the extensions that we generate outside the core (for the tests) want PyAST_mod2obj from Python/Python-ast.c. As this function is not exported (it has hidden visibility), we cannot obtain it from libpython.so or the interpreter directly, and we need to link against Python/Python-ast.o in the extension module. As the AST types are static types (they live in the data segment), each time you link against Python/Python-ast.o you get a different copy of them. In this way we end up with two copies of the AST types, and in particular of _ast.Module: one in libpython.so (or the static binary) and the other in the extension module that we produce.

These two copies are two different classes, so using isinstance fails when checking instances of the other class.
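
One way to poke at the "not exported" part from a running interpreter is `ctypes.pythonapi`, which resolves names through the dynamic symbol table. The sketch below is only illustrative: `is_exported` is a hypothetical helper, and whether a lookup of `PyAST_mod2obj` fails on any particular build is an assumption you would have to check yourself.

```python
import ctypes


def is_exported(symbol_name):
    """Return True if the running interpreter exposes symbol_name dynamically."""
    try:
        # Attribute access on ctypes.pythonapi triggers a dynamic symbol lookup.
        getattr(ctypes.pythonapi, symbol_name)
    except AttributeError:
        return False
    return True


# Py_IncRef is part of the public C API, so it should resolve.
print(is_exported("Py_IncRef"))
# A hidden-visibility symbol behaves like a name that does not exist at all.
print(is_exported("PyAST_mod2obj"))
```

A symbol with hidden visibility is indistinguishable here from one that was never defined, which is why the extension cannot simply reuse the interpreter's copy.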

@gvanrossum

This is our branch. Can't we just export that function? Or perhaps a wrapper with a name that makes it clear it's only for our own use?

@lysnikolaou
Member Author

@gvanrossum Could it be that your version of python was built before python#9605 was merged?

@pablogsal

> Can't we just export that function? Or perhaps a wrapper with a name that makes it clear it's only for our own use?

We could export it with a leading underscore, maybe, like the other "private" functions (e.g. _PyWeakref_ClearRef).

@@ -1,4 +1,4 @@
-PYTHON ?= python3.8
+PYTHON ?= python3.9
@pablogsal pablogsal Mar 26, 2020

Shouldn't this be ../../python? (Or python.exe on macOS.)

Member Author

That should be ../../python indeed!

I'm now getting an ImportError: cannot import name 'parse' from 'peg_parser' (unknown location) though.

Same here, it doesn't look like the ../../ solution works. :-(

@pablogsal pablogsal Mar 27, 2020

That is because it first finds the C extension that we include in Modules. We need to rename Tools/peg_generator/peg_parser to something else. (Also, make sure you always use the compiled Python.)
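
A quick way to see which module a name resolves to is `importlib.util.find_spec`. This is a small sketch, with `describe_module` as a hypothetical helper; it is demonstrated on `sys` and `json` because `peg_parser` only exists on the branch discussed here.

```python
import importlib.util


def describe_module(name):
    """Return the origin of the module the import system would pick, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # "built-in" for modules compiled into the interpreter, a file path otherwise.
    return spec.origin


print(describe_module("sys"))   # a module compiled into the interpreter
print(describe_module("json"))  # a module loaded from a file on sys.path
```

A module compiled into the interpreter has no file location, which would be consistent with the "(unknown location)" in the ImportError above.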

Member Author

> We need to rename Tools/peg_generator/peg_parser to something else.

Done.

@pablogsal

> Can't we just export that function? Or perhaps a wrapper with a name that makes it clear it's only for our own use?

Now that I think about this more, for us to not link against Python-ast.o we may need to export all ast types as well or otherwise, we will get things like:

undefined symbol: _Py_argument

@gvanrossum

But that's generated code isn't it?

@pablogsal

pablogsal commented Mar 26, 2020

> But that's generated code isn't it?

Yup, by Parser/asdl_c.py if I am not mistaken.

@gvanrossum

So is it worse to export all those symbols or to duplicate all that code? Also, is this a permanent issue or is it temporary until we get to merge our branch?

@pablogsal

pablogsal commented Mar 27, 2020

Notice that I may be missing a better solution here, but in case I am not this is my view:

> So is it worse to export all those symbols or to duplicate all that code?

Given that exporting the symbols may have some other consequences and this is only for testing the generator itself (not for building the extension for CPython or for the integration), I would prefer to duplicate the ast.dump code to avoid modifying CPython.

> Also, is this a permanent issue or is it temporary until we get to merge our branch?

This is a "permanent" issue, in the sense that this won't change with the integration. To be clear, though: this only happens in the tests, and only if we insist on using the builtin ast.dump. My suggestion was just to defer a better solution until we have finished everything else.

@gvanrossum

> @gvanrossum Could it be that your version of python was built before python#9605 was merged?

That was it -- I built and installed from the upstream master and things are now better.

So given the issues around sharing the AST definition maybe just land.

@lysnikolaou
Member Author

I also added a TODO in the comment in ast_dump.

I also think we can land this as is for now and open an issue about fixing the AST symbols thing.

@gvanrossum gvanrossum left a comment

Go!

@@ -0,0 +1,60 @@
+def ast_dump(node, annotate_fields=True, include_attributes=False, *, indent=None):

Heh, if it's just this file that's duplicated, I don't mind. Sorry for making a fuss!

@lysnikolaou lysnikolaou merged commit 479bd95 into pegen Mar 27, 2020
@lysnikolaou lysnikolaou deleted the remove-peg-parser-tools branch March 27, 2020 00:46
@gvanrossum

:party:

Successfully merging this pull request may close these issues.

Hand-written extension module in Tools/peg_generator
3 participants