Allow macros in char classes #654
This is a fairly big PR, but in total it removes code, which I'm happy about, and it didn't require any changes to the test suite (at least not after the recent commit that ensured a consistent class order).
This is still somewhat WIP. It should work now, but I intend to add at least one test case for macro expansion, and to make Codacy a bit happier.
This pull request introduces 3 alerts when merging a9afe35 into 7ff3188 - view on LGTM.com new alerts:
This pull request fixes 1 alert when merging 2036c74 into e343125 - view on LGTM.com fixed alerts:
This pull request fixes 1 alert when merging 2e20f67 into e343125 - view on LGTM.com fixed alerts:
This pull request fixes 1 alert when merging 9147781 into e343125 - view on LGTM.com fixed alerts:
This pull request fixes 1 alert when merging 31fbf24 into e343125 - view on LGTM.com fixed alerts:
Thanks for the additional cleanup! How strongly do you feel about using
If you mean Java 8 streams, I don't feel strongly at all.
I think I'll leave it in after all, it's not so bad.
Will rebase manually to clean up the commits a bit (this is a not unlikely point for future bisects), and merge after. |
Addresses issue #216.
fixes #216
This PR refactors and redesigns how char classes are parsed, which removes some duplication in the .cup file. It also moves the definitions of char classes/regular expressions like empty, anyChar, and newLine out of the .cup file into RegExp and IntCharSet respectively, to ensure consistency.
Compound character classes (= classes that are not a primitive predefined class or Unicode property) are now first fully parsed and then, in a separate pass after macro expansion, normalised into one IntCharSet that fully describes the class. Only after that step do we partition the input character set into these classes. Apart from being much cleaner, this should in theory lead to a slightly better (= coarser) partition of the character set, because we no longer make a separate partition for each operand in a compound expression. In practice, the effect of this is probably minimal.
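To make the two-pass idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR: the names CharClassSketch, Expr, Kind, and normalise are hypothetical, java.util.BitSet stands in for IntCharSet, and the alphabet is limited to ASCII for brevity. The first pass is assumed to have produced an expression tree that may still contain macro references; the second pass expands the macros and folds the whole tree into one set.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch only; these are not JFlex's actual classes. */
final class CharClassSketch {
  static final int MAX_CHAR = 127; // assumption: ASCII only, to keep the sketch small

  enum Kind { RANGE, MACRO, UNION, DIFFERENCE, COMPLEMENT }

  /** A parsed char class expression, kept as a tree until macros are expanded. */
  static final class Expr {
    final Kind kind;
    final int from, to;     // used by RANGE (both inclusive)
    final String macro;     // used by MACRO
    final Expr left, right; // UNION and DIFFERENCE use both, COMPLEMENT uses left

    private Expr(Kind kind, int from, int to, String macro, Expr left, Expr right) {
      this.kind = kind;
      this.from = from;
      this.to = to;
      this.macro = macro;
      this.left = left;
      this.right = right;
    }

    static Expr range(int from, int to) { return new Expr(Kind.RANGE, from, to, null, null, null); }
    static Expr macro(String name) { return new Expr(Kind.MACRO, 0, 0, name, null, null); }
    static Expr union(Expr l, Expr r) { return new Expr(Kind.UNION, 0, 0, null, l, r); }
    static Expr difference(Expr l, Expr r) { return new Expr(Kind.DIFFERENCE, 0, 0, null, l, r); }
    static Expr complement(Expr e) { return new Expr(Kind.COMPLEMENT, 0, 0, null, e, null); }
  }

  /**
   * Second pass: expand macro references and fold the tree into a single set.
   * Every call returns a fresh BitSet, so mutating the result is safe.
   * There is no cycle detection: a recursive macro would not terminate.
   */
  static BitSet normalise(Expr e, Map<String, Expr> macros) {
    switch (e.kind) {
      case RANGE: {
        BitSet s = new BitSet(MAX_CHAR + 1);
        s.set(e.from, e.to + 1); // BitSet.set takes an exclusive upper bound
        return s;
      }
      case MACRO: {
        Expr definition = macros.get(e.macro);
        if (definition == null) {
          throw new IllegalArgumentException("undefined macro: " + e.macro);
        }
        return normalise(definition, macros);
      }
      case UNION: {
        BitSet s = normalise(e.left, macros);
        s.or(normalise(e.right, macros));
        return s;
      }
      case DIFFERENCE: {
        BitSet s = normalise(e.left, macros);
        s.andNot(normalise(e.right, macros));
        return s;
      }
      case COMPLEMENT: {
        BitSet s = normalise(e.left, macros);
        s.flip(0, MAX_CHAR + 1);
        return s;
      }
      default:
        throw new AssertionError(e.kind);
    }
  }

  public static void main(String[] args) {
    Map<String, Expr> macros = new HashMap<>();
    macros.put("DIGIT", Expr.range('0', '9'));
    // A class made of a macro reference plus the range a-f.
    Expr hex = Expr.union(Expr.macro("DIGIT"), Expr.range('a', 'f'));
    System.out.println(normalise(hex, macros).cardinality()); // prints 16
  }
}
```

In this sketch only the single set returned by normalise would be handed to the partitioning step, rather than one set per operand, which is where the coarser partition described above would come from.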