match_previous_expr does not handle nested expressions #560

Open
ptmcg opened this issue Jun 5, 2024 · 7 comments
@ptmcg
Member

ptmcg commented Jun 5, 2024

See this SO post: it should be possible to use match_previous_expr to enforce matching of opening and closing tags, but it does not work for nested tags.

@Kaltxi

Kaltxi commented Jun 5, 2024

I've tried some simpler expression grammars and even this doesn't work for me:

import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "()")
NUMBER = pp.Word(pp.nums, pp.nums + ".")
MATH_TEXT = pp.Word("+-*/ ")

expression = pp.Forward()
GROUP = LBRACE + expression + RBRACE
expression = pp.OneOrMore(GROUP | NUMBER | MATH_TEXT)

print(expression.parse_string("5*(2+3)", parse_all=True).dump(" "))

This fails with an error:

Traceback (most recent call last):
  File "D:\pp-test\evaluation.py", line 119, in <module>
    print(expression.parse_string("5*(2+3)", parse_all=True).dump(" "))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\user\AppData\Roaming\Python\Python312\site-packages\pyparsing\core.py", line 1200, in parse_string
    raise exc.with_traceback(None)
pyparsing.exceptions.ParseException: Expected end of text, found '('  (at char 2), (line:1, col:3)

@ptmcg
Member Author

ptmcg commented Jun 5, 2024

expression = pp.OneOrMore(GROUP | NUMBER | MATH_TEXT) should be expression <<= pp.OneOrMore(GROUP | NUMBER | MATH_TEXT)
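The reason this matters: plain `=` rebinds the name `expression` to a brand-new object, so GROUP keeps pointing at the original, still-empty Forward, while `<<=` fills in the Forward that GROUP already refers to. A minimal sketch of the snippet above with only that line changed (it prints the result with as_list() rather than dump(), purely to keep the output short):

```python
import pyparsing as pp

LBRACE, RBRACE = map(pp.Suppress, "()")
NUMBER = pp.Word(pp.nums, pp.nums + ".")
MATH_TEXT = pp.Word("+-*/ ")

expression = pp.Forward()
GROUP = LBRACE + expression + RBRACE
# "<<=" attaches the definition to the existing Forward instead of rebinding the name
expression <<= pp.OneOrMore(GROUP | NUMBER | MATH_TEXT)

result = expression.parse_string("5*(2+3)", parse_all=True)
print(result.as_list())  # ['5', '*', '2', '+', '3']
```

The parentheses are suppressed and GROUP is not wrapped in pp.Group, so the nested tokens come back as one flat list.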

@Kaltxi

Kaltxi commented Jun 5, 2024

@ptmcg Right, thanks for pointing that out! Basic nesting works fine then.

@ptmcg
Member Author

ptmcg commented Jun 5, 2024

Pyparsing supports some warnings to help catch mistakes like that. You have to enable them with the -W switch or in code; see the notes here: https://github.com/pyparsing/pyparsing/wiki/Parser-Debugging-and-Diagnostics#diagnostic-switches
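As a rough sketch of the in-code route, the diagnostic most relevant to the `=` versus `<<=` mix-up is warn_on_assignment_to_Forward. The calls need to run before the grammar is built; which warnings you actually see may depend on your pyparsing version and Python warning filters.

```python
import pyparsing as pp

# Enable only the check for a Forward that is overwritten with "="
# instead of being filled in with "<<=" or "<<".
pp.enable_diag(pp.Diagnostics.warn_on_assignment_to_Forward)

# Or, more broadly, switch on every pyparsing diagnostic at once.
pp.enable_all_warnings()
```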

@ptmcg
Member Author

ptmcg commented Jun 6, 2024

Pyparsing has a helper method called match_previous_expr() that is intended for doing this kind of task, but it doesn't handle nested expression matching. That is, it could correctly validate {tag1}some content{/tag1}, but not {tag1}some content{tag2}inner content{/tag2}{/tag1}. I'll post a nested version here for you to try out - if it works for you, I'll include it in the next release.
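To illustrate why nesting needs more than remembering a single previous match, here is a small stand-alone sketch that uses only the standard library. It is not pyparsing's implementation, and the names TAG_RE and first_mismatch are made up for this example. It also assumes every opening tag must be closed (no standalone tags), which is stricter than the grammar discussed in this thread.

```python
import re

# Hypothetical helper: matches {name} and {/name}
TAG_RE = re.compile(r"\{(/?)(\w+)\}")


def first_mismatch(text: str):
    """Return None if all tags pair up correctly, else a short error message."""
    stack = []  # names of opening tags that have not been closed yet
    for m in TAG_RE.finditer(text):
        is_close, name = m.group(1) == "/", m.group(2)
        if not is_close:
            stack.append(name)
        elif not stack:
            return f"unexpected {{/{name}}} at char {m.start()}"
        elif stack[-1] != name:
            # a closing tag must match the most recently opened tag
            return f"expected {{/{stack[-1]}}}, found {{/{name}}} at char {m.start()}"
        else:
            stack.pop()
    if stack:
        return f"unclosed {{{stack[-1]}}}"
    return None


print(first_mismatch("{tag1}some content{/tag1}"))
print(first_mismatch("{tag1}some content{tag2}inner content{/tag2}{/tag1}"))
print(first_mismatch("{t1}aaa{t2}bbb{/t1}"))
```

With a single remembered name instead of a stack, the inner {tag2} would overwrite tag1 and the final {/tag1} could no longer be checked.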

@ptmcg
Member Author

ptmcg commented Jun 6, 2024

Here is your parser with a nested-capable match_previous_expr() method. I also converted your test code to use run_tests, which makes it easier to try multiple test strings.

import pyparsing as pp


def match_previous_expr_nested(expr: pp.ParserElement) -> pp.ParserElement:
    rep = pp.Forward()
    e2 = expr.copy()
    rep <<= e2
    rep.match_stack = []

    def copy_token_to_repeater(s, l, t):
        rep.match_stack.append(pp.helpers._flatten(t.as_list()))

        def must_match_these_tokens(s, l, t):
            these_tokens = pp.helpers._flatten(t.as_list())
            match_tokens = rep.match_stack[-1]
            if these_tokens != match_tokens:
                if these_tokens in rep.match_stack:
                    rep.match_stack.pop()
                if len(match_tokens) == 1:
                    error_msg = f"Expected {str(match_tokens[0])!r}"
                else:
                    error_msg = f"Expected {match_tokens}"
                raise pp.ParseException(s, l, error_msg)
            rep.match_stack.pop()

        rep.set_parse_action(must_match_these_tokens, callDuringTry=True)

    expr.add_parse_action(copy_token_to_repeater, callDuringTry=True)
    rep.set_name("(prev) " + str(expr))
    return rep


pp.ParserElement.set_default_whitespace_chars("")
LBRACE, RBRACE, SLASH = map(pp.Suppress, "{}/")
IDENTIFIER_CHARS = pp.alphanums + "_"

TAG_NAME = pp.Word(IDENTIFIER_CHARS)

# Tags
OPEN_TAG = LBRACE + (OPEN_TAG_NAME := TAG_NAME("open_tag")) + RBRACE
# CLOSE_TAG = LBRACE + SLASH + TAG_NAME("close_tag") + RBRACE
CLOSE_TAG = LBRACE + SLASH + match_previous_expr_nested(OPEN_TAG_NAME) + RBRACE

# Forward declaring content due to its recursivity
content = pp.Forward()

# Main elements
TAGGED_CONTENT = pp.Group(OPEN_TAG + pp.Group(content) + CLOSE_TAG)("tagged_content*")
PLAIN_TEXT = pp.Group(pp.CharsNotIn("{}"))("plain_text*")
STANDALONE_TAG = pp.Group(LBRACE + TAG_NAME + RBRACE)("standalone_tag*")

# Recursive definition of content
content <<= pp.ZeroOrMore(TAGGED_CONTENT | STANDALONE_TAG | PLAIN_TEXT)

# pp.autoname_elements()
# OPEN_TAG.set_debug()
# CLOSE_TAG.set_debug()
# TAGGED_CONTENT.set_debug()
# PLAIN_TEXT.set_debug()
# STANDALONE_TAG.set_debug()
# content.create_diagram(f"{__file__.removesuffix('.py')}.html")

content.run_tests(
    """\
    {tag}Tagged content{/tag} plain text {standalone}{tag}Tagged again!{/tag}
    {tag}Tagged content{/tag} plain text {standalone}{tag}Tagged again!{/tag}{/standalone}
    {t1}aaa{t2}bbb{/t2}{/t1}
    {t1}aaa{t2}bbb{/t1}    
    {t1}aaa{t2}{/t2}bbb{/t1}    
    {t1}{t2}{/t1}
    
    # error case - mismatched closing tag
    {t1}{t2}{/not_t1}
    """,
    full_dump=False,
    )
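Besides printing a report, run_tests returns a success flag and a list of (test string, result) pairs, which is handy if you want to check the outcome in a script. A small sketch with a deliberately trivial grammar (the names integer and pair are just for this example):

```python
import pyparsing as pp

integer = pp.Word(pp.nums)
pair = integer + pp.Suppress(",") + integer

success, results = pair.run_tests(
    """\
    1,2
    # comment lines are treated as labels for the next test, not as tests
    30,40
    """,
    print_results=False,  # suppress the printed report, just collect results
)

print(success)       # True only if every test string parsed
print(len(results))  # one entry per test string
```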

@Kaltxi

Kaltxi commented Jun 18, 2024

@ptmcg Hello! Sorry for the long delay, was working on other tasks. By the way, thanks for your work and attention to the project, pyparsing rocks, prototyping speed is very nice indeed :)

Now to the matter: yes, this looks nice and seems to work as intended. I even managed to make a shorthand tag notation work (e.g. {tag}Tagged{/t}, but not {tag}Tagged{/d}):

LBRACE, RBRACE, SLASH = map(pp.Suppress, "{}/")
IDENTIFIER_CHARS = pp.alphanums + "_"

TAG_NAME_START = pp.Char(IDENTIFIER_CHARS)
TAG_NAME = pp.Word(IDENTIFIER_CHARS)

# Tags
OPEN_TAG = (
    LBRACE
    + pp.Group(pp.original_text_for((OPEN_TAG_NAME_START := TAG_NAME_START) + pp.Opt(TAG_NAME)))(
        "open_tag"
    )
    + RBRACE
)
CLOSE_TAG = (
    LBRACE
    + SLASH
    + pp.Group(
        pp.original_text_for(match_previous_expr_nested(OPEN_TAG_NAME_START) + pp.Opt(TAG_NAME))
    )("close_tag")
    + RBRACE
)

# Forward declaring content due to its recursivity
content = pp.Forward()
