trishume
Owner

@trishume trishume commented Apr 4, 2017

So I tried updating to the latest version of the Sublime packages, and unfortunately the new Markdown syntax exhibits catastrophic backtracking on the regex I pasted below. Normally I would try to add some atomic groups, but I don't trust myself to understand this regex well enough to do that without breaking it. It takes forever (I haven't bothered waiting more than a few minutes) to highlight even short lines.
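For a rough sense of why this blows up: the inline-code sub-pattern in the regex below repeats a character class inside another repetition (`[^`]+(?:[^`]+|...)*`), so a run of non-backtick characters can be divided between the two quantifiers in many different ways. The sketch below is a simplified model, not a measurement of the actual regex engine. `count_splits` is a hypothetical helper that counts how many ways a run of `n` matching characters can be carved into consecutive non-empty pieces, which approximates how many paths a backtracking matcher may have to try before it can report a failed match.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_splits(n: int) -> int:
    """Count the ordered ways to cut a run of n characters into non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: exactly one way to finish
    # The next piece takes k characters (1..n); the remainder is cut recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


# The count doubles with every extra character (2 ** (n - 1) for n >= 1).
for n in (5, 10, 20, 40):
    print(n, count_splits(n))
```

Under this model even a 40-character run has hundreds of billions of possible splits, which is consistent with short lines taking minutes.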

I'm not sure how to handle the tradeoff between losing Markdown support completely and gaining all the fixes @keith-hall has made since I last updated.

I doubt I'll have the time to completely port to fancy-regex in the near future to fix catastrophic backtracking, so I'm not sure what to do. Thoughts?

One option is that I fork the Sublime packages repo, revert markdown in my fork, and then base syntect off of that for now.

The Regex that does all the work in the new Markdown syntax

(?x)^
(?=  (?:[ ]{,3}>(?:.|$))
|    (?:[ ]{4}|\t)(?!$)
|    (?:[#]{1,6}\s*)
|    (?x:
    [ ]{,3}
    (?:
            [-](?:[ ]{,2}[-]){2,}
        |   [*](?:[ ]{,2}[*]){2,}
        |   [_](?:[ ]{,2}[_]){2,}
    )
    [ \t]*$
)
|    (?x:
    (?:(?x:
  (?:
    (?x:
  (?:
      \\[-`*_#+.!(){}\[\]\\>|]+                  # escape characters
  |   [^\[\]`\\]+                  # anything that isn't a square bracket or a backtick or the start of an escape character
  |   (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)                # inline code
  )
)
  | \[(?:                       # nested square brackets (one level deep)
        [^\[\]`]+               #  anything that isn't a square bracket or a backtick
        (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)?          #  balanced backticks
      )*\]                      #  closing square bracket
  )+                            # at least one character
)*\|){2}       # at least 2 non-escaped pipe chars on the line
|   (?!\s+\|)(?x:
  (?:
    (?x:
  (?:
      \\[-`*_#+.!(){}\[\]\\>|]+                  # escape characters
  |   [^\[\]`\\]+                  # anything that isn't a square bracket or a backtick or the start of an escape character
  |   (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)                # inline code
  )
)
  | \[(?:                       # nested square brackets (one level deep)
        [^\[\]`]+               #  anything that isn't a square bracket or a backtick
        (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)?          #  balanced backticks
      )*\]                      #  closing square bracket
  )+                            # at least one character
)+\|(?!\s+$)(?x:
  (?:
    (?x:
  (?:
      \\[-`*_#+.!(){}\[\]\\>|]+                  # escape characters
  |   [^\[\]`\\]+                  # anything that isn't a square bracket or a backtick or the start of an escape character
  |   (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)                # inline code
  )
)
  | \[(?:                       # nested square brackets (one level deep)
        [^\[\]`]+               #  anything that isn't a square bracket or a backtick
        (?x:
    (`{4})(?=\S)[^`]+(?:[^`]+|(?!`{4})`*)*(`{4})(?!`)
|   (`{3})(?=\S)[^`]+(?:[^`]+|(?!`{3})`*)*(`{3})(?!`)
|   (`{2})(?=\S)[^`]+(?:[^`]+|(?!`{2})`*)*(`{2})(?!`)
|   (`{1})(?=\S)[^`]+(?:[^`]+|(?!`{1})`*)*(`{1})(?!`)
)?          #  balanced backticks
      )*\]                      #  closing square bracket
  )+                            # at least one character
)+
)
)

cc @robinst

@keith-hall
Collaborator

I'm taking a look at the Markdown syntax to reduce backtracking possibilities, without affecting performance in ST. sublimehq/Packages#877

@robinst
Collaborator

robinst commented Apr 4, 2017

Wow, that is one hell of a regex.

> One option is that I fork the Sublime packages repo, revert markdown in my fork, and then base syntect off of that for now.

That sounds like a good short-term solution. (FWIW, my use case doesn't use the Markdown syntax at all, but I'm interested in the fixes to Java and other code syntaxes.)

@keith-hall
Collaborator

The tweaks from my aforementioned Markdown syntax PR seem to make it very much usable again :) I had to emulate possessive quantifiers so it wouldn't go backtracking-crazy ;)
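For readers unfamiliar with the trick: one common way to emulate a possessive quantifier in an engine that lacks one is a lookahead that captures the greedy match, followed by a backreference that consumes it. The sketch below uses Python's `re` module purely for illustration; it is an assumption that this resembles what the PR does, and the patterns here are toy examples rather than pieces of the Markdown syntax.

```python
import re

# Ordinary greedy version: `a+` gives back one "a" so the trailing "ab" can match.
greedy = re.compile(r"a+ab")

# Emulated possessive `a++`: the lookahead captures the longest run of "a",
# then the backreference \1 consumes exactly that text. The engine does not
# backtrack into a lookahead that already succeeded, so the run is never given back.
possessive = re.compile(r"(?=(a+))\1ab")

print(greedy.match("aaab") is not None)      # True: backtracking finds a match
print(possessive.match("aaab") is not None)  # False: all three "a"s stay consumed

# The same trick applied inside a nested quantifier. The plain form `(?:a+)+b`
# would explore an exponential number of splits on this failing input.
nested = re.compile(r"(?:(?=(a+))\1)+b")
print(nested.match("a" * 40 + "c"))          # None, returned immediately
```

The cost of the emulation is an extra capture group per quantifier, which shifts the numbering of any later groups in the pattern.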

trishume added 2 commits April 4, 2017 11:28
Switches to my fork of sublimehq/Packages to include two
pending pull requests that avoid catastrophic backtracking in Rust
and Markdown
@trishume trishume changed the title [WIP] Update Sublime packages Update Sublime packages Apr 4, 2017
@trishume
Owner Author

trishume commented Apr 4, 2017

Thanks to @keith-hall I created a fork of Sublime's Packages, merged his PRs and then updated syntect to that fork. This should have all the latest things.

There are a lot of syntax test failures, but I'm not sure how many of those are new and how many are fixed by the fix-test-bugs branch.

My current plan is to rebase fix-test-bugs on this branch, try to get fix-test-bugs ready for merging and eliminate as many failures as I can today. If I'm done by the end of the day I'll merge both and do a new release, otherwise I might just merge this one.

@trishume trishume merged commit 49f0b75 into master Apr 4, 2017
@trishume trishume deleted the update-packages branch April 4, 2017 18:47