Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Conversation

DecimalTurn
Copy link
Contributor

@DecimalTurn DecimalTurn commented Feb 17, 2025

Description

Removing lookaheads in those regexes will help with portability across librairies using Re2-style regexes and should help ensure that our regexes run linearly.

Note that to demonstrate that this PR doesn't break ActionScript's heuristics, #7241
should be merged before changing this PR from a draft to a real PR.

@lildude lildude mentioned this pull request Feb 19, 2025
6 tasks
@DecimalTurn
Copy link
Contributor Author

Brief explanation of the logic behind the changes:

ActionScript:
The lookahead was there to check if at least one of the 2 keywords (intrinsic or extends) was present. By splitting the expression into 2 expressions seperated by |, we avoid the need for this check.

BrighterScript:
Lookaheads where pretty much unneeded here.

FreeMarker:
The lookahead checked that the next character was either \s or ], but then it checks that the next character is anything but ], so the only thing that could have matched was a \s anyways.

RUNOFF:
Because the lookahead is right before a |, we can easily replace it by matching the content directly.

Solidity:
We can avoid the negative lookahead by using a character class for the next character that excludes numbers.

Hosts File:
Because the lookahead is at the end, we can easily replace it by matching the content directly.

@DecimalTurn DecimalTurn marked this pull request as ready for review February 24, 2025 04:26
@DecimalTurn DecimalTurn requested a review from a team as a code owner February 24, 2025 04:26
@DecimalTurn
Copy link
Contributor Author

@Alhadis, I think I'll let you handle the Vim Help File case, I'm not ready to deal with with nested lookaheads at the moment 😅.

@Alhadis
Copy link
Collaborator

Alhadis commented Feb 24, 2025

@DecimalTurn The heuristic used by Vim Help Text was lifted from Linguist's Modeline strategy, which presents a much more readable and self-documenting version of the regex:

# NOTE: When changing this regex, be sure to keep the Vim Help heuristic updated too (#5347)
VIM_MODELINE = %r[
(?-m)
# Start of modeline (syntax documented in E520)
(?:
# `vi:`, `vim:` or `Vim:`
(?:^|[ \t]) (?:vi|Vi(?=m))
# Check if specific Vim version(s) are requested (won't work in vi/ex)
(?:
# Versioned modeline. `vim<700:` targets Vim versions older than 7.0
m
[<=>]? # If comparison operator is omitted, *only* this version is targeted
[0-9]+ # Version argument = (MINOR_VERSION_NUMBER * 100) + MINOR_VERSION_NUMBER
|
# Unversioned modeline. `vim:` targets any version of Vim.
m
)?
|
# `ex:`, which requires leading whitespace to avoid matching stuff like "lex:"
[ \t] ex
)
# If the option-list begins with `set ` or `se `, it indicates an alternative
# modeline syntax partly-compatible with older versions of Vi. Here, the colon
# serves as a terminator for an option sequence, delimited by whitespace.
(?=
# So we have to ensure the modeline ends with a colon
: (?=[ \t]* set? [ \t] [^\r\n:]+ :) |
# Otherwise, it isn't valid syntax and should be ignored
: (?![ \t]* set? [ \t])
)
# Possible (unrelated) `option=value` pairs to skip past
(?:
# Option separator, either
(?:
# 1. A colon (possibly surrounded by whitespace)
[ \t]* : [ \t]* # vim: noai : ft=sh:noexpandtab
|
# 2. At least one (horizontal) whitespace character
[ \t] # vim: noai ft=sh noexpandtab
)
# Option's name. All recognised Vim options have an alphanumeric form.
\w*
# Possible value. Not every option takes an argument.
(?:
# Whitespace between name and value is allowed: `vim: ft =sh`
[ \t]*=
# Option's value. Might be blank; `vim: ft= ` means "use no filetype".
(?:
[^\\\s] # Beware of escaped characters: titlestring=\ ft=sh
| # will be read by Vim as { titlestring: " ft=sh" }.
\\.
)*
)?
)*
# The actual filetype declaration
[ \t:] (?:filetype|ft|syntax) [ \t]*=
# Language's name
(\w+)
# Ensure it's followed by a legal separator (including EOL)
(?=$|\s|:)
]x

Any changes to Linguist::Strategy::Modeline::VIM_MODELINE will logically need to be reflected in the heuristic, and I strongly advise against making the regex even less readable by replacing nested lookaheads with… whatever works with re2.

In other words, you'll probably have to make an exception upstream for this particular heuristic. 🤷‍♂️

@DecimalTurn
Copy link
Contributor Author

DecimalTurn commented Feb 24, 2025

Ok, I'm going to take your advice on this one and leave it be for now.

@lildude lildude added this pull request to the merge queue Mar 5, 2025
Merged via the queue into github-linguist:main with commit f0830ab Mar 5, 2025
5 checks passed
@DecimalTurn DecimalTurn deleted the lookahead branch March 5, 2025 14:04
@github-linguist github-linguist locked as resolved and limited conversation to collaborators Jul 2, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants