AverageHelper
Contributor

@AverageHelper AverageHelper commented Jul 9, 2025

Closes #560

Version 0.24.1 of the "Gemini hypertext format" is specified at https://geminiprotocol.net/docs/gemtext-specification.gmi.

This is my first time making a lexer here 😅 I first built it in XML format, but couldn't work out how to make it also handle syntax inside preformatted blocks, so I moved it to Go, using markdown.go as a reference.

My original gemtext.xml, which tests the same, except that preformatted blocks are always treated as plain text:
<lexer>
  <config>
    <name>Gemtext</name>
    <alias>gemtext</alias>
    <alias>gmi</alias>
    <alias>gmni</alias>
    <alias>gemini</alias>
    <filename>*.gmi</filename>
    <filename>*.gmni</filename>
    <filename>*.gemini</filename>
    <mime_type>text/gemini</mime_type>
  </config>
  <rules>
    <state name="root">
      <rule pattern="^(#[^#].+\r?\n)">
        <token type="GenericHeading" />
      </rule>
      <rule pattern="^(#{2,3}.+\r?\n)">
        <token type="GenericSubheading" />
      </rule>
      <rule pattern="^(\* )(.+\r?\n)">
        <bygroups>
          <token type="Keyword" />
          <token type="Text" />
        </bygroups>
      </rule>
      <rule pattern="^(>)(.+\r?\n)">
        <bygroups>
          <token type="Keyword" />
          <token type="GenericEmph" />
        </bygroups>
      </rule>
      <rule pattern="^(```\r?\n)([\w\W]*?)(^```)(.+\r?\n)?">
        <bygroups>
          <token type="LiteralString" />
          <token type="Text" />
          <token type="LiteralString" />
          <token type="Comment" />
        </bygroups>
      </rule>
      <rule pattern="^(```)(.+\r?\n)([\w\W]*?)(^```)(.+\r?\n)?">
        <bygroups>
          <token type="LiteralString" />
          <token type="LiteralString" />
          <token type="Text" />
          <token type="LiteralString" />
          <token type="Comment" />
        </bygroups>
      </rule>
      <rule pattern="^(=>)(\s*)([^\s]+)(\s*)$">
        <bygroups>
          <token type="Keyword" />
          <token type="Text" />
          <token type="NameAttribute" />
          <token type="Text" />
        </bygroups>
      </rule>
      <rule pattern="^(=>)(\s*)([^\s]+)(\s+)(.+)$">
        <bygroups>
          <token type="Keyword" />
          <token type="Text" />
          <token type="NameAttribute" />
          <token type="Text" />
          <token type="NameTag" />
        </bygroups>
      </rule>
      <rule pattern=".|(?:\r?\n)">
        <token type="Text" />
      </rule>
    </state>
  </rules>
</lexer>

The spec mentions that CRLF and LF are essentially interchangeable. I wasn't sure how to test that since, IIRC, Git clobbers CR on Unix systems, but this regex seems right at least.

@AverageHelper AverageHelper marked this pull request as ready for review July 9, 2025 23:28
@alecthomas alecthomas merged commit f3be4c6 into alecthomas:master Jul 10, 2025
2 checks passed
@AverageHelper AverageHelper deleted the avg/gemtext branch July 10, 2025 04:42
@alecthomas
Owner

Thanks!

Development

Successfully merging this pull request may close these issues.

Gemtext/Gemini lexer

2 participants