Inefficient keyword matching #11

@eivindjahren


The tokenizer is quite inefficient when the stream is positioned at the start of a tag key (or possibly at the end of the tag). Consider the state the tokenizer is in at the start of the third line of the following file:
"""
roff-asc
tag a
array int b 1 1
endtag
"""

The list of accepted keywords in this state is

  • endtag
  • char
  • byte
  • bool
  • int
  • float
  • double
  • array

If endtag is read, we should yield from the tokenize_tag generator; if array is read, we should yield from the tokenize_array_tagkey generator; and for the others, we should yield from the tokenize_simple_tagkey generator.

The current tokenizer is woefully inefficient here, as it invokes a seek for each failed match.
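To make the cost concrete, here is a hypothetical sketch of that strategy (an assumption for illustration, not the project's actual code): each candidate keyword is tried in turn, and the stream is seeked back after every failed match, so matching the last candidate costs a read and a seek per preceding keyword.

```python
import io

# Hypothetical sketch of the current strategy (assumed, not the actual
# implementation): try each accepted keyword in turn and seek the stream
# back after every failed match -- up to one seek per keyword.
def match_keyword_naive(stream, keywords):
    for kw in keywords:
        pos = stream.tell()
        if stream.read(len(kw)) == kw:
            return kw
        stream.seek(pos)  # rewind so the next candidate can be tried
    return None

keywords = ["endtag", "char", "byte", "bool", "int", "float", "double", "array"]
# Matching "array" only succeeds on the last candidate, after seven rewinds.
match_keyword_naive(io.StringIO("array int b 1 1"), keywords)
```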

Since these keywords are uniquely determined by their first two letters, we could create a much more efficient tokenizer for this state:

kw_lookup = {kw[:2]: kw for kw in ["endtag", "char", "byte", "bool", "int", "float", "double", "array"]}

def tokenize_endtag_or_tagkey(stream):
    # The first two characters uniquely identify the candidate keyword.
    kw_candidate = kw_lookup[stream.read(2)]
    if kw_candidate[2:] == stream.read(len(kw_candidate) - 2):
        kw = TokenKind.keyword[kw_candidate]
        yield Token(kw)
        if kw == TokenKind.ENDTAG:
            return
        elif kw == TokenKind.ARRAY:
            yield from tokenize_array_tagkey(stream)
        else:
            yield from tokenize_simple_tagkey(stream)
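A self-contained demonstration of the two-letter dispatch, with a stub TokenKind standing in for the project's real token machinery (the Enum lookup by name here is an assumption; the actual TokenKind mapping is not shown in this issue):

```python
import io
from enum import Enum, auto

# Stub for the project's TokenKind; only the dispatch logic is the point.
class TokenKind(Enum):
    ENDTAG = auto(); CHAR = auto(); BYTE = auto(); BOOL = auto()
    INT = auto(); FLOAT = auto(); DOUBLE = auto(); ARRAY = auto()

keywords = ["endtag", "char", "byte", "bool", "int", "float", "double", "array"]
kw_lookup = {kw[:2]: kw for kw in keywords}

def dispatch(stream):
    # One read of two characters selects the unique candidate;
    # one more read verifies the remaining characters. No seeks needed.
    kw_candidate = kw_lookup[stream.read(2)]
    if kw_candidate[2:] == stream.read(len(kw_candidate) - 2):
        return TokenKind[kw_candidate.upper()]
    return None

dispatch(io.StringIO("endtag\n"))      # TokenKind.ENDTAG in two reads
dispatch(io.StringIO("array int b 1 1"))  # TokenKind.ARRAY in two reads
```

Each state transition costs exactly two reads and zero seeks, regardless of which keyword appears.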
