
Restructure Tokenizer and Splitter modules #3002

Merged
alanakbik merged 2 commits into master from import-tokenizer
Nov 27, 2022

Conversation

@alanakbik
Collaborator

When initializing a Sentence object with use_tokenizer=True (the default), the Sentence constructor triggered an import of SegtokTokenizer. According to this thread, re-importing modules has a cost, and since we often create large numbers of Sentence objects, this was unnecessary overhead.

This PR makes the import global. To do this, it splits the flair.tokenization module into tokenization (for all tokenizers) and splitter (for all sentence splitters), so that flair.tokenization no longer imports flair.data.
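The before/after import pattern can be sketched minimally as follows. This is not Flair's actual code: the function names are hypothetical, and the stdlib json module stands in for SegtokTokenizer.

```python
import timeit

def create_with_local_import():
    # "Before": the import statement runs on every call. After the first
    # call the module is cached in sys.modules, but Python still pays for
    # the sys.modules lookup and the local name binding each time.
    import json  # stand-in for SegtokTokenizer
    return json

import json  # "After": imported once, at module load time

def create_with_global_import():
    # The PR's approach: rely on the module-level import instead.
    return json

local_t = timeit.timeit(create_with_local_import, number=100_000)
global_t = timeit.timeit(create_with_global_import, number=100_000)
print(f"local import: {local_t:.3f}s, global import: {global_t:.3f}s")
```

Both variants return the same cached module object; only the per-call bookkeeping differs.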

@alanakbik
Collaborator Author

However, it doesn't really seem to make a difference in runtime. Tested with:

import timeit
t = timeit.Timer(setup='from flair.data import Sentence',
                 stmt='Sentence("a a a a a a a a a a ")')
print(t.timeit())
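That result is plausible: once a module is cached, re-executing its import statement is cheap relative to the rest of Sentence construction. A quick sketch to isolate just the cached re-import cost (again using the stdlib json module as a stand-in; absolute numbers will vary by machine):

```python
import timeit

# Re-executing "import json" for an already-cached module costs only a
# sys.modules lookup plus a name binding per execution.
cached_reimport = timeit.timeit("import json", number=1_000_000)

# Baseline: an empty statement, to show how much is pure loop overhead.
baseline = timeit.timeit("pass", number=1_000_000)

print(f"1M cached re-imports: {cached_reimport:.3f}s (baseline {baseline:.3f}s)")
```

At fractions of a microsecond per re-import, the saving is real but easily dwarfed by tokenization itself, which would explain why the benchmark above shows no visible difference.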

@alanakbik alanakbik merged commit 247bff4 into master Nov 27, 2022
@alanakbik alanakbik deleted the import-tokenizer branch November 27, 2022 12:35