
Restructure Tokenizer and Splitter modules #3002

Merged
alanakbik merged 2 commits into master from import-tokenizer
Nov 27, 2022

Conversation

@alanakbik
Collaborator

When initializing a Sentence object with use_tokenizer=True (the default), the Sentence constructor triggered an import of SegtokTokenizer. According to this thread, re-importing modules has a cost, and since we often create large numbers of Sentence objects, this was unnecessary overhead.

This PR makes the import global. To do this, it splits the flair.tokenization module into tokenization (for all tokenizers) and splitter (for all sentence splitters), so that flair.tokenization no longer imports flair.data.
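The before/after import pattern can be sketched minimally as follows. This is not Flair's actual code: the function names are hypothetical, and the stdlib json module stands in for SegtokTokenizer.

```python
import timeit

def create_with_local_import():
    # "Before": the import statement runs on every call. After the first
    # call the module is cached in sys.modules, but Python still pays for
    # the sys.modules lookup and the local name binding each time.
    import json  # stand-in for SegtokTokenizer
    return json

import json  # "After": imported once, at module load time

def create_with_global_import():
    # The PR's approach: rely on the module-level import instead.
    return json

local_t = timeit.timeit(create_with_local_import, number=100_000)
global_t = timeit.timeit(create_with_global_import, number=100_000)
print(f"local import: {local_t:.3f}s, global import: {global_t:.3f}s")
```

Both variants return the same cached module object; only the per-call bookkeeping differs.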

@alanakbik
Collaborator Author

However, it doesn't really seem to make a difference in runtime. Tested with:

import timeit
t = timeit.Timer(setup='from flair.data import Sentence',
                 stmt='Sentence("a a a a a a a a a a ")')
print(t.timeit())
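That result is plausible: once a module is cached, re-executing its import statement is cheap relative to the rest of Sentence construction. A quick sketch to isolate just the cached re-import cost (again using the stdlib json module as a stand-in; absolute numbers will vary by machine):

```python
import timeit

# Re-executing "import json" for an already-cached module costs only a
# sys.modules lookup plus a name binding per execution.
cached_reimport = timeit.timeit("import json", number=1_000_000)

# Baseline: an empty statement, to show how much is pure loop overhead.
baseline = timeit.timeit("pass", number=1_000_000)

print(f"1M cached re-imports: {cached_reimport:.3f}s (baseline {baseline:.3f}s)")
```

At fractions of a microsecond per re-import, the saving is real but easily dwarfed by tokenization itself, which would explain why the benchmark above shows no visible difference.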

@alanakbik alanakbik merged commit 247bff4 into master Nov 27, 2022
@alanakbik alanakbik deleted the import-tokenizer branch November 27, 2022 12:35