Maybe you can create multiple iterators per file. For example, if you want 4 iterators per file, the first iterator starts at the beginning of the file and stops at the first end-of-line (EOL) after the 1/4 mark. The second iterator starts right after that EOL and stops at the first EOL after the 1/2 mark, and so on.
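
A minimal sketch of this idea using only the standard library. The function names `shard_offsets` and `iter_lines` are hypothetical, and it assumes a UTF-8 text file with `\n` line endings. Each boundary is found by seeking to roughly i/N of the file's size in bytes and then skipping to just past the next newline, so every line ends up in exactly one shard:

```python
import os


def shard_offsets(path, num_shards):
    """Split `path` into `num_shards` contiguous byte ranges (hypothetical helper).

    Every boundary except the final EOF sits just after a newline, so no line
    is split between two ranges.
    """
    size = os.path.getsize(path)
    boundaries = [0]
    with open(path, "rb") as f:
        for i in range(1, num_shards):
            # Jump to roughly i/num_shards of the file...
            f.seek(size * i // num_shards)
            # ...then skip to just past the next newline (or to EOF).
            f.readline()
            boundaries.append(f.tell())
    boundaries.append(size)
    return list(zip(boundaries[:-1], boundaries[1:]))


def iter_lines(path, start, end):
    """Yield the lines that begin inside the byte range [start, end)."""
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            line = f.readline()
            if not line:  # reached EOF
                break
            yield line.decode("utf-8").rstrip("\n")
```

Each `(start, end)` pair can then be handed to a separate iterator or worker, which calls `iter_lines(path, start, end)` on its own without coordinating with the others.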

Btw, instead of using a GeneratorBasedBuilder, it may be easier to use Dataset.from_generator or IterableDataset.from_generator.
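
A rough sketch of how per-file byte ranges could plug into `IterableDataset.from_generator`. It assumes a recent `datasets` release in which a list passed through `gen_kwargs` is treated as the list of shards, so the shards can be spread across DataLoader workers. The shard tuples `(path, start, end)` and the function names `generate_examples` / `build_iterable_dataset` are hypothetical, and each range is assumed to already start and end on line boundaries:

```python
def generate_examples(shards):
    """Yield one {"text": ...} example per line for each (path, start, end) shard.

    Assumes every start/end offset falls on a line boundary (hypothetical shard format).
    """
    for path, start, end in shards:
        with open(path, "rb") as f:
            f.seek(start)
            while f.tell() < end:
                line = f.readline()
                if not line:  # reached EOF
                    break
                yield {"text": line.decode("utf-8").rstrip("\n")}


def build_iterable_dataset(shards):
    # Imported here so generate_examples can be used without `datasets` installed.
    from datasets import IterableDataset

    # Assumption: the list under "shards" is split into shards by `datasets`,
    # and each generator call receives a sub-list of it.
    return IterableDataset.from_generator(generate_examples, gen_kwargs={"shards": shards})
```

Keeping `generate_examples` free of any `datasets` import makes it easy to test on its own; only `build_iterable_dataset` needs the library.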

Answer selected by yzhangcs