This repository was archived by the owner on Jun 3, 2025. It is now read-only.

use max sequence length for tokenization#1166

Merged: mgoin merged 1 commit into main from max-bucket-tokenization on Aug 9, 2023

@horheynm horheynm commented Aug 4, 2023

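Fixes the bug reported in https://neuralmagic.slack.com/archives/C0592FX3215/p1687210067995879: when `sequence_length` is a list of buckets, tokenization should use the maximum sequence length rather than a smaller bucket.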

Example:

from deepsparse import Pipeline

# Placeholder: replace with the path to a real model
path = "path"

pipeline = Pipeline.create(
    task="text-classification",
    model_path=path,
    batch_size=8,
    num_cores=None,
    sequence_length=[2, 128],  # multiple sequence-length buckets
)

text = "We are flying from Texas to California"
pipeline(text)

Before:

(.venv) ubuntu@quad-mle-2:~/george/nm/deepsparse$ python3 scratch/_p.py 
...
2023-08-04 01:19:39 __main__     INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/.cache/sparsezoo/bert-large-squad_wikipedia_bookcorpus-pruned80.4block_quantized/bert-large-squad_wikipedia_bookcorpus-pruned80.4block_quantized/model.onnx
Token indices sequence length is longer than the specified maximum sequence length for this model (9 > 2). Running this sequence through the model will result in indexing errors

After:

(.venv) ubuntu@quad-mle-2:~/george/nm/deepsparse$ python3 scratch/_p.py 
...
2023-08-04 01:25:44 __main__     INFO     Overwriting in-place the input shapes of the transformer model at /home/ubuntu/.cache/sparsezoo/bert-large-squad_wikipedia_bookcorpus-pruned80.4block_quantized/bert-large-squad_wikipedia_bookcorpus-pruned80.4block_quantized/model.onnx
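
For illustration, the idea behind the change can be sketched roughly as below. This is a minimal sketch and not the code in this PR: the helper names `tokenizer_max_length` and `pick_bucket` are hypothetical, and the rule of running the smallest bucket that fits the input is an assumption about how bucketing behaves.

```python
from typing import List, Union


def tokenizer_max_length(sequence_length: Union[int, List[int]]) -> int:
    """Hypothetical helper: the length the tokenizer should be capped at.

    With a list of buckets, use the largest one so the tokenizer does not
    treat inputs longer than a small bucket as "too long".
    """
    if isinstance(sequence_length, int):
        return sequence_length
    if not sequence_length:
        raise ValueError("sequence_length must not be empty")
    return max(sequence_length)


def pick_bucket(num_tokens: int, sequence_length: List[int]) -> int:
    """Hypothetical helper: smallest bucket that fits num_tokens.

    Assumption: inputs longer than every bucket fall back to the largest
    bucket (and would be truncated to it).
    """
    for bucket in sorted(sequence_length):
        if num_tokens <= bucket:
            return bucket
    return max(sequence_length)


buckets = [2, 128]
print(tokenizer_max_length(buckets))  # cap for the tokenizer
print(pick_bucket(9, buckets))        # bucket for a 9-token input
```

With the buckets from the example above, the 9-token input is compared against 128 instead of 2, which is consistent with the warning no longer appearing in the "After" log.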

@horheynm horheynm marked this pull request as ready for review August 4, 2023 02:21
@mgoin previously requested changes on Aug 4, 2023
Comment thread on src/deepsparse/transformers/pipelines/pipeline.py (outdated)
@horheynm horheynm force-pushed the max-bucket-tokenization branch from 7823c07 to 7a7df94 on August 8, 2023 15:57

@mgoin mgoin left a comment

nice!

@mgoin mgoin merged commit cf9864b into main Aug 9, 2023
@mgoin mgoin deleted the max-bucket-tokenization branch August 9, 2023 13:42