This repository was archived by the owner on Jun 3, 2025. It is now read-only.

[TransformersPipeline] Add in and refactor TransformersPipeline args #1218

Merged
bfineran merged 6 commits into main from new_args on Sep 6, 2023

Conversation

@dsikka dsikka commented Aug 29, 2023

Contributor

For this ticket: https://app.asana.com/0/1201735099598270/1205276886236966/f

Summary:

  • Updates the TransformersPipeline constructor to add two arguments: config and tokenizer
  • For both, the user can pass a string, a path, or a transformers object, which is used instead of relying on a deployment directory containing the expected JSON files. Both default to None, in which case the normal deployment directory workflow is used.
  • The config argument may also be a dictionary
  • To support this, get_onnx_path_and_configs is split into two functions: get_hugging_face_configs and get_onnx_path
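
The input handling described above can be illustrated with a minimal sketch. This is not the code in this PR: `resolve_config` is a hypothetical helper, the `config.json` file name is an assumption about what the deployment directory contains, and strings are treated only as local paths here (the real argument may also accept other string forms, such as a model identifier, which this sketch does not cover).

```python
import json
from pathlib import Path
from typing import Any, Dict, Union


def resolve_config(
    config: Union[None, str, Path, Dict[str, Any], Any],
    deployment_dir: Union[str, Path],
) -> Dict[str, Any]:
    """Hypothetical helper: turn any accepted config input into a plain dict."""
    if config is None:
        # Default workflow: fall back to the deployment directory.
        # "config.json" is an assumed file name.
        return json.loads((Path(deployment_dir) / "config.json").read_text())
    if isinstance(config, dict):
        # A dictionary is used directly (copied so the caller's dict is untouched).
        return dict(config)
    if isinstance(config, (str, Path)):
        # A string or path is treated as a local file or directory in this sketch.
        path = Path(config)
        if path.is_dir():
            path = path / "config.json"
        return json.loads(path.read_text())
    if hasattr(config, "to_dict"):
        # Config objects exposing to_dict(), as transformers configs do.
        return config.to_dict()
    raise TypeError(f"Unsupported config type: {type(config).__name__}")
```

The point of the sketch is the precedence: an explicit argument always wins, and the deployment directory is consulted only when the argument is None.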

Testing:

  • Tested locally using a variety of combinations for config and tokenizer

Example:

from deepsparse import Pipeline
from transformers import LlamaConfig, LlamaTokenizerFast

tokenizer = LlamaTokenizerFast.from_pretrained("hf-internal-testing/llama-tokenizer")
config = {
   "_name_or_path": None,
   "architectures": [
      "LlamaForCausalLM"
   ],
   "bos_token_id": 1,
   "eos_token_id": 2,
   "hidden_act": "silu",
   "hidden_size": 5120,
   "initializer_range": 0.02,
   "intermediate_size": 13824,
   "max_position_embeddings": 4096,
   "model_type": "llama",
   "num_attention_heads": 40,
   "num_hidden_layers": 40,
   "num_key_value_heads": 40,
   "pretraining_tp": 1,
   "rms_norm_eps": 1e-05,
   "rope_scaling": None,
   "tie_word_embeddings": False,
   "torch_dtype": "float16",
   "transformers_version": "4.31.0.dev0",
   "use_cache": True,
   "vocab_size": 32000
}

llama = Pipeline.create(
   task="text-generation",
   model_path="/home/dsikka/models_llama/deployment_13",
   engine_type="onnxruntime",
   deterministic=False,
   config=config,
   tokenizer=tokenizer
)

inference = llama(sequences=["Hello?"])
for s in inference.sequences:
   print(s)

@dsikka dsikka marked this pull request as ready for review August 29, 2023 22:56
Comment thread src/deepsparse/transformers/helpers.py
Comment thread src/deepsparse/transformers/pipelines/pipeline.py
@bfineran bfineran merged commit 0f0029a into main Sep 6, 2023
@bfineran bfineran deleted the new_args branch September 6, 2023 19:10
3 participants