Make SDG batch size configurable via system profile #3157
Conversation
cdoern left a comment:
Looks good! Technically, if you are using the default value, you don't need to add this to all of the profiles manually. One comment about the default in the click flag:

```diff
-    default=None,
     help="Number of elements to process in each batch through the SDG pipeline. Enabled by default for the vLLM serving backend, with a batch size of 8 chosen based on experiments to optimize for throughput. Use 0 to disable.",
     cls=clickext.ConfigOption,
+    default=DEFAULTS.BATCH_SIZE,
```
The default comes from the config when you set `cls=...`, so no default is necessary here.
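As a sketch of the mechanism being discussed (this is not the actual instructlab code; it only mimics Click-style default resolution, where a value loaded from the config into the `default_map` takes precedence over the option's hardcoded default, so declaring `default=DEFAULTS.BATCH_SIZE` on the option itself is redundant):

```python
# Illustrative sketch of Click-style option resolution order:
# explicit CLI value > config-backed default_map > hardcoded default.
# Names here are hypothetical, chosen only for illustration.

def resolve_option(name, cli_args, default_map, hardcoded_default=None):
    """Return (value, source) for an option, mimicking Click's lookup order."""
    if name in cli_args:                 # value passed on the command line
        return cli_args[name], "commandline"
    if name in default_map:              # value loaded from the config/profile
        return default_map[name], "default_map"
    return hardcoded_default, "default"  # fall back to the declared default


# With batch_size present in the system profile (default_map), no
# hardcoded default is needed on the option itself:
value, src = resolve_option("batch_size", cli_args={}, default_map={"batch_size": 8})
print(value, src)  # → 8 default_map
```

This is why `--debug-params` reports `src: default_map` once the profile supplies the value, and `src: default` only when neither the CLI nor the config provides one.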
Thanks @cdoern, hmm, I think I am missing something. If I run `ilab data generate --debug-params`, I get all of the parameter values correct except for the batch size, for which (if I remove the default) I get:

```
batch_size: None [type: None, src: default]
```
Force-pushed 576aa76 to f4e8c50.
```diff
-    default=None,
     help="Number of elements to process in each batch through the SDG pipeline. Enabled by default for the vLLM serving backend, with a batch size of 8 chosen based on experiments to optimize for throughput. Use 0 to disable.",
     cls=clickext.ConfigOption,
+    default=DEFAULTS.BATCH_SIZE,
```
> default=DEFAULTS.BATCH_SIZE,

Removing this line and re-installing, I get the following:

```
ilab data generate --debug-params
Parameters:
model_path: '/Users/charliedoern/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf' [type: str, src: default_map]
num_cpus: 10 [type: int, src: default_map]
chunk_word_count: 1000 [type: int, src: default_map]
num_instructions: -1 [type: int, src: default]
sdg_scale_factor: 30 [type: int, src: default_map]
taxonomy_path: '/Users/charliedoern/.local/share/instructlab/taxonomy' [type: str, src: default_map]
taxonomy_base: 'origin/main' [type: str, src: default_map]
output_dir: '/Users/charliedoern/.local/share/instructlab/datasets' [type: str, src: default_map]
quiet: False [type: bool, src: default]
endpoint_url: None [type: None, src: default]
api_key: 'no_api_key' [type: str, src: default]
yaml_rules: None [type: None, src: default]
server_ctx_size: 4096 [type: int, src: default]
tls_insecure: False [type: bool, src: default]
tls_client_cert: '' [type: str, src: default]
tls_client_key: '' [type: str, src: default]
tls_client_passwd: '' [type: str, src: default]
model_family: None [type: None, src: default]
pipeline: 'full' [type: str, src: default_map]
batch_size: 8 [type: int, src: default_map]
enable_serving_output: False [type: bool, src: default]
gpus: None [type: None, src: default_map]
max_num_tokens: 4096 [type: int, src: default_map]
detached: False [type: bool, src: default]
```
Thanks, it indeed worked after I reinstalled it.
@Mergifyio rebase

✅ Branch has been successfully rebased
Force-pushed f4e8c50 to bb2eb90.
eshwarprasadS left a comment:
LGTM 🚀
Signed-off-by: Nikhil Palaskar <[email protected]>
Force-pushed bb2eb90 to 4a309ce.
@mergify backport release-v0.24

✅ Backports have been created
@mergify backport release-v0.23

✅ Backports have been created
…-3157 Make SDG batch size configurable via system profile (backport #3157)

…3207) This is an automatic backport of pull request #3157 done by [Mergify](https://mergify.com). Approved-by: courtneypacheco. Approved-by: ktdreyer.
Currently, the batch size for SDG is only configurable via the CLI, but a single batch size across all hardware profiles is not optimal. Different hardware configurations have varying capabilities, and using a fixed batch size can lead to under-utilization or over-utilization of resources during the SDG process.
To ensure efficient performance across different hardware, we should set the batch sizes independently in each system profile.
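For instance, per-profile overrides might look like the following (a hypothetical sketch only; the file paths, section name, and exact schema are illustrative assumptions, not the actual instructlab profile layout):

```yaml
# Hypothetical system-profile fragments; the real instructlab schema may differ.
---
# e.g. profiles/nvidia/a100_x4.yaml
generate:
  batch_size: 8      # larger batches to keep high-throughput GPUs saturated
---
# e.g. profiles/apple/m2_max.yaml
generate:
  batch_size: 0      # 0 disables batching on hardware where it hurts throughput
```

With the batch size declared in each profile, the CLI flag remains available as an explicit override while the default adapts to the detected hardware.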