Merged
4 changes: 2 additions & 2 deletions src/instructlab/cli/data/generate.py
@@ -143,8 +143,8 @@
@click.option(
"--batch-size",
type=click.IntRange(min=0),
-default=None,
-help="Number of elements to process in each batch through the SDG pipeline. Enabled by default for the vLLM serving backend, with a batch size of 8 chosen based on experiments to optimize for throughput. Use 0 to disable.",
+cls=clickext.ConfigOption,
+default=DEFAULTS.BATCH_SIZE,
Contributor

The default comes from the config when you set cls=..., so no default is necessary here.

Contributor Author

Thanks @cdoern. Hmm, I think I am missing something. If I run ilab data generate --debug-params, all the parameter values are correct except for the batch size, for which I get
batch_size: None [type: None, src: default] (if I remove the default)

Contributor

Suggested change
default=DEFAULTS.BATCH_SIZE,

After removing this line and reinstalling, I get the following:

ilab data generate --debug-params
Parameters:
             model_path: '/Users/charliedoern/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf' 	[type: str, src: default_map]
               num_cpus: 10       	[type: int, src: default_map]
       chunk_word_count: 1000     	[type: int, src: default_map]
       num_instructions: -1       	[type: int, src: default]
       sdg_scale_factor: 30       	[type: int, src: default_map]
          taxonomy_path: '/Users/charliedoern/.local/share/instructlab/taxonomy' 	[type: str, src: default_map]
          taxonomy_base: 'origin/main' 	[type: str, src: default_map]
             output_dir: '/Users/charliedoern/.local/share/instructlab/datasets' 	[type: str, src: default_map]
                  quiet: False    	[type: bool, src: default]
           endpoint_url: None     	[type: None, src: default]
                api_key: 'no_api_key' 	[type: str, src: default]
             yaml_rules: None     	[type: None, src: default]
        server_ctx_size: 4096     	[type: int, src: default]
           tls_insecure: False    	[type: bool, src: default]
        tls_client_cert: ''       	[type: str, src: default]
         tls_client_key: ''       	[type: str, src: default]
      tls_client_passwd: ''       	[type: str, src: default]
           model_family: None     	[type: None, src: default]
               pipeline: 'full'   	[type: str, src: default_map]
             batch_size: 8        	[type: int, src: default_map]
  enable_serving_output: False    	[type: bool, src: default]
                   gpus: None     	[type: None, src: default_map]
         max_num_tokens: 4096     	[type: int, src: default_map]
               detached: False    	[type: bool, src: default]

Contributor Author

@npalaska, Feb 17, 2025


Thanks, it indeed worked after I reinstalled it.

)
@click.option(
"--enable-serving-output",
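
The review thread above turns on where a parameter's value comes from. The following is a minimal, stdlib-only sketch of that lookup order, not InstructLab's or click's actual code: `resolve_param` and its arguments are hypothetical names, and the environment-variable step that click also supports is left out. It mirrors the `src:` labels in the `--debug-params` output: a value found in the config-derived default map is reported as `default_map`, and only when the key is absent does the option's own `default=` apply.

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_param(
    name: str,
    cli_args: Mapping[str, Any],
    default_map: Mapping[str, Any],
    option_default: Optional[Any] = None,
) -> Tuple[Any, str]:
    """Return (value, source) for a parameter (hypothetical helper).

    Simplified precedence: command line, then config-derived default map,
    then the default declared on the option itself.
    """
    if name in cli_args:
        return cli_args[name], "commandline"
    if name in default_map:
        return default_map[name], "default_map"
    return option_default, "default"


# Config supplies batch_size, so no default= is needed on the option.
with_config = resolve_param("batch_size", {}, {"batch_size": 8})

# Config has no batch_size entry and the option declares no default.
without_config = resolve_param("batch_size", {}, {})

# An explicit --batch-size on the command line wins over the config.
from_cli = resolve_param("batch_size", {"batch_size": 16}, {"batch_size": 8})
```

Under this model, `batch_size: None [type: None, src: default]` is what one would expect when the default map has no `batch_size` entry and the option declares no default. That is consistent with the value showing up as `8 [type: int, src: default_map]` once the updated package was reinstalled.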

4 changes: 4 additions & 0 deletions src/instructlab/configuration.py
@@ -373,6 +373,10 @@ class _generate(BaseModel):
default=DEFAULTS.NUM_CPUS,
description="Number of CPU cores to use for generation.",
)
batch_size: PositiveInt = Field(
default=DEFAULTS.BATCH_SIZE,
description="Number of Batches to send for generation on each core.",
)
chunk_word_count: PositiveInt = Field(
default=DEFAULTS.CHUNK_WORD_COUNT,
description="Maximum number of words per chunk.",
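
One detail worth checking when reading the two hunks together: the CLI option is typed `click.IntRange(min=0)`, while the config field is a pydantic `PositiveInt`. The sketch below imitates those two constraints in plain Python (the function names are hypothetical, and neither click nor pydantic is imported) to show that 0 passes the first and fails the second. Whether `batch_size: 0` in the config file is meant to be accepted is not stated in this PR.

```python
def check_cli_batch_size(value: int) -> int:
    """Imitates click.IntRange(min=0): any integer >= 0 is accepted."""
    if value < 0:
        raise ValueError(f"{value} is not in the range x>=0")
    return value


def check_config_batch_size(value: int) -> int:
    """Imitates pydantic's PositiveInt: only integers > 0 are accepted."""
    if value <= 0:
        raise ValueError(f"{value} is not greater than 0")
    return value


def accepted(check, value: int) -> bool:
    """Report whether a constraint accepts the value."""
    try:
        check(value)
    except ValueError:
        return False
    return True


# 0 is valid on the command line but not as a config value in this model.
cli_accepts_zero = accepted(check_cli_batch_size, 0)
config_accepts_zero = accepted(check_config_batch_size, 0)
```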

2 changes: 2 additions & 0 deletions src/instructlab/defaults.py
@@ -105,6 +105,8 @@ class _InstructlabDefaults:
MAX_CONTEXT_SIZE = 4096
# TODO: these constants should be removed, they should not leak out
NUM_CPUS = 10
# Number of batches to send on each core. Tune the batch size to optimize the vLLM performance
BATCH_SIZE = 8
CHUNK_WORD_COUNT = 1000
CONNECTION_TIMEOUT = httpx.Timeout(timeout=30.0)
# use spawn start method, fork is not thread-safe
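
To make the meaning of the setting concrete, here is a small sketch of splitting work into batches of `BATCH_SIZE` elements, with 0 treated as "batching disabled" as the original `--batch-size` help text describes. `iter_batches` is a hypothetical helper for illustration; the SDG pipeline's real batching code is not part of this diff and may differ.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

BATCH_SIZE = 8  # same value as DEFAULTS.BATCH_SIZE in this PR


def iter_batches(items: Sequence[T], batch_size: int = BATCH_SIZE) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most batch_size items (hypothetical helper).

    A batch_size of 0 disables batching: all items are yielded as one chunk.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be >= 0")
    if not items:
        return
    if batch_size == 0:
        yield list(items)
        return
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 20 elements with the default batch size give chunks of 8, 8 and 4.
sizes = [len(batch) for batch in iter_batches(list(range(20)))]

# With batching disabled, everything goes through in a single chunk.
unbatched = list(iter_batches(["a", "b", "c"], batch_size=0))
```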

2 changes: 2 additions & 0 deletions src/instructlab/profiles/amd/cpu.yaml
@@ -37,6 +37,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m1/m1_max.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m1/m1_ultra.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m2/m2.yaml
@@ -55,6 +55,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m2/m2_max.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m2/m2_pro.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m2/m2_ultra.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m3/m3.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m3/m3_max.yaml
@@ -55,6 +55,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/apple/m3/m3_pro.yaml
@@ -55,6 +55,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/intel/cpu.yaml
@@ -49,6 +49,8 @@ generate:
model: ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/intel/gaudi/gaudi_3.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/a100/a100_x2.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/a100/a100_x4.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/a100/a100_x8.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/h100/h100_x2.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/h100/h100_x4.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/h100/h100_x8.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/l4/l4_x8.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/l40s/l40s_x4.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

2 changes: 2 additions & 0 deletions src/instructlab/profiles/nvidia/l40s/l40s_x8.yaml
@@ -53,6 +53,8 @@ generate:
model: ~/.cache/instructlab/models/mistralai/Mixtral-8x7B-Instruct-v0.1
# Number of CPU cores to use for generation
num_cpus: 10
# Number of batches to send on each core
batch_size: 8
# Directory where generated datasets are stored
output_dir: ~/.local/share/instructlab/datasets
# Directory where pipeline config files are stored

1 change: 1 addition & 0 deletions tests/test_config.py
@@ -68,6 +68,7 @@ def _assert_defaults(self, cfg: config.Config):
assert cfg.generate.taxonomy_path == f"{data_dir}/taxonomy"
assert cfg.generate.taxonomy_base == "origin/main"
assert cfg.generate.num_cpus == 10
assert cfg.generate.batch_size == 8
assert cfg.generate.sdg_scale_factor == 30
assert cfg.generate.chunk_word_count == 1000
assert cfg.generate.output_dir == f"{data_dir}/datasets"

3 changes: 3 additions & 0 deletions tests/testdata/default_config.yaml
@@ -145,6 +145,9 @@ generate:
# Number of CPU cores to use for generation.
# Default: 10
num_cpus: 10
# Number of batches to send on each core
# Default: 8
batch_size: 8
# Number of instructions to use
# Default: -1
# Deprecated: see 'sdg_scale_factor' instead