llama_cpp/llama.py (+1 -1)
@@ -153,7 +153,7 @@ def __init__(
 model_path: Path to the model.
 n_gpu_layers: Number of layers to offload to GPU (-ngl). If -1, all layers are offloaded.
 split_mode: How to split the model across GPUs. See llama_cpp.LLAMA_SPLIT_* for options.
-main_gpu: main_gpu interpretation depends on split_mode: LLAMA_SPLIT_NONE: the GPU that is used for the entire model. LLAMA_SPLIT_ROW: the GPU that is used for small tensors and intermediate results. LLAMA_SPLIT_LAYER: ignored
+main_gpu: main_gpu interpretation depends on split_mode: LLAMA_SPLIT_MODE_NONE: the GPU that is used for the entire model. LLAMA_SPLIT_MODE_ROW: the GPU that is used for small tensors and intermediate results. LLAMA_SPLIT_MODE_LAYER: ignored
 tensor_split: How split tensors should be distributed across GPUs. If None, the model is not split.
 rpc_servers: Comma separated list of RPC servers to use for offloading
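For context, here is a minimal sketch of how these parameters fit together when constructing a `Llama` instance, using the renamed `LLAMA_SPLIT_MODE_*` constants this diff documents. The model path is a placeholder, and the surrounding setup is an assumption, not part of the change itself:

```python
import llama_cpp
from llama_cpp import Llama

# Offload all layers (n_gpu_layers=-1) and keep the entire model
# on a single GPU. Under LLAMA_SPLIT_MODE_NONE, main_gpu selects
# which GPU hosts the whole model; under LLAMA_SPLIT_MODE_LAYER
# it would be ignored, per the docstring above.
llm = Llama(
    model_path="./models/model.gguf",  # placeholder path
    n_gpu_layers=-1,
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_NONE,
    main_gpu=0,
)
```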