Tags: robyngraf/llama.cpp
opencl : support k-quants (ggml-org#1836)
  * Porting q2_k kernel to OpenCL
  * Set global and local sizes for kernel calls for dequantizing k-quants
  * Added q6_k kernel
  * Fix q4_k opencl struct order
  * Replace uchar with uint8_t
  * Finish dequant kernels
  * Added OpenCL DMMV kernels
  * Fix q2_k, improve code
  * Fix q3_k
  * Shorten switch statements
  * Improve code formatting
  ---------
  Co-authored-by: Concedo <[email protected]>
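One change in this commit sets global and local work sizes for the k-quant dequantize kernels. In OpenCL 1.x, when a local size is passed to `clEnqueueNDRangeKernel`, each global size dimension must be a multiple of it. The sketch below shows the usual round-up arithmetic under that assumption; `round_up_global_size` is a hypothetical helper, not code from the PR, and the actual sizes the PR picks are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from ggml-opencl): round the number of work-items
// up to the next multiple of the local (work-group) size, as OpenCL 1.x
// requires when an explicit local size is given. A kernel launched this way
// is expected to skip work-items whose global index is past n_items.
std::size_t round_up_global_size(std::size_t n_items, std::size_t local_size) {
    assert(local_size > 0);
    return ((n_items + local_size - 1) / local_size) * local_size;
}
```

For example, 100 items with a local size of 32 yield a global size of 128, so the last work-group carries 28 idle work-items.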
examples : add "simple" (ggml-org#1840)
  * Create `simple.cpp`
  * minimalist example `CMakeLists.txt`
  * Update Makefile for minimalist example
  * remove 273: Trailing whitespace
  * removed trailing white spaces simple.cpp
  * typo and comments simple.cpp
  ---------
  Co-authored-by: Georgi Gerganov <[email protected]>
llama : fix embd when offloading non-repeating layers (ggml-org#1891)
cmake : add auto detection of BLAS_INCLUDE_DIRS (ggml-org#1886)
Fixed possible macro redefinition (ggml-org#1892) MinGW libstdc++ may define `NOMINMAX` unconditionally. This fixes the case when it is already defined.
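A minimal sketch of the guard pattern this commit describes, assuming `NOMINMAX` is defined just before Windows headers are included; this is not the exact llama.cpp source. `clamp_to_range` is a hypothetical function included only to show that `std::min` and `std::max` stay usable once the `<windows.h>` `min`/`max` macros are suppressed.

```cpp
// Define NOMINMAX only if the toolchain (e.g. MinGW libstdc++) has not
// already defined it; an unconditional #define would trigger a
// macro-redefinition warning in that case.
#ifndef NOMINMAX
#define NOMINMAX
#endif

#if defined(_WIN32)
#include <windows.h>  // with NOMINMAX set, no min/max macros are defined here
#endif

#include <algorithm>
#include <cassert>

// Hypothetical example: std::min/std::max parse as normal function calls.
int clamp_to_range(int v, int lo, int hi) {
    assert(lo <= hi);
    return std::max(lo, std::min(v, hi));
}
```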
CUDA : faster k-quant dot kernels (ggml-org#1862)
  * cuda : faster k-quant dot kernels
  * Improve Q2_K dot kernel on older GPUs. We now have a K_QUANTS_PER_ITERATION macro, which should be set to 1 on older GPUs and to 2 on newer ones. With this, we preserve the performance of the original PR on RTX-4080 and are faster than master on GTX-1660.
  * Improve Q6_K dot kernel on older GPUs. Using the same K_QUANTS_PER_ITERATION macro as the last commit, we preserve performance on RTX-4080 and speed up Q6_K on a GTX-1660.
  * Add LLAMA_CUDA_KQUANTS_ITER to CMakeLists.txt and Makefile. Allowed values are 1 or 2. 2 gives the best performance on modern GPUs and is set as the default. On older GPUs, 1 may work better.
  * PR comments
  ---------
  Co-authored-by: Iwan Kawrakow <[email protected]>
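The commit adds a compile-time knob whose allowed values are 1 or 2, with 2 as the default. The sketch below shows one way such a macro can default to 2 and reject other values at compile time, assuming the build passes the chosen value as a `-DK_QUANTS_PER_ITERATION=...` definition. `iterations_for` is a hypothetical helper, not a function from the CUDA kernels.

```cpp
#include <cassert>

// Assumption: the build may pass -DK_QUANTS_PER_ITERATION=1 or =2;
// fall back to 2 (the stated default for modern GPUs) otherwise.
#ifndef K_QUANTS_PER_ITERATION
#define K_QUANTS_PER_ITERATION 2
#endif

static_assert(K_QUANTS_PER_ITERATION == 1 || K_QUANTS_PER_ITERATION == 2,
              "K_QUANTS_PER_ITERATION must be 1 or 2");

// Hypothetical helper: loop iterations a thread needs to cover n_quants
// values when it processes K_QUANTS_PER_ITERATION of them per iteration.
int iterations_for(int n_quants) {
    assert(n_quants >= 0);
    return (n_quants + K_QUANTS_PER_ITERATION - 1) / K_QUANTS_PER_ITERATION;
}
```

Building with `-DK_QUANTS_PER_ITERATION=1` would select the variant described for older GPUs; any value other than 1 or 2 fails at compile time rather than silently producing a mis-tuned kernel.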
make : add train-text-from-scratch (ggml-org#1850)
  * make finetuning example accessible
  * fixed: target was in wrong line
  * fixed: name of executable was wrong
  * fixed: naming of binary
  * fixed: model path was wrong
  * fixed clean target
  * Update examples/train-text-from-scratch/README.md
  ---------
  Co-authored-by: Georgi Gerganov <[email protected]>
examples : add chat-vicuna.sh (ggml-org#1854) Co-authored-by: Yang Li <[email protected]>