master-d411968

opencl : support k-quants (ggml-org#1836)

* Porting q2_k kernel to OpenCL

* Set global and local sizes for kernel calls for dequantizing k-quants

* Added q6_k kernel

* Fix q4_k opencl struct order

* Replace uchar with uint8_t

* Finish dequant kernels

* Added OpenCL DMMV kernels

* Fix q2_k, improve code

* Fix q3_k

* Shorten switch statements

* Improve code formatting

---------

Co-authored-by: Concedo <[email protected]>

Jun 16, 2023
d411968
zip
tar.gz

master-b41b4ca

examples : add "simple" (ggml-org#1840)

* Create `simple.cpp`

* minimalist example `CMakeLists.txt`

* Update Makefile for minimalist example

* remove 273: Trailing whitespace

* removed trailing white spaces simple.cpp

* typo and comments simple.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Jun 16, 2023
b41b4ca
zip
tar.gz

master-ac3b886

llama : fix embd when offloading non-repeating layers (ggml-org#1891)

Jun 16, 2023
ac3b886
zip
tar.gz

master-13fe9d2

cmake : add auto detection of BLAS_INCLUDE_DIRS (ggml-org#1886)

Jun 16, 2023
13fe9d2
zip
tar.gz

master-9cbf50c

build : fix and ignore MSVC warnings (ggml-org#1889)

Jun 16, 2023
9cbf50c
zip
tar.gz

master-5b9ccaf

Fixed possible macro redefinition (ggml-org#1892)

MinGW libstdc++ may define `NOMINMAX` unconditionally. This fixes the case when it is already defined.
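A minimal sketch of the kind of guard this fix describes, assuming the usual pattern of defining `NOMINMAX` before including `windows.h`. The helper `smaller` is hypothetical and only shows that `std::min` stays usable; the actual change in the commit may differ in placement and detail.

```cpp
#include <algorithm>

// Define NOMINMAX only when the toolchain has not already defined it.
// An unguarded "#define NOMINMAX" can trigger a redefinition warning
// when the macro already exists with a different replacement list.
#ifndef NOMINMAX
#    define NOMINMAX
#endif

#if defined(_WIN32)
#    include <windows.h> // with NOMINMAX set, no min/max macros are defined
#endif

// Hypothetical helper: std::min is not shadowed by a min() macro.
inline int smaller(int a, int b) {
    return std::min(a, b);
}
```

On non-Windows platforms the `windows.h` include is skipped and the guard is a harmless no-op.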

Jun 16, 2023
5b9ccaf
zip
tar.gz

master-3d01122

CUDA : faster k-quant dot kernels (ggml-org#1862)

* cuda : faster k-quant dot kernels

* Improve Q2_K dot kernel on older GPUs

We now have a K_QUANTS_PER_ITERATION macro, which should be
set to 1 on older and to 2 on newer GPUs.
With this, we preserve the performance of the original
PR on RTX-4080, and are faster compared to master on
GTX-1660.

* Improve Q6_K dot kernel on older GPUs

Using the same K_QUANTS_PER_ITERATION macro as last commit,
we preserve performance on RTX-4080 and speed up
Q6_K on a GTX-1660.

* Add LLAMA_CUDA_KQUANTS_ITER to CMakeLists.txt and Makefile

Allowed values are 1 or 2. 2 gives the best performance on
modern GPUs and is set as default. On older GPUs 1 may work
better.

* PR comments

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
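The `K_QUANTS_PER_ITERATION` notes above describe a compile-time knob that accepts 1 or 2 and defaults to 2. The following is a host-side sketch of that idea only, not the CUDA kernels from the PR: the function `dot_sketch` is hypothetical, and it assumes the build system passes the value as a preprocessor definition (for example via `LLAMA_CUDA_KQUANTS_ITER`).

```cpp
#include <cassert>
#include <cstddef>

// Assumption: the build passes -DK_QUANTS_PER_ITERATION=<1|2>.
// Fall back to 2, the default named in the commit message.
#ifndef K_QUANTS_PER_ITERATION
#    define K_QUANTS_PER_ITERATION 2
#endif

// Reject any value other than the two allowed ones at compile time.
static_assert(K_QUANTS_PER_ITERATION == 1 || K_QUANTS_PER_ITERATION == 2,
              "K_QUANTS_PER_ITERATION must be 1 or 2");

// Hypothetical CPU stand-in for a dot kernel: each step of the outer loop
// consumes K_QUANTS_PER_ITERATION element pairs.
inline float dot_sketch(const float * x, const float * y, std::size_t n) {
    assert(x != nullptr && y != nullptr);
    float sum = 0.0f;
    std::size_t i = 0;
    for (; i + K_QUANTS_PER_ITERATION <= n; i += K_QUANTS_PER_ITERATION) {
        for (int k = 0; k < K_QUANTS_PER_ITERATION; ++k) {
            sum += x[i + k] * y[i + k];
        }
    }
    // Handle the tail when n is not a multiple of the step size.
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The result is the same for either setting; only the amount of work per loop step changes, which is why the better value depends on the GPU generation.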

Jun 16, 2023
3d01122
zip
tar.gz

master-cf267d1

make : add train-text-from-scratch (ggml-org#1850)

* make finetuning example accessible

* fixed: target was in wrong line

* fixed: name of executable was wrong

* fixed: naming of binary

* fixed: model path was wrong

* fixed clean target

* Update examples/train-text-from-scratch/README.md

---------

Co-authored-by: Georgi Gerganov <[email protected]>

Jun 15, 2023
cf267d1
zip
tar.gz

master-c36e81d

examples : add chat-vicuna.sh (ggml-org#1854)

Co-authored-by: Yang Li <[email protected]>

Jun 15, 2023
c36e81d
zip
tar.gz

master-bed9275

cmake : remove whitespaces

Jun 15, 2023
bed9275
zip
tar.gz

Tags: robyngraf/llama.cpp