
[pull] master from ggml-org:master #407


Open · wants to merge 96 commits into base: master

Changes from all commits · 96 commits
3ac6753
llama-graph : use ggml_repeat_4d (#13998)
ngxson Jun 4, 2025
4825487
releases : use dl backend for linux release, remove arm64 linux relea…
slaren Jun 4, 2025
2589ad3
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
slaren Jun 4, 2025
3e63a58
kv-cache : refactor the update/defrag mechanism (#13988)
ggerganov Jun 4, 2025
0d39844
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
etasnadi Jun 4, 2025
5a8ae30
vulkan: automatically deduce size of push constants (#13936)
jeffbolznv Jun 5, 2025
9e31bec
context : fix pos_min initialization upon error decode (#14008)
ggerganov Jun 5, 2025
9f47fa5
vocab : warn about missing mask token (#14022)
CISC Jun 5, 2025
d01d112
readme : add badge (#13938)
Olexandr88 Jun 5, 2025
3a07714
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WI…
slaren Jun 5, 2025
7f37b6c
memory : migrate from llama_kv_cache to more generic llama_memory (#1…
ggerganov Jun 5, 2025
146b88e
ci: fix CUDA build failure on autodl cloud machines (#14005)
pockers21 Jun 5, 2025
669c13e
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs…
rillomas Jun 5, 2025
1caae7f
gguf-py : add add_classifier_output_labels method to writer (#14031)
CISC Jun 5, 2025
d17a809
llama : support multiple classifier outputs and labels (#13940)
CISC Jun 6, 2025
487a5e0
context : fix SWA-related warning for multiple sequences (#14045)
ggerganov Jun 6, 2025
745aa53
llama : deprecate llama_kv_self_ API (#14030)
ggerganov Jun 6, 2025
0974ad7
llama : fix llama_model_chat_template with template name (LLM_KV with…
CISC Jun 7, 2025
228f34c
SYCL: Implement few same quantized type copy kernels (#13739)
qnixsynapse Jun 7, 2025
5787b5d
ci: add LoongArch cross-compile build (#13944)
wojiushixiaobai Jun 7, 2025
247e5c6
cuda : fix buffer type check with integrated GPUs (#14069)
slaren Jun 8, 2025
056eb74
CANN: Enable labeler for Ascend NPU (#13914)
shink Jun 9, 2025
91a8ee6
add geglu activation function (#14074)
huydt84 Jun 9, 2025
b460d16
sycl: Add reorder to Q6_K mmvq implementation (#13885)
s-Nick Jun 9, 2025
87d34b3
server : fix LRU check (#14079)
ggerganov Jun 9, 2025
dc0623f
webui: fix sidebar being covered by main content (#14082)
yeahdongcn Jun 9, 2025
e21d2d4
CANN: Simplify the environment variable setting(#13104)
bachelor-dou Jun 9, 2025
201b31d
graph : fix geglu (#14077)
ggerganov Jun 9, 2025
8f47e25
cuda : fix device sync on buffer clear (#14033)
slaren Jun 9, 2025
f470bc3
ggml-cpu : split arch-specific implementations (#13892)
xctan Jun 9, 2025
7f4fbe5
llama : allow building all tests on windows when not using shared lib…
slaren Jun 9, 2025
40cbf57
kv-cache : fix shift and defrag logic (#14081)
ggerganov Jun 9, 2025
1f63e75
metal : use less stack memory in FA kernel (#14088)
ggerganov Jun 9, 2025
1a3b5e8
Add in-build ggml::ggml ALIAS library (ggml/1260)
dg0yt Jun 3, 2025
b8e2194
sync : ggml
ggerganov Jun 10, 2025
2bb0467
rpc : nicer error messages for RPC server crash (#14076)
isaac-mcfadyen Jun 10, 2025
97340b4
Vulkan: Don't default to CPU device (like llvmpipe), even if no other…
0cc4m Jun 10, 2025
b7ce1ad
ggml : fix weak alias win32 (whisper/0)
ggerganov Jun 10, 2025
ae92c18
sync : ggml
ggerganov Jun 10, 2025
3a12db2
Fixed spec timings to: accepted/tested instead of accepted/drafted (#…
jukofyork Jun 10, 2025
652b70e
vulkan: force device 0 in CI (#14106)
jeffbolznv Jun 10, 2025
3678b83
llama : support GEGLU for jina-bert-v2 (#14090)
CISC Jun 10, 2025
55f6b9f
convert : fix duplicate key DeepSeek-R1 conversion error (#14103)
CISC Jun 10, 2025
dad5c44
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
compilade Jun 10, 2025
4c763c8
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
lhez Jun 10, 2025
1f7d50b
vulkan: Track descriptor pools/sets per-context (#14109)
jeffbolznv Jun 11, 2025
7ae2932
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
ggerganov Jun 11, 2025
2baf077
server : pass default --keep argument (#14120)
MightyAlex200 Jun 11, 2025
89a184f
kv-cache : relax SWA masking condition (#14119)
ggerganov Jun 11, 2025
7781e5f
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
am17an Jun 11, 2025
bd248d4
vulkan: Better thread-safety for command pools/buffers (#14116)
jeffbolznv Jun 11, 2025
cc66a7f
tests : add test-tokenizers-repo (#14017)
CISC Jun 11, 2025
d4e0d95
chore : clean up relative source dir paths (#14128)
CISC Jun 11, 2025
532802f
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
ckastner Jun 11, 2025
2e89f76
common: fix issue with regex_escape routine on windows (#14133)
bandoti Jun 11, 2025
a20b2b0
context : round n_tokens to next multiple of n_seqs when reserving (#…
compilade Jun 12, 2025
9596506
kv-cache : fix split_equal handling in unified implementation (#14130)
ggerganov Jun 12, 2025
e2c0b6e
cmake : handle whitepsaces in path during metal build (#14126)
ggerganov Jun 12, 2025
c3ee46f
batch : remove logits_all flag (#14141)
ggerganov Jun 12, 2025
f6e1a7a
context : simplify output counting logic during decode (#14142)
ggerganov Jun 12, 2025
7d51644
server : re-enable SWA speculative decoding (#14131)
ggerganov Jun 12, 2025
a681b4b
readme : remove project status link (#14149)
ggerganov Jun 12, 2025
ed52f36
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
ShanoToni Jun 12, 2025
c33fe8b
vocab : prevent heap overflow when vocab is too small (#14145)
ggerganov Jun 13, 2025
09cf2c7
cmake : Improve build-info.cpp generation (#14156)
ckastner Jun 13, 2025
c61285e
SYCL: Bump oneMath commit (#14152)
EwanC Jun 13, 2025
0889eba
sycl: Adding additional cpy dbg print output (#14034)
ShanoToni Jun 13, 2025
ffad043
server : fix SWA condition for full context reprocess (#14163)
ggerganov Jun 13, 2025
d714dad
pooling : make cls_b and cls_out_b optional (#14165)
huydt84 Jun 13, 2025
cc8d081
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
ckastner Jun 13, 2025
b7cc774
readme : remove survey link (#14168)
ggerganov Jun 13, 2025
60c6663
batch : rework llama_batch_allocr (#14153)
ggerganov Jun 13, 2025
26ff368
docs : Update multimodal.md (#14122)
ddpasa Jun 13, 2025
80709b7
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
ggerganov Jun 13, 2025
3cfbbdb
Merge commit from fork
GuyGoldenberg Jun 13, 2025
40643ed
sycl: fix docker image (#14144)
sgeor255 Jun 13, 2025
fb85a28
vocab : fix build (#14175)
ggerganov Jun 13, 2025
2e42be4
compare-llama-bench: add option to plot (#14169)
am17an Jun 14, 2025
3cb203c
llama-chat : Do not throw when tool parsing fails (#14012)
p1-0tr Jun 14, 2025
00ba772
docs : remove WIP since PR has been merged (#13912)
pepijndevos Jun 15, 2025
b9912ac
batch : auto-gen positions + verify multi-sequence input (#14177)
ggerganov Jun 15, 2025
c311ac6
cparams : rename LLAMA_MAX_PARALLEL_SEQUENCES to LLAMA_MAX_SEQ (#14188)
ggerganov Jun 15, 2025
9ae4143
model : add dots.llm1 architecture support (#14044) (#14118)
Noeda Jun 15, 2025
5fce5f9
kv-cache : fix use-after-move of defrag info (#14189)
ggerganov Jun 15, 2025
2c2caa4
HIP: Replace usage of depricated preprocessor macro __AMDGCN_WAVEFRON…
IMbackK Jun 15, 2025
e54b394
CUDA/HIP: fix ssm_scan on devices where warp size is not 32 (#14196)
IMbackK Jun 15, 2025
30e5b01
quantize : change int to unsigned int for KV overrides (#14197)
EAddario Jun 15, 2025
cd355ed
server : When listening on a unix domain socket don't print http:// a…
ericcurtin Jun 15, 2025
d7da8dc
model : Add support for Arcee AI's upcoming AFM model (#14185)
bartowski1182 Jun 15, 2025
3555b30
ggml-cpu : rework weak alias on apple targets (#14146)
xctan Jun 16, 2025
c89c2d1
vulkan: mutex around vkQueueSubmit (#14127)
jeffbolznv Jun 16, 2025
4ad2436
gguf-py : allow key override when adding value to GGUFWriter (#14194)
huydt84 Jun 16, 2025
0bf49eb
convert : remove arcee change in convert_hf_to_gguf_update.py (#14207)
bartowski1182 Jun 16, 2025
3ba0d84
ggml: Add Android support for GGML_CPU_ALL_VARIANTS (#14206)
chaxu01 Jun 16, 2025
d3e64b9
llama : rework embeddings logic (#14208)
ggerganov Jun 16, 2025
7d6d91b
HIP: disable rocwmma on gfx12 by default until rocm 7.0 (#14202)
IMbackK Jun 16, 2025
30 changes: 17 additions & 13 deletions .devops/intel.Dockerfile
@@ -49,19 +49,23 @@ COPY --from=build /app/full /app

WORKDIR /app

RUN apt-get update \
&& apt-get install -y \
git \
python3 \
python3-pip \
&& pip install --upgrade pip setuptools wheel \
&& pip install -r requirements.txt \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

RUN apt-get update && \
apt-get install -y \
git \
python3 \
python3-pip \
python3-venv && \
python3 -m venv /opt/venv && \
. /opt/venv/bin/activate && \
pip install --upgrade pip setuptools wheel && \
pip install -r requirements.txt && \
apt autoremove -y && \
apt clean -y && \
rm -rf /tmp/* /var/tmp/* && \
find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete && \
find /var/cache -type f -delete

ENV PATH="/opt/venv/bin:$PATH"

ENTRYPOINT ["/app/tools.sh"]

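
Note on the hunk above: the Intel image now installs its Python dependencies into a dedicated virtual environment at `/opt/venv` and prepends it to `PATH`, rather than installing with the system `pip` (likely to satisfy externally-managed-environment restrictions on newer base images). A minimal local sanity check could look like the sketch below; the `full` build target, image tag, and entrypoint override are assumptions, not taken from this PR.

```sh
# Build the full Intel image and confirm that python3/pip resolve to the /opt/venv copies.
docker build -f .devops/intel.Dockerfile --target full -t llama-intel-full .
docker run --rm --entrypoint sh llama-intel-full -c 'command -v python3 && pip --version'
```
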
7 changes: 7 additions & 0 deletions .github/labeler.yml
@@ -86,3 +86,10 @@ nix:
embedding:
- changed-files:
- any-glob-to-any-file: examples/embedding/

Ascend NPU:
- changed-files:
- any-glob-to-any-file:
- ggml/include/ggml-cann.h
- ggml/src/ggml-cann/**
- docs/backend/CANN.md
113 changes: 113 additions & 0 deletions .github/workflows/build-linux-cross.yml
@@ -231,3 +231,116 @@ jobs:
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

debian-13-loongarch64-cpu-cross:
runs-on: ubuntu-24.04
container: debian@sha256:653dfb9f86c3782e8369d5f7d29bb8faba1f4bff9025db46e807fa4c22903671

steps:
- uses: actions/checkout@v4
- name: Setup LoongArch
run: |
rm -f /etc/apt/sources.list.d/*
cat << EOF | tee /etc/apt/sources.list.d/debian-ports.list
deb http://snapshot.debian.org/archive/debian/20250515T202920Z/ trixie main
EOF
( echo 'quiet "true";'; \
echo 'APT::Get::Assume-Yes "true";'; \
echo 'APT::Install-Recommends "false";'; \
echo 'Acquire::Check-Valid-Until "false";'; \
echo 'Acquire::Retries "5";'; \
) > /etc/apt/apt.conf.d/99snapshot-repos

apt-get update
apt-get install -y ca-certificates debian-ports-archive-keyring cmake git zip
dpkg --add-architecture loong64

# Add arch-specific repositories for non-amd64 architectures
cat << EOF | tee /etc/apt/sources.list.d/loong64-ports.list
deb [arch=loong64] http://snapshot.debian.org/archive/debian-ports/20250515T194251Z/ sid main
EOF

apt-get update || true ;# Prevent failure due to missing URLs.

apt-get install -y --no-install-recommends \
build-essential \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu

- name: Build
run: |
cmake -B build -DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TOOLS=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=loongarch64 \
-DCMAKE_C_COMPILER=loongarch64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=loongarch64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/loongarch64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)

debian-13-loongarch64-vulkan-cross:
runs-on: ubuntu-24.04
container: debian@sha256:653dfb9f86c3782e8369d5f7d29bb8faba1f4bff9025db46e807fa4c22903671

steps:
- uses: actions/checkout@v4
- name: Setup LoongArch
run: |
rm -f /etc/apt/sources.list.d/*
cat << EOF | tee /etc/apt/sources.list.d/debian-ports.list
deb http://snapshot.debian.org/archive/debian/20250515T202920Z/ trixie main
EOF
( echo 'quiet "true";'; \
echo 'APT::Get::Assume-Yes "true";'; \
echo 'APT::Install-Recommends "false";'; \
echo 'Acquire::Check-Valid-Until "false";'; \
echo 'Acquire::Retries "5";'; \
) > /etc/apt/apt.conf.d/99snapshot-repos

apt-get update
apt-get install -y ca-certificates debian-ports-archive-keyring cmake git zip
dpkg --add-architecture loong64

# Add arch-specific repositories for non-amd64 architectures
cat << EOF | tee /etc/apt/sources.list.d/loong64-ports.list
deb [arch=loong64] http://snapshot.debian.org/archive/debian-ports/20250515T194251Z/ sid main
EOF

apt-get update || true ;# Prevent failure due to missing URLs.

apt-get install -y --no-install-recommends \
build-essential \
glslc \
gcc-14-loongarch64-linux-gnu \
g++-14-loongarch64-linux-gnu \
libvulkan-dev:loong64

- name: Build
run: |
cmake -B build -DLLAMA_CURL=OFF \
-DCMAKE_BUILD_TYPE=Release \
-DGGML_VULKAN=ON \
-DGGML_OPENMP=OFF \
-DLLAMA_BUILD_EXAMPLES=ON \
-DLLAMA_BUILD_TOOLS=ON \
-DLLAMA_BUILD_TESTS=OFF \
-DCMAKE_SYSTEM_NAME=Linux \
-DCMAKE_SYSTEM_PROCESSOR=loongarch64 \
-DCMAKE_C_COMPILER=loongarch64-linux-gnu-gcc-14 \
-DCMAKE_CXX_COMPILER=loongarch64-linux-gnu-g++-14 \
-DCMAKE_POSITION_INDEPENDENT_CODE=ON \
-DCMAKE_FIND_ROOT_PATH=/usr/lib/loongarch64-linux-gnu \
-DCMAKE_FIND_ROOT_PATH_MODE_PROGRAM=NEVER \
-DCMAKE_FIND_ROOT_PATH_MODE_LIBRARY=ONLY \
-DCMAKE_FIND_ROOT_PATH_MODE_INCLUDE=BOTH

cmake --build build --config Release -j $(nproc)
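
The two new jobs above cross-compile llama.cpp for LoongArch (CPU-only and Vulkan) inside a pinned Debian snapshot container using the `gcc-14-loongarch64-linux-gnu` toolchain. A quick way to confirm that such a cross build produced LoongArch binaries rather than host binaries is a sketch along these lines (the binary path is assumed from the default CMake layout):

```sh
# The produced binaries should target LoongArch, not the x86-64 host.
file build/bin/llama-cli
readelf -h build/bin/llama-cli | grep -i machine
```
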
13 changes: 7 additions & 6 deletions .github/workflows/build.yml
@@ -306,6 +306,7 @@ jobs:
id: cmake_test
run: |
cd build
export GGML_VK_VISIBLE_DEVICES=0
# This is using llvmpipe and runs slower than other backends
ctest -L main --verbose --timeout 3600

@@ -687,8 +688,8 @@ jobs:
strategy:
matrix:
include:
- build: 'cpu-x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF'
- build: 'cpu-x64 (static)'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DBUILD_SHARED_LIBS=OFF'
- build: 'openblas-x64'
defines: '-G "Ninja Multi-Config" -D CMAKE_TOOLCHAIN_FILE=cmake/x64-windows-llvm.cmake -DGGML_NATIVE=OFF -DLLAMA_BUILD_SERVER=ON -DGGML_RPC=ON -DGGML_BACKEND_DL=ON -DGGML_CPU_ALL_VARIANTS=ON -DGGML_OPENMP=OFF -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS -DBLAS_INCLUDE_DIRS="$env:RUNNER_TEMP/openblas/include" -DBLAS_LIBRARIES="$env:RUNNER_TEMP/openblas/lib/openblas.lib"'
- build: 'vulkan-x64'
@@ -839,12 +840,12 @@ jobs:
-DGGML_CUDA=ON
cmake --build build

windows-2019-cmake-cuda:
runs-on: windows-2019
windows-2022-cmake-cuda:
runs-on: windows-2022

strategy:
matrix:
cuda: ['12.4', '11.7']
cuda: ['12.4']

steps:
- name: Clone
@@ -878,7 +879,7 @@ jobs:
env:
CURL_PATH: ${{ steps.get_libcurl.outputs.curl_path }}
run: |
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
cmake -S . -B build -G "Ninja Multi-Config" ^
-DLLAMA_BUILD_SERVER=ON ^
-DGGML_NATIVE=OFF ^
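
In the first hunk of this file, the Vulkan CI test step now exports `GGML_VK_VISIBLE_DEVICES=0` so that only the first enumerated device (llvmpipe on the CI runners) is used; the later hunks move the CUDA jobs to `windows-2022` and drop CUDA 11.7. Running the same tests locally against a single Vulkan device could look like this sketch (build directory name assumed):

```sh
# Restrict ggml's Vulkan backend to device 0, then run the main test label as in CI.
export GGML_VK_VISIBLE_DEVICES=0
cd build
ctest -L main --verbose --timeout 3600
```
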
17 changes: 12 additions & 5 deletions .github/workflows/release.yml
@@ -131,8 +131,9 @@ jobs:
include:
- build: 'x64'
os: ubuntu-22.04
- build: 'arm64'
os: ubuntu-22.04-arm
# GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS are not currently supported on arm
# - build: 'arm64'
# os: ubuntu-22.04-arm

runs-on: ${{ matrix.os }}

@@ -159,6 +160,9 @@ jobs:
id: cmake_build
run: |
cmake -B build \
-DGGML_BACKEND_DL=ON \
-DGGML_NATIVE=OFF \
-DGGML_CPU_ALL_VARIANTS=ON \
-DLLAMA_FATAL_WARNINGS=ON \
${{ env.CMAKE_ARGS }}
cmake --build build --config Release -j $(nproc)
@@ -207,6 +211,9 @@ jobs:
id: cmake_build
run: |
cmake -B build \
-DGGML_BACKEND_DL=ON \
-DGGML_NATIVE=OFF \
-DGGML_CPU_ALL_VARIANTS=ON \
-DGGML_VULKAN=ON \
${{ env.CMAKE_ARGS }}
cmake --build build --config Release -j $(nproc)
@@ -373,11 +380,11 @@ jobs:
name: llama-bin-win-${{ matrix.backend }}-${{ matrix.arch }}.zip

windows-cuda:
runs-on: windows-2019
runs-on: windows-2022

strategy:
matrix:
cuda: ['12.4', '11.7']
cuda: ['12.4']

steps:
- name: Clone
@@ -405,7 +412,7 @@ jobs:
id: cmake_build
shell: cmd
run: |
call "C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Auxiliary\Build\vcvars64.bat"
call "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
cmake -S . -B build -G "Ninja Multi-Config" ^
-DGGML_BACKEND_DL=ON ^
-DGGML_NATIVE=OFF ^
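
The Linux and Vulkan release jobs above now configure with `-DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON`, i.e. backends are built as dynamically loadable libraries and a set of CPU backend variants is built with the best one selected at runtime; the arm64 Linux release is dropped because, per the comment in the first hunk, these options are not yet supported on arm. A rough way to inspect which variants a local build of this configuration produced (the library naming pattern is an assumption about the build layout):

```sh
cmake -B build -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF -DGGML_CPU_ALL_VARIANTS=ON
cmake --build build --config Release -j "$(nproc)"
# Expect one shared object per CPU variant next to the other backend libraries.
ls build/bin/libggml-*.so
```
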
2 changes: 1 addition & 1 deletion .github/workflows/server.yml
@@ -180,7 +180,7 @@ jobs:


server-windows:
runs-on: windows-2019
runs-on: windows-2022

steps:
- name: Clone
19 changes: 15 additions & 4 deletions CMakeLists.txt
@@ -89,6 +89,14 @@ option(LLAMA_LLGUIDANCE "llama-common: include LLGuidance library for structured
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/build-info.cmake)
include(${CMAKE_CURRENT_SOURCE_DIR}/cmake/common.cmake)

if (NOT DEFINED LLAMA_BUILD_NUMBER)
set(LLAMA_BUILD_NUMBER ${BUILD_NUMBER})
endif()
if (NOT DEFINED LLAMA_BUILD_COMMIT)
set(LLAMA_BUILD_COMMIT ${BUILD_COMMIT})
endif()
set(LLAMA_INSTALL_VERSION 0.0.${BUILD_NUMBER})

# override ggml options
set(GGML_ALL_WARNINGS ${LLAMA_ALL_WARNINGS})
set(GGML_FATAL_WARNINGS ${LLAMA_FATAL_WARNINGS})
@@ -155,10 +163,17 @@ if (LLAMA_USE_SYSTEM_GGML)
endif()

if (NOT TARGET ggml AND NOT LLAMA_USE_SYSTEM_GGML)
set(GGML_BUILD_NUMBER ${LLAMA_BUILD_NUMBER})
set(GGML_BUILD_COMMIT ${LLAMA_BUILD_COMMIT})
add_subdirectory(ggml)
# ... otherwise assume ggml is added by a parent CMakeLists.txt
endif()

if (MINGW)
# Target Windows 8 for PrefetchVirtualMemory
add_compile_definitions(_WIN32_WINNT=${GGML_WIN_VER})
endif()

#
# build the library
#
@@ -199,10 +214,6 @@ endif()
include(GNUInstallDirs)
include(CMakePackageConfigHelpers)

set(LLAMA_BUILD_NUMBER ${BUILD_NUMBER})
set(LLAMA_BUILD_COMMIT ${BUILD_COMMIT})
set(LLAMA_INSTALL_VERSION 0.0.${BUILD_NUMBER})

set(LLAMA_INCLUDE_INSTALL_DIR ${CMAKE_INSTALL_INCLUDEDIR} CACHE PATH "Location of header files")
set(LLAMA_LIB_INSTALL_DIR ${CMAKE_INSTALL_LIBDIR} CACHE PATH "Location of library files")
set(LLAMA_BIN_INSTALL_DIR ${CMAKE_INSTALL_BINDIR} CACHE PATH "Location of binary files")
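
With the hunks above, `LLAMA_BUILD_NUMBER` and `LLAMA_BUILD_COMMIT` fall back to the git-derived `BUILD_NUMBER`/`BUILD_COMMIT` only when not already defined, and are forwarded to ggml as `GGML_BUILD_NUMBER`/`GGML_BUILD_COMMIT` before `add_subdirectory(ggml)`. Packagers building from a tarball without git history can therefore pin the metadata explicitly; a minimal sketch with placeholder values:

```sh
cmake -B build \
    -DLLAMA_BUILD_NUMBER=4711 \
    -DLLAMA_BUILD_COMMIT=0123abc
cmake --build build --config Release
```
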
4 changes: 2 additions & 2 deletions Makefile
@@ -367,7 +367,7 @@ ifdef LLAMA_SERVER_SSL
endif

ifndef GGML_NO_CPU_AARCH64
MK_CPPFLAGS += -DGGML_USE_CPU_AARCH64
MK_CPPFLAGS += -DGGML_USE_CPU_REPACK
endif

# warnings
@@ -970,7 +970,7 @@ OBJ_GGML = \
$(DIR_GGML)/src/ggml-threading.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu_cpp.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-aarch64.o \
$(DIR_GGML)/src/ggml-cpu/repack.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-hbm.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-quants.o \
$(DIR_GGML)/src/ggml-cpu/ggml-cpu-traits.o \
4 changes: 2 additions & 2 deletions README.md
@@ -3,9 +3,10 @@
![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
[![Release](https://img.shields.io/github/v/release/ggml-org/llama.cpp)](https://github.com/ggml-org/llama.cpp/releases)
[![Server](https://github.com/ggml-org/llama.cpp/actions/workflows/server.yml/badge.svg)](https://github.com/ggml-org/llama.cpp/actions/workflows/server.yml)

[Roadmap](https://github.com/users/ggerganov/projects/7) / [Project status](https://github.com/ggml-org/llama.cpp/discussions/3471) / [Manifesto](https://github.com/ggml-org/llama.cpp/discussions/205) / [ggml](https://github.com/ggml-org/ggml)
[Roadmap](https://github.com/users/ggerganov/projects/7) / [Manifesto](https://github.com/ggml-org/llama.cpp/discussions/205) / [ggml](https://github.com/ggml-org/ggml)

Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others) in pure C/C++

@@ -17,7 +18,6 @@ Inference of Meta's [LLaMA](https://arxiv.org/abs/2302.13971) model (and others)
## Hot topics

- πŸ”₯ Multimodal support arrived in `llama-server`: [#12898](https://github.com/ggml-org/llama.cpp/pull/12898) | [documentation](./docs/multimodal.md)
- **GGML developer experience survey (organized and reviewed by NVIDIA):** [link](https://forms.gle/Gasw3cRgyhNEnrwK9)
- A new binary `llama-mtmd-cli` is introduced to replace `llava-cli`, `minicpmv-cli`, `gemma3-cli` ([#13012](https://github.com/ggml-org/llama.cpp/pull/13012)) and `qwen2vl-cli` ([#13141](https://github.com/ggml-org/llama.cpp/pull/13141)), `libllava` will be deprecated
- VS Code extension for FIM completions: https://github.com/ggml-org/llama.vscode
- Universal [tool call support](./docs/function-calling.md) in `llama-server` https://github.com/ggml-org/llama.cpp/pull/9639
15 changes: 14 additions & 1 deletion ci/run.sh
@@ -46,7 +46,20 @@ if [ ! -z ${GG_BUILD_METAL} ]; then
fi

if [ ! -z ${GG_BUILD_CUDA} ]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=native"
CMAKE_EXTRA="${CMAKE_EXTRA} -DGGML_CUDA=ON"

if command -v nvidia-smi >/dev/null 2>&1; then
CUDA_ARCH=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader,nounits 2>/dev/null | head -1 | tr -d '.')
if [[ -n "$CUDA_ARCH" && "$CUDA_ARCH" =~ ^[0-9]+$ ]]; then
CMAKE_EXTRA="${CMAKE_EXTRA} -DCMAKE_CUDA_ARCHITECTURES=${CUDA_ARCH}"
else
echo "Warning: Using fallback CUDA architectures"
CMAKE_EXTRA="${CMAKE_EXTRA} -DCMAKE_CUDA_ARCHITECTURES=61;70;75;80;86;89"
fi
else
echo "Error: nvidia-smi not found, cannot build with CUDA"
exit 1
fi
fi

if [ ! -z ${GG_BUILD_SYCL} ]; then
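
The CUDA branch of `ci/run.sh` no longer relies on `-DCMAKE_CUDA_ARCHITECTURES=native`: it queries the compute capability of the first visible GPU, falls back to a fixed architecture list if the query returns something unexpected, and aborts when `nvidia-smi` is not found. The detection step in isolation:

```sh
# e.g. an RTX 4090 reports compute capability 8.9, which becomes the CMake value "89"
nvidia-smi --query-gpu=compute_cap --format=csv,noheader,nounits | head -1 | tr -d '.'
```
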