This repository was archived by the owner on Jul 4, 2025. It is now read-only.

Replace cortex.llamacpp with minimalist fork of llama.cpp #1728

@dan-menlo

Description

Goal

  • Goal: Maintain a minimalist fork of llama.cpp as llamacpp-engine
    • cortex.cpp's desktop focus means Drogon's features are unused
    • We should contribute our vision and multimodal work upstream to llama.cpp's server
    • A very clear Engines abstraction (e.g. to support OpenVINO and other backends in the future)
  • Goal: Contribute upstream to llama.cpp
    • Vision, multimodal
    • This may not be possible if the vision and audio encoders are Python-runtime-based

Can we consider refactoring llamacpp-engine to use llama.cpp's server implementation, and maintaining a fork with our improvements to speech, vision, etc.? This is especially relevant if we do a C++ implementation of whisperVQ in the future.

Potential issues

  • cortex engines llama.cpp update -> updates llama.cpp
    • We still need to build AVX-512 variants for janhq/llama.cpp (i.e. build scripts)
    • We should align the janhq/llama.cpp release names with ggml-org/llama.cpp
    • Trigger automated CI/CD builds
    • We can also ask GG if we can donate compute towards builds
  • Deprecating LLaVA support
  • Handling existing API parameters such as logit_bias, n, etc., either by upstreaming support or by implementing them in the Cortex server
  • Update documentation
  • DevRel @ramonpzg
    • Cortex builds on llama-server (and we will contribute upstream in the future)
    • Explain why we need to build so many different variants of llama.cpp (AVX-512, AVX2, etc.)
    • GG -> can we contribute Menlo Cloud to the llama.cpp project (built on Intel CPUs)

Key Changes

  • Use llama-server instead of Drogon, which we currently use in cortex.llamacpp
  • Use a spawned llama.cpp process instead of a dylib (better stability and parallelism)
    • However, we will effectively need to build a process manager

Metadata

Type: No type
Projects: Status: QA; Status: Done
Relationships: None yet
Development: No branches or pull requests