the local llm thing #1505


Open
stonebig opened this issue Mar 16, 2025 · 0 comments

@stonebig
Contributor

stonebig commented Mar 16, 2025

from browsing the web:

  • when the model doesn't fit in GPU RAM, you lose roughly 10x of the GPU performance, which makes an APU with lots of RAM competitive with a big RTX 3090
  • hardware:
    • the model must fit in RAM or you lose roughly 10x the performance (see the rough sizing sketch after this list)
    • RAM is even more crucial for reasoning models, which need to keep long contexts in memory
    • a big, useful model seems to require between 16 and 32 GB of RAM, so:
      • a new PC with an APU + NPU and 64 GB of RAM is the cautious choice
      • with a discrete GPU, a ratio of CPU RAM = 4x GPU RAM still seems sensible, so 64 GB of CPU RAM plus optionally a 16 GB GPU
      • unified RAM makes a big APU with high bandwidth the "cheap" competitive idea for 2025
    • bandwidth to RAM is important: a 256-bit bus looks like the minimum, 384-bit is lovely
    • a 1 TB disk may be too small
  • evolution pressure:
    • Microsoft "Copilot" ==> NPU and RAM
    • electricity ==> an APU seems roughly 10x more power-efficient than a GPU
    • a Mac mini-style PC is the low-cost choice, especially with a hardware renewal strategy every 2 years
    • token cost is trending down ~20x per year:
      • so there is no reason to sink too much money into hardware; renew more often, but check the minimums
      • the Apple M4 is the reference: we can only hope for competition from Qualcomm, AMD, and Nvidia
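To make the RAM and bandwidth figures above concrete, here is a rough back-of-the-envelope sketch (my own arithmetic, not a benchmark): quantized weights take roughly parameters × bits-per-weight / 8 bytes plus some overhead, and for bandwidth-bound token generation each new token has to stream the weights from memory once, so memory bandwidth caps tokens per second. The parameter counts, the 4-bit quantization, and the 100 GB/s vs 900 GB/s figures are illustrative assumptions (roughly dual-channel DDR5 vs a 384-bit GDDR6X card).

```python
# Back-of-the-envelope sizing, with illustrative numbers (not measurements).

def model_ram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate RAM needed for the weights, with ~20% extra for KV cache / runtime."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9 * overhead

def decode_tokens_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound: every generated token streams all weights from memory once."""
    return bandwidth_gb_s / weights_gb

for params in (7, 32, 70):
    gb = model_ram_gb(params, bits_per_weight=4)  # 4-bit quantization
    print(f"{params}B @ 4-bit: ~{gb:.0f} GB RAM, "
          f"~{decode_tokens_per_s(gb, 100):.0f} tok/s at 100 GB/s, "
          f"~{decode_tokens_per_s(gb, 900):.0f} tok/s at 900 GB/s")
```

The ~9x gap between ~100 GB/s (dual-channel DDR5) and ~900 GB/s (a 384-bit GDDR6X card such as the RTX 3090) is roughly the "10x" loss mentioned above when a model spills out of GPU RAM.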

asking an AI (Gemini Flash 2):

  • software:
    • llama.cpp: ggml-org/llama.cpp, the greatest, but initially oriented toward Mac / ARM
    • Ollama: https://ollama.ai
    • LM Studio: https://lmstudio.ai/
    • PyTorch / TensorFlow / JAX? not really meant for CPU inference
    • Intel IPEX-LLM for optimizing on CPU (does it sit below PyTorch? see the sketch after this list)
    • ... Docker-packaged models are looking like a trend
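As a partial answer to the IPEX-LLM question above: it plugs in underneath the usual Hugging Face transformers / PyTorch stack and loads weights in low-bit form for CPU inference. A hedged sketch follows; the model id is only an example, and the exact import path and keyword arguments may differ between IPEX-LLM releases.

```python
# Hedged sketch: low-bit CPU inference via IPEX-LLM, which slots in under the
# familiar transformers / PyTorch APIs (exact kwargs may vary by release).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement class

model_id = "Qwen/Qwen2-1.5B-Instruct"  # example model; any HF causal-LM id should work

model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is an NPU good for?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```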

*** Gemini Flash 2: 2025-03-16

It's great you're looking into running LLMs locally! Here's a breakdown of how to obtain llama.cpp, Ollama, and LM Studio:

1. llama.cpp:

  • GitHub:
    • The primary source is the llama.cpp GitHub repository: https://github.com/ggml-org/llama.cpp
    • From there, you have a few options:
      • Building from source: This offers the most flexibility but requires technical knowledge. You'll need to compile the code according to your operating system.
      • Downloading pre-built binaries: on the GitHub releases page, you can find pre-built binaries. Be sure to download the correct version for your CPU architecture.
  • Key points:
    • llama.cpp is a C++ library, so it's primarily command-line driven.
    • It's highly optimized for CPU inference, especially with quantized models.
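For a quick taste without touching the C++ build, the llama-cpp-python bindings wrap the same library. A minimal sketch, assuming a quantized GGUF file has already been downloaded (the file path below is a placeholder):

```python
# pip install llama-cpp-python  (Python bindings around llama.cpp)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-7b-model.Q4_K_M.gguf",  # placeholder: any quantized GGUF file
    n_ctx=4096,    # context window; a larger context needs more RAM for the KV cache
    n_threads=8,   # CPU threads used for inference
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```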

2. Ollama:

  • Official Website:
    • The easiest way to get Ollama is from its official website: https://ollama.ai
    • The website provides straightforward installation instructions for macOS, Linux, and Windows.
    • Installation often involves running a simple installation script.
  • Key points:
    • Ollama simplifies the process of running LLMs locally.
    • It handles model management, making it user-friendly.
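Once installed, Ollama also runs a local HTTP server (port 11434 by default), so any script can query it. A minimal sketch, assuming the server is running and a model has been pulled first (the model tag "llama3" is just an example):

```python
# Query a locally running Ollama server over its HTTP API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # example tag; must have been pulled, e.g. `ollama pull llama3`
        "prompt": "Why does a local model need to fit in RAM?",
        "stream": False,    # return a single JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```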

3. LM Studio:

  • Official Website: https://lmstudio.ai/
  • Key points:
    • LM Studio offers a graphical user interface (GUI), making it very accessible.
    • It allows you to easily download and run LLMs from Hugging Face.
    • It uses llama.cpp in the background.
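LM Studio can also expose the loaded model through a local OpenAI-compatible server (by default at http://localhost:1234/v1), so the standard openai Python client works against it. A minimal sketch, assuming that server is enabled and a model is loaded (the model name is a placeholder):

```python
# pip install openai -- the client is simply pointed at LM Studio's local server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local-server address
    api_key="lm-studio",                  # any non-empty string; no real key is needed locally
)

completion = client.chat.completions.create(
    model="local-model",  # placeholder: LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Give one tip for running LLMs on a laptop."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```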

General Tips:

  • When downloading or installing, always prioritize official sources to ensure safety.
  • Pay attention to system requirements, especially RAM and storage space.
  • If you're new to LLMs, Ollama or LM Studio might be easier starting points.

I hope this helps!

simple comparisons:

[image attachment]
