the local llm thing #1505


Open
stonebig opened this issue Mar 16, 2025 · 0 comments

@stonebig
Contributor

stonebig commented Mar 16, 2025

from browsing the web:

  • when the model doesn't fit in GPU RAM, you lose roughly 10x of the GPU performance, which makes an APU with lots of RAM competitive with a big RTX 3090
  • hardware:
    • the model must fit in RAM or you lose roughly 10x the performance (see the rough sizing sketch after this list)
    • RAM is even more crucial for reasoning models, which need to keep long contexts in memory
    • a big, useful model seems to require between 16 and 32 GB of RAM, so:
      • a new PC with an APU + NPU and 64 GB of RAM is the cautious choice
      • with a discrete GPU, a ratio of CPU RAM = 4x GPU RAM still seems sensible, so 64 GB of CPU RAM plus optionally a 16 GB GPU
      • unified RAM makes a big APU with high bandwidth the "cheap" competitive idea for 2025
    • bandwidth to RAM is important: a 256-bit bus looks like the minimum, 384-bit is lovely
    • a 1 TB disk may be too small
  • evolution pressure:
    • Microsoft "Copilot" ==> NPU and RAM
    • electricity ==> an APU seems roughly 10x more power-efficient than a GPU
    • a Mac mini-style PC is the low-cost choice, especially with a hardware renewal strategy every 2 years
    • token cost is trending down ~20x per year:
      • so there is no reason to sink too much money into hardware; renew more often, but check the minimums
      • the Apple M4 is the reference: we can only hope for competition from Qualcomm, AMD, and Nvidia
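To make the RAM and bandwidth figures above concrete, here is a rough back-of-the-envelope sketch (my own arithmetic, not a benchmark): quantized weights take roughly parameters × bits-per-weight / 8 bytes plus some overhead, and for bandwidth-bound token generation each new token has to stream the weights from memory once, so memory bandwidth caps tokens per second. The parameter counts, the 4-bit quantization, and the 100 GB/s vs 900 GB/s figures are illustrative assumptions (roughly dual-channel DDR5 vs a 384-bit GDDR6X card).

```python
# Back-of-the-envelope sizing, with illustrative numbers (not measurements).

def model_ram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate RAM needed for the weights, with ~20% extra for KV cache / runtime."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9 * overhead

def decode_tokens_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Rough upper bound: every generated token streams all weights from memory once."""
    return bandwidth_gb_s / weights_gb

for params in (7, 32, 70):
    gb = model_ram_gb(params, bits_per_weight=4)  # 4-bit quantization
    print(f"{params}B @ 4-bit: ~{gb:.0f} GB RAM, "
          f"~{decode_tokens_per_s(gb, 100):.0f} tok/s at 100 GB/s, "
          f"~{decode_tokens_per_s(gb, 900):.0f} tok/s at 900 GB/s")
```

The ~9x gap between ~100 GB/s (dual-channel DDR5) and ~900 GB/s (a 384-bit GDDR6X card such as the RTX 3090) is roughly the "10x" loss mentioned above when a model spills out of GPU RAM.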

asking an AI (Gemini Flash 2):

  • software:
    • llama.cpp: ggml-org/llama.cpp, the greatest, but initially oriented toward Mac / ARM
    • Ollama: https://ollama.ai
    • LM Studio: https://lmstudio.ai/
    • PyTorch / TensorFlow / JAX? not really meant for CPU inference
    • Intel IPEX-LLM for optimizing on CPU (does it sit below PyTorch? see the sketch after this list)
    • ... Docker-packaged models are looking like a trend
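As a partial answer to the IPEX-LLM question above: it plugs in underneath the usual Hugging Face transformers / PyTorch stack and loads weights in low-bit form for CPU inference. A hedged sketch follows; the model id is only an example, and the exact import path and keyword arguments may differ between IPEX-LLM releases.

```python
# Hedged sketch: low-bit CPU inference via IPEX-LLM, which slots in under the
# familiar transformers / PyTorch APIs (exact kwargs may vary by release).
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement class

model_id = "Qwen/Qwen2-1.5B-Instruct"  # example model; any HF causal-LM id should work

model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is an NPU good for?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```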

*** Gemini Flash 2: 2025-03-16

It's great you're looking into running LLMs locally! Here's a breakdown of how to obtain llama.cpp, Ollama, and LM Studio:

1. llama.cpp:

  • GitHub:
    • The primary source is the llama.cpp GitHub repository: https://github.com/ggml-org/llama.cpp
    • From there, you have a few options:
      • Building from source: This offers the most flexibility but requires technical knowledge. You'll need to compile the code according to your operating system.
      • Downloading pre-built binaries: on the GitHub releases page, you can find pre-built binaries. Be sure to download the correct version for your CPU architecture.
  • Key points:
    • llama.cpp is a C++ library, so it's primarily command-line driven.
    • It's highly optimized for CPU inference, especially with quantized models.
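For a quick taste without touching the C++ build, the llama-cpp-python bindings wrap the same library. A minimal sketch, assuming a quantized GGUF file has already been downloaded (the file path below is a placeholder):

```python
# pip install llama-cpp-python  (Python bindings around llama.cpp)
from llama_cpp import Llama

llm = Llama(
    model_path="./models/some-7b-model.Q4_K_M.gguf",  # placeholder: any quantized GGUF file
    n_ctx=4096,    # context window; a larger context needs more RAM for the KV cache
    n_threads=8,   # CPU threads used for inference
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain 4-bit quantization in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```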

2. Ollama:

  • Official Website:
    • The easiest way to get Ollama is from its official website: https://ollama.ai
    • The website provides straightforward installation instructions for macOS, Linux, and Windows.
    • Installation often involves running a simple installation script.
  • Key points:
    • Ollama simplifies the process of running LLMs locally.
    • It handles model management, making it user-friendly.
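Once installed, Ollama also runs a local HTTP server (port 11434 by default), so any script can query it. A minimal sketch, assuming the server is running and a model has been pulled first (the model tag "llama3" is just an example):

```python
# Query a locally running Ollama server over its HTTP API (default port 11434).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",  # example tag; must have been pulled, e.g. `ollama pull llama3`
        "prompt": "Why does a local model need to fit in RAM?",
        "stream": False,    # return a single JSON object instead of a token stream
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```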

3. LM Studio:

  • Official Website: https://lmstudio.ai/
  • Key points:
    • LM Studio offers a graphical user interface (GUI), making it very accessible.
    • It allows you to easily download and run LLMs from Hugging Face.
    • It uses llama.cpp in the background.
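LM Studio can also expose the loaded model through a local OpenAI-compatible server (by default at http://localhost:1234/v1), so the standard openai Python client works against it. A minimal sketch, assuming that server is enabled and a model is loaded (the model name is a placeholder):

```python
# pip install openai -- the client is simply pointed at LM Studio's local server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local-server address
    api_key="lm-studio",                  # any non-empty string; no real key is needed locally
)

completion = client.chat.completions.create(
    model="local-model",  # placeholder: LM Studio serves whichever model is currently loaded
    messages=[{"role": "user", "content": "Give one tip for running LLMs on a laptop."}],
    max_tokens=128,
)
print(completion.choices[0].message.content)
```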

General Tips:

  • When downloading or installing, always prioritize official sources to ensure safety.
  • Pay attention to system requirements, especially RAM and storage space.
  • If you're new to LLMs, Ollama or LM Studio might be easier starting points.

I hope this helps!

simple comparisons:

[image attachment]
