
feat: mmap#105

Merged
danielzgtg merged 8 commits into mmwillet:main from danielzgtg:feat/mmap on Aug 16, 2025


@danielzgtg
Collaborator

Users with 1GB of RAM will be able to run any model by letting it swap, though this is ill-advised. Users with 16GB of RAM will be able to restart the Orpheus model quickly, now that the page cache is no longer evicted on every run.

The mmap code is from llama.cpp. I also deduplicated the model loading code.

This is currently non-enforcing and unapplied per file until said file
receives a major edit.
This'll allow adding supporting files to each model.
@danielzgtg
Collaborator Author

Merging to see if this helps #103 and #108

@danielzgtg danielzgtg merged commit a23f65c into mmwillet:main Aug 16, 2025
2 checks passed
@aPaleBlueDot
Contributor

I failed to leverage mmap to fit Parler mini into memory, as it still asks for 5.3GB somehow. I must be doing something wrong.

@danielzgtg
Collaborator Author

@aPaleBlueDot Can you upload your cmake-build-release/CMakeCache.txt file and post the arguments you are invoking cmake-build-release/bin/tts-cli with?

@aPaleBlueDot
Contributor

aPaleBlueDot commented Aug 19, 2025

@danielzgtg Here is the file, CMakeCache.txt, and here are the parameters (with line breaks added for readability):

```cpp
generation_configuration config(
  "",      // voice (empty)
  30,      // top_k (reduced from 50)
  1.0f,    // temperature
  1.1f,    // repetition_penalty
  false,   // use_cross_attn (disabled to save memory)
  "",      // espeak_voice_id (empty)
  256,     // max_tokens (reduced from 512)
  0.95f,   // top_p
  true     // sample
);

int n_threads = 4;    // 4 threads on iOS
bool cpu_only = true; // Force CPU-only mode
```

And it was the 5-bit quantized model.

danielzgtg added a commit to danielzgtg/TTS.cpp that referenced this pull request Aug 19, 2025