This is currently non-enforcing and unapplied per file until said file receives a major edit.
This'll allow adding supporting files to each model.
Users with 1GB of RAM will be able to run any model by letting it swap, though this is ill-advised. Users with 16GB of RAM will be able to restart the Orpheus model quickly, since the page cache is no longer evicted on every run.
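For readers unfamiliar with the technique, here is a minimal sketch of a read-only, file-backed mapping, assuming a POSIX system (Linux or macOS). It is not the llama.cpp code this PR borrows, and `MappedFile` is a hypothetical name chosen for illustration. The point it shows is that the kernel pages the file in on demand and may drop clean pages under memory pressure, so the process does not need the whole file resident at once, and pages left in the page cache can be reused by the next run.

```cpp
// Illustrative sketch only; not the llama.cpp or TTS.cpp implementation.
#include <cstddef>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only mapping of a whole file.
class MappedFile {
public:
    explicit MappedFile(const std::string & path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) {
            throw std::runtime_error("open failed: " + path);
        }

        struct stat st;
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or file is empty: " + path);
        }
        size_ = static_cast<size_t>(st.st_size);

        // PROT_READ + MAP_PRIVATE: pages are backed by the file itself, so the
        // kernel can evict them without writing to swap and reload them later.
        void * addr = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd); // the mapping remains valid after the descriptor is closed
        if (addr == MAP_FAILED) {
            throw std::runtime_error("mmap failed: " + path);
        }
        data_ = static_cast<const unsigned char *>(addr);
    }

    ~MappedFile() {
        if (data_ != nullptr) {
            ::munmap(const_cast<unsigned char *>(data_), size_);
        }
    }

    // Non-copyable: exactly one owner is responsible for munmap.
    MappedFile(const MappedFile &) = delete;
    MappedFile & operator=(const MappedFile &) = delete;

    const unsigned char * data() const { return data_; }
    size_t size() const { return size_; }

private:
    const unsigned char * data_ = nullptr;
    size_t size_ = 0;
};
```

A caller would construct `MappedFile weights(path);` and read tensor bytes through `weights.data()` at the offsets given by the file header. Any buffers the loader allocates and copies into separately are not covered by the mapping and still count toward the process's memory use.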
I failed to leverage mmap to fit Parler mini into memory, as it still asks for 5.3GB somehow. I must be doing something wrong.
@aPaleBlueDot Can you upload your
@danielzgtg Here is the file:
`);`
`int n_threads = 4; // 4 threads on iOS`
`bool cpu_only = true; // Force CPU-only mode`
And it was the 5-bit quantized model.
danielzgtg added a commit to danielzgtg/TTS.cpp that referenced this pull request on Aug 19, 2025
mmwillet#105 (comment) Co-authored-by: aPaleBlueDot <[email protected]>
The mmap code is from llama.cpp. I also deduplicated the model loading code.