Feature Request: llama 4 #12774
Comments
Still waiting for access. How is the architecture different from llama 3.3?
Details will probably come out at LlamaCon on April 29, I'd guess... @stalkermustang posted about it on TG: https://t.me/seeallochnaya/2496
An interesting bit from their blog: a key innovation in the Llama 4 architecture is interleaved attention layers without positional embeddings, combined with inference-time temperature scaling of attention to improve length generalization. Sounds like that would shrink the K/V cache, no? Also, a 256K base context is crazy by itself.
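
For reference, a minimal sketch of that inference-time attention temperature scaling, as I read it in the HF transformers Llama 4 code; the `floor_scale=8192` and `attn_scale=0.1` values are taken from the released configs and should be treated as assumptions:

```python
import torch

def attn_temperature_scale(cache_position: torch.Tensor,
                           floor_scale: float = 8192.0,
                           attn_scale: float = 0.1) -> torch.Tensor:
    """Position-dependent scaling applied to query states on the NoPE layers.

    The factor is exactly 1.0 for the first `floor_scale` positions, then
    grows logarithmically, so attention logits do not flatten out at very
    long context lengths.
    """
    return torch.log(torch.floor((cache_position.float() + 1.0) / floor_scale) + 1.0) * attn_scale + 1.0

# usage (shapes simplified): scale each query by its position's factor
# query_states = query_states * attn_temperature_scale(cache_position)[:, None]
```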
Unsloth fork here: https://huggingface.co/unsloth/Llama-4-Scout-17B-16E-Instruct
From what I understand from the HF transformers code, the differences between llama 3 and llama 4 are: MoE (mixture-of-experts) FFN layers, interleaved layers without RoPE (NoPE), and a chunked attention mask on the RoPE layers. So overall I think the most complicated part would be to support the chunked attn mask (see the sketch below).

Edit: yes, I was missing attn_temperature_tuning, though I think it currently isn't working correctly in transformers.
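
For anyone picking this up, here is one plausible reading of the chunked ("local") attention mask, a minimal sketch assuming the chunk size of 8192 from the released configs. Because keys and values outside the current chunk are never attended to, the K/V cache for these layers only ever needs to hold one chunk, which is what shrinks it:

```python
import torch

def chunked_attention_mask(seq_len: int, chunk_size: int = 8192) -> torch.Tensor:
    """Boolean mask where True means attention is allowed.

    Each position attends causally, but only to keys inside its own chunk
    of `chunk_size` tokens, i.e. floor(i / chunk_size) == floor(j / chunk_size).
    """
    pos = torch.arange(seq_len)
    causal = pos[:, None] >= pos[None, :]                          # j <= i
    same_chunk = (pos[:, None] // chunk_size) == (pos[None, :] // chunk_size)
    return causal & same_chunk

# small example: with chunk_size=4, position 5 attends to positions 4..5 only
print(chunked_attention_mask(8, chunk_size=4).int())
```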
ml-explore/mlx-lm#74
I was thinking maybe we could dynamically load experts. Do you think that could work?
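
This isn't an existing llama.cpp mechanism, just a sketch of the idea under discussion: memory-map the expert weights and keep only the most recently used experts resident, faulting others in when the router selects them. All names here are hypothetical:

```python
from collections import OrderedDict
import numpy as np

class ExpertCache:
    """Hypothetical LRU cache of per-expert FFN weights.

    Expert tensors are memory-mapped from disk and only copied into RAM
    when the router actually selects them, bounding resident memory to
    `max_resident` experts instead of all 16 (Scout) / 128 (Maverick).
    """

    def __init__(self, weight_file: str, expert_bytes: int, max_resident: int = 4):
        self.mmap = np.memmap(weight_file, dtype=np.uint8, mode="r")
        self.expert_bytes = expert_bytes   # size of one expert's packed weights
        self.max_resident = max_resident
        self.resident: OrderedDict[int, np.ndarray] = OrderedDict()

    def get(self, expert_id: int) -> np.ndarray:
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)           # mark as recently used
            return self.resident[expert_id]
        offset = expert_id * self.expert_bytes
        weights = np.array(self.mmap[offset:offset + self.expert_bytes])  # fault in from disk
        self.resident[expert_id] = weights
        if len(self.resident) > self.max_resident:
            self.resident.popitem(last=False)              # evict least recently used
        return weights
```

Whether this helps in practice depends on how often the routed experts change between tokens; if routing is close to uniform per token, the cache would thrash and disk bandwidth becomes the bottleneck.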
Prerequisites
Feature Description
Llama 4 has been released.

Tech report / announcement blog:
https://ai.meta.com/blog/llama-4-multimodal-intelligence/

Weights:
https://www.llama.com/llama4/
https://huggingface.co/collections/meta-llama/llama-4-67f0c30d9fe03840bc9d0164
Motivation
It would be great to be able to use a multimodal LLM.
Possible Implementation
No response