
Add support for Arcee AI's upcoming AFM model #14185


Merged: 6 commits, Jun 15, 2025

Conversation

@bartowski1182 (Contributor) commented Jun 14, 2025

This adds support for the upcoming Arcee model architecture, currently codenamed the Arcee Foundation Model (AFM).

Uses ReLU² (ReLU-squared) activation in the MLP blocks

Have tested performance of the quantized model and it seems to perform as expected, but keeping this as a draft until it can be lightly reviewed and we confirm it's accurate.

Transformers PR reference: huggingface/transformers#38621
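For reference, ReLU² simply squares the ReLU output elementwise: relu2(x) = max(0, x)². A minimal Python sketch of the activation (an illustration only, not the llama.cpp implementation):

```python
def relu_squared(xs):
    """ReLU-squared activation: negatives are zeroed, positives are squared."""
    return [max(0.0, x) ** 2 for x in xs]

# Negative inputs map to 0; positive inputs are squared.
print(relu_squared([-1.0, 0.5, 2.0]))  # -> [0.0, 0.25, 4.0]
```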

@github-actions github-actions bot added the python python script changes label Jun 14, 2025
@ngxson (Collaborator) left a comment

LGTM!

Reviewed snippet:

```cpp
NULL, NULL, NULL,
model.layers[il].ffn_down, NULL, NULL,
NULL,
LLM_FFN_RELU_SQR, LLM_FFN_SEQ, il);
```
Collaborator:

Seems like the only difference between AFM and llama is this activation function.

Not sure if, in the future, we can abstract out this activation definition per model (maybe as an hparam or a variable inside struct llm_build_llama?) to avoid duplicating too much code. WDYT @ggerganov?
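One way such an abstraction might look (a hypothetical Python sketch, not actual llama.cpp code): the activation becomes a per-model setting dispatched in one place, instead of a separate graph-building function per architecture.

```python
import math
from enum import Enum

class FfnAct(Enum):
    SILU = "silu"          # llama-style models
    RELU_SQR = "relu_sqr"  # AFM

def apply_act(act: FfnAct, v: float) -> float:
    """Dispatch on a per-model activation setting (e.g. stored as an hparam)."""
    if act is FfnAct.SILU:
        return v / (1.0 + math.exp(-v))
    if act is FfnAct.RELU_SQR:
        return max(0.0, v) ** 2
    raise ValueError(f"unknown activation: {act}")
```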

Contributor (author):

It also lacks the FFN gate, but maybe that could also be abstracted?

Collaborator:

If the gate is not present, its value will be nullptr, and build_ffn will skip nullptr values, so no further modification is required.
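To illustrate the gate-optional path (a toy Python sketch, not llama.cpp's actual build_ffn; matrices are plain lists of rows):

```python
def relu2(v):
    return max(0.0, v) ** 2

def matvec(w, x):
    """Multiply a row-major matrix by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def ffn(x, w_up, w_down, w_gate=None):
    """Gate-optional FFN: with w_gate=None the gating step is skipped
    entirely, analogous to build_ffn ignoring a nullptr gate tensor."""
    h = [relu2(v) for v in matvec(w_up, x)]
    if w_gate is not None:  # gated variant; AFM passes no gate
        h = [hi * gi for hi, gi in zip(h, matvec(w_gate, x))]
    return matvec(w_down, h)
```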

Contributor (author):

Ah right, that makes sense! Yeah, definitely seems worth considering some extra abstraction here then.

Collaborator:

btw I'm just bringing this up for further discussion. Feel free to merge the current PR without that.

Co-authored-by: Xuan-Son Nguyen <[email protected]>
@ngxson ngxson marked this pull request as ready for review June 15, 2025 22:14
@ngxson ngxson marked this pull request as draft June 15, 2025 22:14
@bartowski1182 bartowski1182 marked this pull request as ready for review June 15, 2025 22:20
@ngxson (Collaborator) left a comment

Lmk when you're ready to merge this

@bartowski1182 (Contributor, author):

Ready!

@ngxson ngxson merged commit d7da8dc into ggml-org:master Jun 15, 2025
50 checks passed
```diff
@@ -128,6 +128,7 @@ class TOKENIZER_TYPE(IntEnum):
 {"name": "llama4", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct", },
 {"name": "pixtral", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistral-community/pixtral-12b", },
 {"name": "seed-coder", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/ByteDance-Seed/Seed-Coder-8B-Base", },
+{"name": "arcee", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/arcee-ai/AFM-4.5B", }, # TODO confirm final URL
```
Collaborator:

Either this shouldn't have been added, or you forgot to add the new hash.

Contributor (author):

addressed in #14207

Labels: python (python script changes)
3 participants