llama : implement YaRN RoPE scaling (ggml-org#2268)

cebtenzzre · 2023-07-18T23:00:11Z

This is an implementation of YaRN RoPE scaling. See https://github.com/jquesnelle/yarn, the paper, and the errata.

TODO:

Add new GGUF key for how much context the base model was trained on
Support converting the new models to GGUF
Add backward implementations
Test new LLaMA implementation
Finish and test Falcon implementation

FNsi · 2023-07-20T12:45:02Z

Is there any guide on how to set the extrapolation and NTK parameters? How do they interact with the previous two parameters?

cebtenzzre · 2023-07-20T21:33:53Z

The upstream NTKv2 doesn't use --rope-freq-base, so it probably doesn't make sense to set that option here. It does use --rope-freq-scale, which works like linear scaling and is supposed to be calibrated so that, e.g., a scale of 0.25 actually gives you 8192 tokens of context. To use the default NTKv2, set --rope-ntk-factor and --rope-extrapolation-factor to 1, and set --rope-freq-scale appropriately. The lower the factors are, the less the respective scaling methods are mixed in, although I believe the graphs were generated with both at 100%; the code automatically ramps them based on some experimentally determined thresholds.
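To make the "mixed in" and "ramps" wording concrete, here is a minimal Python sketch of how a per-dimension ramp can blend a linearly scaled rotation angle with the unscaled one. It is an illustration only, not the PR's code: the names `ramp` and `mixed_angle`, the thresholds `low` and `high`, and the single `ext_factor` knob are assumptions for the example (the PR discussed here exposes separate NTK and extrapolation factors), and the 2048-token training context is assumed from the LLaMA 1 case.

```python
def ramp(low: float, high: float, pair_index: int) -> float:
    """Return 1.0 at or below `low`, 0.0 at or above `high`, linear in between."""
    y = (pair_index - low) / max(0.001, high - low)
    return 1.0 - min(1.0, max(0.0, y))


def mixed_angle(theta_extrap: float, freq_scale: float, ext_factor: float,
                low: float, high: float, pair_index: int) -> float:
    """Blend the linearly interpolated angle with the unscaled (extrapolated) one."""
    theta_interp = freq_scale * theta_extrap  # plain linear scaling
    mix = ramp(low, high, pair_index) * ext_factor  # 0 means pure linear scaling
    return theta_interp * (1.0 - mix) + theta_extrap * mix


# Assumption: a base model trained on 2048 tokens, as for LLaMA 1.
trained_ctx = 2048
freq_scale = 0.25
target_ctx = int(trained_ctx / freq_scale)  # a 0.25 scale targets 4x the context
```

With `ext_factor` at 0 every dimension pair is linearly scaled; with it at 1, pairs below `low` keep their original angle, pairs above `high` are fully scaled, and the pairs in between are interpolated.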

cebtenzzre · 2023-07-21T22:14:54Z

I would appreciate help with the following:

Should I try to write a backwards implementation? NTKv1 still doesn't have one, so I don't have much to base it on.
I don't have a Mac to test the Metal code on. If anyone sees obvious flaws or can test it locally, let me know.
I'm going to try to run a perplexity benchmark against NTKv1 and linear scaling, but I don't know if my current hardware is up to the task.

ggerganov

Rename everywhere extrapolation_factor to ext_factor

ggerganov · 2023-07-31T08:47:40Z

No need for backwards implementation for now

cebtenzzre · 2023-08-14T19:35:52Z

Perplexity with NTKv2 may be worse because neither is the dynamic version, which AFAIK works better on non-finetuned models. But fine-tuned models are far superior anyway.

NTKv1 does not converge when fine-tuning, which is why NTKv2 exists. So until somebody publishes a model fine-tuned with NTKv2 (maybe LLongMAv2 will be released after jquesnelle publishes the paper based on scaled-rope), the existing LLongMA, which uses regular linear interpolation (just like SuperHOT), is the state of the art for long contexts.

cebtenzzre · 2023-08-31T20:55:28Z

The paper has been released. The resulting method is called YaRN. Apparently the models that use this technique are good to about 120k tokens of context.

More work will definitely be needed to use these models with llama.cpp.

bloc97 · 2023-09-06T06:11:56Z

Thank you for the llamacpp implementation of YaRN!

I'm just letting you know that

constant float max_pos_emb = 2048;

should be changed to 4096 for llama 2 models when using YaRN (default was 2048 because we did the most tests with llama 1 models)
This value should probably be saved inside of the model configs and be loaded on inference...

cebtenzzre · 2023-09-06T06:40:59Z

> should be changed to 4096 for llama 2 models

Thanks for reminding me. I originally made this PR before GGUF was finished, so I hardcoded it in the meantime. I believe I can now use the value of llama.context_length for this purpose.

KerfuffleV2 · 2023-09-06T08:20:25Z

Would it be worth testing this with non-YaRN fine-tuned models? If so, any suggested settings? I can test it with ROCm.

Green-Sky · 2023-09-06T11:43:35Z

> Thank you for the llamacpp implementation of YaRN!
>
> I'm just letting you know that
> constant float max_pos_emb = 2048;
> should be changed to 4096 for llama 2 models when using YaRN (default was 2048 because we did the most tests with llama 1 models) This value should probably be saved inside of the model configs and be loaded on inference...

this needs to be a new GGUF kv, something like "rope_yarn_orig_ctx"

> Thanks for reminding me. I originally made this PR before GGUF was finished, so I hardcoded it in the meantime. I believe I can now use the value of llama.context_length for this purpose.

llama.context_length should be the context size of the fine-tune, e.g. 128Ki.

bloc97 · 2023-09-06T19:55:27Z

> this needs to be a new GGUF kv, something like "rope_yarn_orig_ctx"

Exactly. After fine-tuning a model with YaRN, we have to keep track of two values: the original context length (2048 for LLaMA or 4096 for Llama 2) and the final context length (which can be calculated by multiplying the original context length by the scale factor, e.g. 4096 x 32 = 128Ki).

In this case, the constant `constant float max_pos_emb = 2048;` used in the equations must be equal to the original context size, not the final context size.
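The distinction between the two context lengths can be sketched in Python. This is an illustrative sketch, not the PR's code: the function names are hypothetical, the base of 10000 and the 128-dimension head are assumed typical values, and the choice of 32 rotations is only an example threshold. The formula solves for the dimension pair whose wavelength completes a given number of rotations over the original context, assuming the usual RoPE wavelength of `2 * pi * base ** (2 * pair / n_dims)`.

```python
import math


def final_context(orig_ctx: int, scale: int) -> int:
    """Final context length after fine-tuning with a given scale factor."""
    return orig_ctx * scale


def rotations_at(pair_index: float, n_dims: int, ctx: int, base: float = 10000.0) -> float:
    """Full rotations that one dimension pair completes over `ctx` positions."""
    wavelength = 2 * math.pi * base ** (2 * pair_index / n_dims)
    return ctx / wavelength


def correction_dim(num_rotations: float, n_dims: int, orig_ctx: int,
                   base: float = 10000.0) -> float:
    """Pair index that completes `num_rotations` over the ORIGINAL context."""
    return n_dims * math.log(orig_ctx / (num_rotations * 2 * math.pi)) / (2 * math.log(base))


# Llama 2 fine-tuned with a scale factor of 32 (the example from the comment above).
orig_ctx = 4096
full_ctx = final_context(orig_ctx, 32)  # 4096 x 32 = 128Ki tokens

# The threshold moves if the final context is passed in by mistake.
dim_with_orig = correction_dim(32, 128, orig_ctx)
dim_with_final = correction_dim(32, 128, full_ctx)
```

Because the threshold depends on the context length passed in, using the final context instead of the original one shifts which dimensions get interpolated, which is why the two values have to be stored separately.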

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

…g#3893)

…#3898)

* fix backward process of rope

  rope backward process was broken after YaRN RoPE (ggml-org#2268) implementation, due to missing changes in backward functions. the code for the backward process is nearly identical to the forward process: the only difference is the sign of the sin-values. to avoid future regressions remove the near-duplicate backward functions and reuse the forward code: for this a new function argument `bool forward` was added to `ggml_compute_forward_rope_f32` and `ggml_compute_forward_rope_f16`. the sin-values will be negated when forward is false.

* fix finetune rope call to use correct default attn_factor of 1.0f

* remove unused `ggml_rope_xpos_back`

  it is better to have only one `ggml_rope_back` function that accepts all rope parameters, so that `ggml_compute_backward` can propagate all parameters without having to switch between different rope_back variants.

* fix comments explaining the sinus sign in ggml_forward_rope

* add missing function arguments in declaration

* fix function argument type in declaration
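The claim that the backward pass differs from the forward pass only in the sign of the sin-values can be checked with a small Python sketch. It is an illustration of the idea only: `rope_pair` is a hypothetical helper operating on one pair of values, not the ggml function named in the commit message. A rotation by `theta` is undone by a rotation by `-theta`, and the gradient of a rotation is its transpose, which is the same `-theta` rotation.

```python
import math


def rope_pair(x0: float, x1: float, theta: float, forward: bool = True):
    """Rotate one (x0, x1) pair by theta; the backward pass negates sin."""
    cos_t = math.cos(theta)
    sin_t = math.sin(theta) if forward else -math.sin(theta)
    return (x0 * cos_t - x1 * sin_t, x0 * sin_t + x1 * cos_t)


# Rotating forward and then backward with the same angle recovers the input.
rotated = rope_pair(1.0, 2.0, 0.7)
restored = rope_pair(rotated[0], rotated[1], 0.7, forward=False)
```

Sharing one code path behind a flag, as the commit describes, means any later change to how the angle is computed applies to both directions.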

cebtenzzre force-pushed the ntkv2 branch 3 times, most recently from ce59171 to f3b9eae Compare July 19, 2023 03:55

cebtenzzre changed the title ~~llama: implement NTK-By-Parts (NTKv2)~~ llama: implement NTK-By-Parts (NTKv2) RoPE scaling Jul 19, 2023

cebtenzzre force-pushed the ntkv2 branch from f3b9eae to f30c571 Compare July 19, 2023 04:31

cebtenzzre force-pushed the ntkv2 branch from c62b01b to fe2413c Compare July 21, 2023 22:03

cebtenzzre marked this pull request as ready for review July 21, 2023 22:04

ggerganov reviewed Jul 22, 2023

Comment thread ggml.c Outdated

cebtenzzre added 3 commits August 7, 2023 12:16

llama: implement NTK-By-Parts (NTKv2) RoPE scaling

8dec38c

CUDA implementation

6aeb46b

Metal implementation

9348aa4

cebtenzzre force-pushed the ntkv2 branch from 2b61001 to 9348aa4 Compare August 7, 2023 16:18

cebtenzzre mentioned this pull request Sep 4, 2023

Discussion: how to apply this experiment to the llama2 70B model? jquesnelle/yarn#11

Open

implement new YaRN algorithm

a30ae20

cebtenzzre changed the title ~~llama: implement NTK-By-Parts (NTKv2) RoPE scaling~~ llama: implement YaRN RoPE scaling Sep 5, 2023

cebtenzzre marked this pull request as draft September 6, 2023 15:50

cebtenzzre mentioned this pull request Sep 18, 2023

llama : allow gguf RoPE keys to be overridden with defaults #3240

Merged

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

7cd224a

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

0f352c9

…g#3893)

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

ea5d6f1

Seunghhon pushed a commit to Seunghhon/llama.cpp that referenced this pull request Apr 26, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

5303045

…#3898)

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

2bbcc7a

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

db24ce7

…g#3893)

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

13f4dd6

phuongncn pushed a commit to phuongncn/llama.cpp-gx10-dgx-sparks-deepseekv4 that referenced this pull request Apr 28, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

3b5af5b

…#3898)

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

205aba4

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

84e61cc

…g#3893)

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

ab63bcb

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

8ed917e

…#3898)

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

b7ee209

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

e8fdc67

…g#3893)

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

7d9bd13

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

150a197

…#3898)

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

ff111f7

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

fe1accf

…g#3893)

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

805bf79

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

7f10c4c

…#3898)

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : implement YaRN RoPE scaling (ggml-org#2268)

2de366f

Co-authored-by: cebtenzzre <[email protected]> Co-authored-by: Jeffrey Quesnelle <[email protected]>

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

llama : fix llama_context_default_params after ggml-org#2268 (ggml-or…

978d326

…g#3893)

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

cuda : fix RoPE after ggml-org#2268 (ggml-org#3897)

796e816

AlexiAlp pushed a commit to minghaop/llama.cpp that referenced this pull request Jun 2, 2026

metal : fix build errors and kernel sig after ggml-org#2268 (ggml-org…

c3c3a1d

…#3898)

13 participants
