Conversation

@JamesKunstle
Contributor

Support for Granite was added in liger-kernel v0.5.4, so we can add it as an additional performance option.

@mergify mergify bot added ci-failure dependencies Pull requests that update a dependency file labels Mar 21, 2025
@JamesKunstle force-pushed the jkunstle/granite-liger-kernel branch 2 times, most recently from 1cd4e63 to 2b1ad16 on March 21, 2025 00:17
@mergify mergify bot added ci-failure and removed ci-failure labels Mar 21, 2025
@JamesKunstle force-pushed the jkunstle/granite-liger-kernel branch 2 times, most recently from e6ee97a to 89aeca5 on March 21, 2025 00:36
@mergify mergify bot added ci-failure and removed ci-failure labels Mar 21, 2025
@JamesKunstle
Contributor Author

The current CI failure seems unrelated to the liger_kernel addition.

@RobotSail
Member

Need to test, but this could be very useful if it works correctly.

@JamesKunstle
Contributor Author

JamesKunstle commented Mar 22, 2025

Currently, correctness tests for the kernels themselves are done in-tree in the liger-kernel repo. They ensure correct convergence and logit equivalence after training. Link to the tests here.

I've also validated training-dynamics equivalence given identical batches. Link to Jira issue here.

There's a roughly 1% raw improvement with identical batches. This is expected: the real benefits will come from the larger batch sizes the kernels make possible! This kernel set is also missing the most important kernel, LigerFusedLinearCrossEntropy, which skips logit materialization and saves a lot of net memory headroom. We'll add this in the future.
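For context, the idea behind a fused linear + cross-entropy kernel is to compute the lm_head projection and the loss together in chunks, so the full [num_tokens, vocab_size] logit tensor never has to exist at once. A rough PyTorch sketch of the concept (illustrative only, not Liger's Triton implementation; shapes and the -100 ignore index are assumptions):

import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, lm_head_weight, labels, chunk_size=1024):
    """Conceptual stand-in for a fused linear + cross-entropy kernel.

    hidden:         [num_tokens, hidden_dim] final hidden states
    lm_head_weight: [vocab_size, hidden_dim] output projection weight
    labels:         [num_tokens] target token ids, -100 for padding
    """
    total_loss = hidden.new_zeros(())
    n_tokens = 0
    for start in range(0, hidden.shape[0], chunk_size):
        h = hidden[start : start + chunk_size]
        y = labels[start : start + chunk_size]
        # Only one chunk of logits is materialized at any point in time.
        logits = h @ lm_head_weight.T
        total_loss = total_loss + F.cross_entropy(
            logits, y, reduction="sum", ignore_index=-100
        )
        n_tokens += int((y != -100).sum())
    return total_loss / max(n_tokens, 1)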

I've given the PSAP team a reference to this PR so they can quantify the improved memory headroom and find new max_batch_len values.

Member

@RobotSail left a comment

@JamesKunstle Thanks for adding this; a few comments below.

try:
    # Third Party
    from liger_kernel.transformers import apply_liger_kernel_to_granite
except ImportError:
    apply_liger_kernel_to_granite = lambda *args, **kwargs: None  # pylint: disable=C3001
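For illustration, the patch would be applied before the model is constructed, roughly like this (the checkpoint name is illustrative):

from transformers import AutoModelForCausalLM

# Patch the Granite modeling code before the model is instantiated; this is a
# no-op when liger-kernel isn't installed, thanks to the ImportError fallback above.
apply_liger_kernel_to_granite()
model = AutoModelForCausalLM.from_pretrained("ibm-granite/granite-3.1-8b-instruct")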
Member

@JamesKunstle We'll want to support other models beyond Granite, and it seems like Liger's API has a number of different functions they expose for common architectures (mistral, llama, etc.).

It seems like they provide a way to map directly from the model_type field on the model into the Liger kernel to apply. Could we please use that here to support more models?

https://github.com/linkedin/Liger-Kernel/blob/293bf7eec7043c8c34b3cd82975c97e4c2f4254f/src/liger_kernel/transformers/monkey_patch.py#L1058C1-L1058C29
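Something like the following could make the option architecture-agnostic (a sketch; the MODEL_TYPE_TO_APPLY_LIGER_FN name is taken from the linked monkey_patch.py and may differ between Liger-Kernel versions):

from liger_kernel.transformers.monkey_patch import MODEL_TYPE_TO_APPLY_LIGER_FN

def apply_liger_kernel_for(model_type: str) -> bool:
    """Apply the Liger patch matching the HF config's model_type, if one exists."""
    apply_fn = MODEL_TYPE_TO_APPLY_LIGER_FN.get(model_type)
    if apply_fn is None:
        # Architecture not covered by Liger; fall back to stock modeling code.
        return False
    apply_fn()
    return True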

Contributor Author

++ certainly, only limiting to Granite to get PSAP numbers; then we'll expand the integration to other supported architectures.

Member

@JamesKunstle It seems like Liger actually has an AutoLigerKernelForCausalLM class, which inherits from AutoModelForCausalLM:
https://github.com/linkedin/Liger-Kernel/blob/293bf7eec7043c8c34b3cd82975c97e4c2f4254f/src/liger_kernel/transformers/auto_model.py#L15

Here we can just make the apply_liger option general and simply attempt to wrap any model.
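A sketch of that general option, assuming AutoLigerKernelForCausalLM is importable from liger_kernel.transformers as in the linked auto_model.py (checkpoint name and dtype are illustrative):

import torch
from liger_kernel.transformers import AutoLigerKernelForCausalLM

# Drop-in replacement for AutoModelForCausalLM.from_pretrained: the matching
# Liger patch is applied based on the checkpoint's model_type before loading.
model = AutoLigerKernelForCausalLM.from_pretrained(
    "ibm-granite/granite-3.1-8b-instruct",
    torch_dtype=torch.bfloat16,
)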

)

# this will work for granite-3.y models but not granite-7b because that's a Llama 2 model arch.
parser.add_argument("--enable-granite-liger-kernel", action="store_true")
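For reference, the new flag would gate the patch roughly like this (a sketch; only parser and the flag name come from the snippet above):

args = parser.parse_args()

if args.enable_granite_liger_kernel:
    # Safe even without liger-kernel installed: the import fallback turns
    # apply_liger_kernel_to_granite into a no-op.
    apply_liger_kernel_to_granite()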
Member

If we're providing this as an arg through the CLI, we should also expose this in the TrainingArgs config so the SDK can be consistent.
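A sketch of what that could look like, assuming TrainingArgs is a pydantic model (the field name mirrors the CLI flag and is an assumption):

from pydantic import BaseModel, Field

class TrainingArgs(BaseModel):
    # ...existing training options elided...
    enable_granite_liger_kernel: bool = Field(
        default=False,
        description="Patch Granite modeling code with Liger kernels before training.",
    )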

Contributor Author

++ agreed, will do once we've got the Granite numbers from PSAP.

@mergify
Contributor

mergify bot commented Apr 5, 2025

This pull request has merge conflicts that must be resolved before it can be merged. @JamesKunstle please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added needs-rebase and removed ci-failure labels Apr 5, 2025
@RobotSail force-pushed the jkunstle/granite-liger-kernel branch 2 times, most recently from 9d05a07 to ef82e49 on April 5, 2025 15:42
@RobotSail closed this Apr 5, 2025
@RobotSail force-pushed the jkunstle/granite-liger-kernel branch from ef82e49 to 910e46d on April 5, 2025 15:44
@mergify mergify bot added the ci-failure label Apr 5, 2025
@RobotSail
Member

Test

@RobotSail
Member

Looks like GitHub forces you to make an initial comment before the option to reopen becomes available
#430 (comment)

@RobotSail
Member

Never mind, it looks like the reopen and comment button is a lie
