Implement GGML_CPU_ALL_VARIANTS for ARM #14080
Regarding the test failures:
We are going to need a different set of variants for each platform. For example, I would expect the variants used for Apple to be each of the M1-M4 chips (or at least the ones that have different CPU features). The Android variants are also likely to be different from the Linux variants. Windows at the moment probably only needs one variant. It's not necessary to support every platform from the first moment, but this list of variants should probably only be used for Linux. Generally, I think it is easier to build the list of variants if we know exactly the list of chips we are targeting.
I agree, I'll add that later today.
Same, though I saw this as a topic for future iterations once the basic mechanism is in place. Might also be an opportunity for the respective manufacturers to chip in (pun intended).
I got a build error on my M4 Pro with -DGGML_METAL=OFF -DGGML_BLAS=OFF -DBUILD_SHARED_LIBS=ON -DGGML_OPENMP=OFF -DGGML_CPU_ALL_VARIANTS=ON -DGGML_BACKEND_DL=ON -DGGML_NATIVE=OFF:
So it needs a different set of variants for each platform.
I have now limited GGML_CPU_ALL_VARIANTS to Linux. I also figured out why some tests were failing. This was a mistake on my part: I put part of the variant-building logic in an else branch rather than an elseif branch. The previous behavior has been restored.
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time activation of features, we rely on GGML_USE_<FEAT> which need to be set in cmake, instead of GGML_<FEAT> that users would set for x86. This is because on ARM, users specify features with GGML_CPU_ARM_ARCH, rather than with individual flags.
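As a rough illustration of the compile-time side described above, the following sketch turns GGML_USE_<FEAT>-style macros into a bitmask and compares it with a host feature mask. The exact macro spellings, the feature constants, and both function names are assumptions made for this example and are not taken from the PR. The macros are defined in the file only so the sketch compiles on its own; in the real build they would come from cmake.

```cpp
#include <cstdint>

// Assumption: in a real build cmake would pass these (e.g. -DGGML_USE_DOTPROD).
// They are defined here only so the sketch is self-contained.
#define GGML_USE_DOTPROD
#define GGML_USE_FP16

// Hypothetical feature bits, chosen for this example only.
enum : uint32_t {
    F_DOTPROD = 1u << 0,
    F_FP16    = 1u << 1,
    F_SVE     = 1u << 2,
};

// Features this translation unit was compiled with, derived from the macros.
inline uint32_t compiled_features() {
    uint32_t mask = 0;
#if defined(GGML_USE_DOTPROD)
    mask |= F_DOTPROD;
#endif
#if defined(GGML_USE_FP16)
    mask |= F_FP16;
#endif
#if defined(GGML_USE_SVE)
    mask |= F_SVE;
#endif
    return mask;
}

// The variant is usable only if the host offers every compiled-in feature.
inline bool variant_is_usable(uint32_t host_features) {
    return (compiled_features() & ~host_features) == 0;
}
```

The host mask would come from a runtime query of the CPU, which is left out here because it is platform specific.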
This is like x86; however, to pass arch flags around within cmake, we use GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>. Some features are optional, so we may need to build multiple backends per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring function sort out which one can be used.
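To make the idea of "letting the scoring function sort it out" concrete, here is a minimal sketch of one possible scoring scheme: a variant scores 0 if the host lacks any feature it was built with, and otherwise scores higher the more features it uses. The types, feature bits, weights, and function names are all hypothetical and do not reproduce the actual ggml implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Illustrative feature bits (hypothetical names, not ggml identifiers).
enum Feature : uint32_t {
    FEAT_DOTPROD = 1u << 0,
    FEAT_FP16    = 1u << 1,
    FEAT_SVE     = 1u << 2,
    FEAT_I8MM    = 1u << 3,
    FEAT_SME     = 1u << 4,
};

struct Variant {
    std::string name;      // e.g. "armv8.6_2"
    uint32_t    required;  // features the variant was compiled with
};

// Returns 0 when the host is missing a required feature, otherwise a
// positive score that grows with each feature used (weights are arbitrary).
int score_variant(const Variant & v, uint32_t host_features) {
    if ((v.required & ~host_features) != 0) {
        return 0;
    }
    int score = 1;
    for (int bit = 0; bit < 5; ++bit) {
        if (v.required & (1u << bit)) {
            score += 1 << (bit + 1);
        }
    }
    return score;
}

// Picks the highest-scoring usable variant, or nullptr if none is usable.
const Variant * pick_best(const std::vector<Variant> & variants, uint32_t host_features) {
    const Variant * best = nullptr;
    int best_score = 0;
    for (const Variant & v : variants) {
        const int s = score_variant(v, host_features);
        if (s > best_score) {
            best_score = s;
            best = &v;
        }
    }
    return best;
}
```

With a host that reports dotprod, fp16, sve and i8mm but no SME, a variant that requires SME scores 0 in this sketch and the variant with the largest supported feature set is chosen.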
The other platforms will need their own specific variants. This also fixes the bug that the variant-building branch was always being executed as the else-branch of GGML_NATIVE=OFF. The branch is moved to an elseif-branch, which restores the previous behavior.
Rebased onto current master.
Benchmarked on Graviton3: W/O GGML_CPU_ALL_VARIANTS
Not sure why the numbers for 16 threads were off W/ GGML_CPU_ALL_VARIANTS; otherwise it looks good. Looking forward to adding support for Apple and Android.
Thanks for the check! Are you sure the first example is with I tested it with |
I don't have a Linux ARM machine to test this, but the code looks good to me. The only thing I don't like very much is how the base ARM arch is set automatically depending on which features are added; it is not obvious to me that this will always be correct. But it can be changed if necessary when adding support for other platforms.
Yeah, this definitely has room for future improvement. Regarding correctness, I assumed that if some instruction was only added in arch version X, then it is safe to bump the base arch to X and possibly get other improvements. @chaxu01, your thoughts on that?
In my case, I check for compiler support for arch X for each feature and add only the supported variants.
Otherwise one could run into a build failure if the compiler doesn't support a specific feature.
Ah, the leftover library from the previous build got picked up. Rerun after cleaning up:
A fair point. It's even a bit more complicated, I think: for example, the support check is the same one that is done for GGML_NATIVE=ON, so that could be factored out. Like Then there's a choice of where to check and skip the build if not supported (before or in the _variant function). Before is cleaner, I think, but it also feels a bit like a layer violation given the current setup. @slaren this new question seems nuanced enough to warrant its own PR. Is that OK, or would you like that addressed here?
Should be OK to merge this now; other improvements can be added in a later PR. Once everything is ironed out we can enable it in the docker ARM releases, and re-enable the Linux ARM releases.
@chaxu01, I'll step back now from the ARM part so you can add your queued work without us crossing wires.
This supersedes #14049 which also has more context.
There are two notable design decisions, better explained in the respective commit messages:
I tested this on a 4-vCPU Graviton4, which is armv9.0-a. The test command was simply llama-bench -m ggml-model-q4_0.gguf, as I just needed something simple to show that loading worked correctly and that no regressions were introduced.
First, the results with GGML_NATIVE=ON:
Then, the results of GGML_NATIVE=OFF GGML_BACKEND_DL=ON GGML_CPU_ALL_VARIANTS=ON (some debug messages omitted):
The scoring for each backend was calculated correctly. The armv9.2 backends are for SME, which the Graviton4 doesn't have, hence they scored 0. So the armv8.6_2 backend (+dotprod+fp16+sve+i8mm) got picked.
Incidentally, this showcases one problem that I left for future work. When choosing the MCPU to target, I chose the first version that supported a particular instruction, which e.g. for i8mm was armv8.6-a. However, the test above was run on armv9.0-a, so a build with the same features but targeting armv9.0-a might have performed even better. The solution would be to include the runtime arch in the scoring, but the above implements the necessary base case and I'll look into this improvement later.
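One way the proposed follow-up could look, purely as a sketch: keep the feature-based score as the primary criterion and use the runtime arch only as a tie-breaker, so that among variants with identical features the one built for the host's arch wins. The Candidate type, the function name, and the existence of an armv9.0-a variant are assumptions for illustration; the PR does not define any of them.

```cpp
#include <string>

// Hypothetical candidate: the arch it was compiled for plus its
// feature-based score (0 means the host cannot run it).
struct Candidate {
    std::string target_arch;
    int         feature_score;
};

// Doubling the feature score leaves room for a one-point arch bonus, so a
// matching arch breaks ties but never outweighs a better feature score.
int score_with_arch(const Candidate & c, const std::string & runtime_arch) {
    if (c.feature_score == 0) {
        return 0;  // unusable variants stay unusable
    }
    return c.feature_score * 2 + (c.target_arch == runtime_arch ? 1 : 0);
}
```

An exact string match is used deliberately: it avoids assuming any ordering between arch versions, which would need more care than a simple numeric comparison.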