
ggml-cpu: Build variant targeting Neoverse-V2 #14380

Open: wants to merge 4 commits into master
Conversation

ckastner (Collaborator)

As a first improvement on the recently added generic ARM support for GGML_CPU_ALL_VARIANTS, this builds a variant targeting Neoverse-V2 specifically (e.g., Graviton4 or NVIDIA Grace).

  • The cmake part needed little change. Feature processing is unchanged, but the target is a specific -mcpu= rather than a generic -march=.
  • It also defines a GGML_ARM_MCPU that is passed on to the scoring function.
  • The scoring function parses the part number from /proc/cpuinfo on Linux (Graviton4 is Linux-only, and I'd guess NVIDIA Grace is, too), and uses it in scoring.

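A minimal sketch of what such part-number detection could look like (the function name and field handling here are my own illustration, not the actual ggml code; on Arm Linux, /proc/cpuinfo exposes a "CPU part" field holding the MIDR part number, which is 0xd4f for Neoverse-V2):

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: scan cpuinfo-formatted text for the "CPU part" field
// and return the part number as an integer (e.g. 0xd4f for Neoverse-V2),
// or -1 if the field is absent.
static int parse_cpu_part(std::istream & in) {
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind("CPU part", 0) == 0) {          // line starts with "CPU part"
            const auto pos = line.find(':');
            if (pos != std::string::npos) {
                // strtol with base 16 accepts the "0x" prefix and leading spaces
                return (int) std::strtol(line.c_str() + pos + 1, nullptr, 16);
            }
        }
    }
    return -1;
}
```

In the real scoring function this would be fed from `std::ifstream("/proc/cpuinfo")`; an `istream` parameter just keeps the parsing testable without the file.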
In the scoring function, I shifted features to the 9th bit and beyond. The idea is that features are more important than microarchitecture, platform, and so on, which can use bits 2-8 to rank themselves. So nuances like the microarchitecture of two variants become relevant in scoring only if the variants otherwise have equal features; otherwise, features win. I thought this might be a useful convention.
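The convention above could be sketched as follows (the names, feature set, and exact bit positions are illustrative assumptions, not the ggml source): features occupy bit 9 upward, so any feature difference dominates the microarchitecture rank held in the lower bits.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative bit layout (assumed, not the actual ggml code):
// bits 2-8 rank microarchitecture/platform match, bits 9+ hold one bit per feature.
enum : uint64_t {
    SCORE_MCPU_MATCH = 1u << 2,    // e.g. -mcpu=neoverse-v2 matches the host part number
    FEAT_DOTPROD     = 1u << 9,
    FEAT_SVE         = 1u << 10,
    FEAT_SVE2        = 1u << 11,
};

static uint64_t score_variant(bool mcpu_match, bool dotprod, bool sve, bool sve2) {
    uint64_t s = 1;                // base score: the variant is runnable at all
    if (mcpu_match) s |= SCORE_MCPU_MATCH;
    if (dotprod)    s |= FEAT_DOTPROD;
    if (sve)        s |= FEAT_SVE;
    if (sve2)       s |= FEAT_SVE2;
    return s;
}
```

Because every feature bit outweighs all the microarchitecture bits combined, a variant with more features always scores higher, and the mcpu match only breaks ties between feature-equal variants.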

I tested this on Graviton4, where the neoverse-v2 variant indeed received a higher score than the armv8.6-a variant, which would also work on Neoverse-V2 since that core is armv8.6-a. neoverse-v2 is also what the GGML_NATIVE=ON build targets.

I did not see meaningful benchmark improvements over generic armv8.6-a, but I tested only a limited set of models, and only with 4 vCPUs. Some tests ran with a 2-3% improvement, but this wasn't always reproducible. I hope to get more AWS resources in July, when I can properly test this on a dedicated box.

In any case, I think this would at least serve as an easy-to-copy template for other variants where this might matter more.

This supersedes #14332.

@github-actions github-actions bot added the ggml changes relating to the ggml tensor library for machine learning label Jun 25, 2025
@slaren (Member)

slaren commented Jun 30, 2025

I am just wondering if it is worth adding this if it doesn't give any measurable improvement. I don't think it is likely that this will help meaningfully in any case, because most of the (performance-sensitive) code is already very low-level intrinsics and assembly.

@ckastner (Collaborator, Author)

With no improvement, I would tend to agree. That would actually be a nice result, as it would mean the current ALL_VARIANTS solution targeting a generic ARM arch is already good enough.

I was just surprised to see no improvement. Clearly nothing big was to be expected, but I did expect at least a marginal gain. Hence my plan to do more tests in July with dedicated hardware. I think it's quite possible that the vCPUs I was using masked any possible gains: 4 vCPUs (not cores) on a 96-core machine with who knows how many co-tenants.

Apart from that, there could be the benefit of having this code as a template for platforms where it might actually matter; minus the whitespace changes from un-nesting an if, this is still a small change.

(Speaking naively again, the reason I would have expected the compiler to gain at least something from knowing more details about the cores used is that otherwise, what would be the point of a Neoverse core over any other? Apart from where direct assembly use prohibits this, of course.)
