ggml : add DirectML backend #7772
Comments
Great idea. It looks like a lot of the upcoming AI hardware is going to have NPUs.
I am not convinced that a DirectML backend is possible: the operators are too high-level and new ones cannot be added. This means that we cannot implement a matrix multiplication operator that supports our quant formats. It might be possible to do it with DirectX 12 shaders, but at that point it would be a DirectX 12 backend more than a DirectML backend. It would not allow using ONNX models regardless.
Would using DirectX 12 shaders allow us to run stuff on the NPU? I suppose not, but just making sure. The main point of a potential DirectML backend would be to utilize the NPU. If it is too high-level (i.e. something like what CoreML is on Apple Silicon), then I agree it's not worthwhile, or even possible, to add support for it.
I am not sure it is possible to create custom NPU kernels at all. https://github.com/openvinotoolkit/npu_plugin seems to contain a compiler for the Intel NPU, but it's not clear if it is complete, and they have removed the source of the kernels that should be located in https://github.com/openvinotoolkit/npu_plugin/tree/develop/sw_runtime_kernels, leaving only the binary blobs.
Interesting, but they have a PyTorch implementation. I thought PyTorch was pretty flexible in what it can support, but I don't have much insight into the code base here. Or is PyTorch automatically supplementing what isn't supported by DirectML with the CPU?
It should automatically fall back to the CPU in this case.
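To illustrate the kind of fallback described above, here is a minimal C++ sketch of per-operator placement: each operator goes to the accelerator if it reports support, and to the CPU otherwise. This is not PyTorch's or DirectML's actual code. `Device`, `assign_ops`, and the operator names are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical device description: a name plus the set of ops it implements.
struct Device {
    std::string name;
    std::set<std::string> supported_ops;

    bool supports(const std::string& op) const {
        return supported_ops.count(op) > 0;
    }
};

// Place each op on the accelerator when it is supported there,
// otherwise fall back to the CPU.
std::vector<std::string> assign_ops(const std::vector<std::string>& graph,
                                    const Device& accel,
                                    const Device& cpu) {
    std::vector<std::string> placement;
    placement.reserve(graph.size());
    for (const std::string& op : graph) {
        // Assumption: the CPU device implements every op in the graph.
        assert(cpu.supports(op));
        placement.push_back(accel.supports(op) ? accel.name : cpu.name);
    }
    return placement;
}
```

With an accelerator that only knows fp16 matmul and add, a quantized matmul in the same graph would be placed on the CPU while the other ops stay on the accelerator.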
@slaren the lower-level D3D metacommands interface leveraged by DirectML is not publicly documented. The Intel NPU d3d12 drivers have a shader compiler and accept custom kernels. But the DirectML driver for the NPU on Qualcomm systems is metacommands only, with no custom kernel support, at least so far.
@woachk thanks, that's useful information. If the Intel NPU driver accepts custom kernels via d3d12 shaders, I expect that it would be possible to fully support it through a d3d12 backend. For NPUs that only support DirectML, it may still be possible to support fp16 and fp32 models. It may also be possible to create a backend that transparently quantizes the tensors to an internal format, and in this way it may be possible to support the quantization type of DirectML, although that would be a significant deviation from the current backends. Personally I don't think that a backend that cannot use the ggml quantization types would be very useful.
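As a rough illustration of the conversion idea discussed above, the C++ sketch below expands block-quantized weights to fp32 so that a fixed, high-level fp32 operator can consume them. It shows the dequantizing direction only. `QuantBlock` is a hypothetical, simplified format (32 int8 values sharing one float scale) and not one of ggml's actual quantization types. `matvec_f32` merely stands in for an operator the backend cannot modify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified block format: 32 int8 values sharing one scale.
// ggml's real quantization types use different layouts.
constexpr std::size_t kBlockSize = 32;

struct QuantBlock {
    float scale;
    std::int8_t q[kBlockSize];
};

// Expand quantized blocks to fp32: value = scale * quantized integer.
std::vector<float> dequantize(const std::vector<QuantBlock>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const QuantBlock& b : blocks) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out.push_back(b.scale * static_cast<float>(b.q[i]));
        }
    }
    return out;
}

// Stand-in for a fixed high-level operator: y = W x,
// with W stored row-major as rows x x.size() fp32 values.
std::vector<float> matvec_f32(const std::vector<float>& w,
                              const std::vector<float>& x,
                              std::size_t rows) {
    const std::size_t cols = x.size();
    assert(w.size() == rows * cols);
    std::vector<float> y(rows, 0.0f);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            y[r] += w[r * cols + c] * x[c];
        }
    }
    return y;
}
```

The cost of this approach is that the weights occupy full fp32 size on the device, which gives up the memory savings that are the main reason for using quantized formats.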
While it might be helpful, Torch (and by extension PyTorch) does not work on all platforms. Torch does not support ARM (or RISC-V) architectures and also lacks support for NPUs.
@ggerganov So I went and forked the project and have been iterating to the point where I have a basic understanding of the layout, such that I've made stubs and can compile/link and run a simple unit test against my ggml-backend-dx12.cpp. Can you spell out the post-conditions and offer some guidance on which features are most important for me to focus on? I'm a senior in a C.S. program, but I'm not taking any classes right now because we just had a baby, so I'd like to focus on this contribution to learn more and be a useful member of the community. EDIT: Found this https://github.com/Const-me/Whisper and am going to leverage all of this hard work since DX11 and DX12 are super close.
It seems like DirectML supports the upcoming NPU-enabled chips for Windows machines:
https://devblogs.microsoft.com/directx/introducing-neural-processor-unit-npu-support-in-directml-developer-preview/
I don't think there is any other way to tap into this hardware, so we should explore if it is possible to add this library as a backend in ggml in order to run stuff on the NPUs. There has been some semi-related work in the past that combined ggml and Direct3D: https://github.com/Const-me/Whisper. Not sure if it is relevant at all, maybe just as an inspiration.