❓ Questions and Help
What is the correct way to perform a batched add and batched matrix multiply using CUDA in the C++ API?
I found at::baddbmm, but I could not find its source to verify whether it uses CUDA when available or only runs on the CPU.
Thank you for any guidance.
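
For reference, the operation being asked about computes `out[b] = beta * input[b] + alpha * (batch1[b] @ batch2[b])` for every batch index `b`. The block below is a minimal plain C++ sketch of those semantics only. It is not the libtorch implementation and makes no claim about how at::baddbmm picks a device. The function name `baddbmm_reference` and the flat row-major layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of libtorch: reference semantics of a
// batched add + batched matrix multiply.
// All buffers are assumed contiguous and row-major:
//   input  : [B, N, P]
//   batch1 : [B, N, M]
//   batch2 : [B, M, P]
// Returns out with shape [B, N, P], where
//   out[b] = beta * input[b] + alpha * (batch1[b] @ batch2[b]).
std::vector<float> baddbmm_reference(const std::vector<float>& input,
                                     const std::vector<float>& batch1,
                                     const std::vector<float>& batch2,
                                     std::size_t B, std::size_t N,
                                     std::size_t M, std::size_t P,
                                     float beta = 1.0f, float alpha = 1.0f) {
    // Sanity-check that the buffer sizes match the declared shapes.
    assert(input.size() == B * N * P);
    assert(batch1.size() == B * N * M);
    assert(batch2.size() == B * M * P);

    std::vector<float> out(B * N * P);
    for (std::size_t b = 0; b < B; ++b) {
        for (std::size_t i = 0; i < N; ++i) {
            for (std::size_t j = 0; j < P; ++j) {
                // Dot product of row i of batch1[b] with column j of batch2[b].
                float acc = 0.0f;
                for (std::size_t k = 0; k < M; ++k) {
                    acc += batch1[(b * N + i) * M + k] *
                           batch2[(b * M + k) * P + j];
                }
                const std::size_t idx = (b * N + i) * P + j;
                out[idx] = beta * input[idx] + alpha * acc;
            }
        }
    }
    return out;
}
```

A CPU reference like this could be used to spot-check the values returned by at::baddbmm on small inputs, whichever device the tensors were created on.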