Description
Hi, I have a question about the proper usage of the nvbit_get_func_addr() function. My goal is to get the address of a device/global function inside a cubin file, which, as I understand it, should stay fixed as long as I don't recompile my files. The comment for nvbit_get_func_addr() says "Allows to get PC address of the function", which I thought was what I needed, but it seems not to be.
I added a call to nvbit_get_func_addr() inside instr_count to print the addresses, and ran it on the vecAdd example provided in the repo. Running it three times back-to-back prints three different addresses (so not fixed), and each of them is much larger than the size of the cubin file (4.0KB). So I think I may be misunderstanding what nvbit_get_func_addr() returns.
inspecting vecAdd(double*, double*, double*, int) at 0x00007fc9af25f400
inspecting vecAdd(double*, double*, double*, int) at 0x00007f6c2f25f400
inspecting vecAdd(double*, double*, double*, int) at 0x00007fb5d725f400
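As a quick sanity check on the three values above, here is a minimal C++ sketch that compares them. It is plain host code and does not call NVBit; the names `kAddrs`, `kCubinSize`, `common_low_mask` and `report` are hypothetical helpers made up for this illustration. It only checks two things: that every value is larger than the cubin size, and how many low-order bits the three values share.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// The three values printed in the three back-to-back runs above.
static const uint64_t kAddrs[3] = {
    0x00007fc9af25f400ULL,
    0x00007f6c2f25f400ULL,
    0x00007fb5d725f400ULL,
};

// Size of the cubin file as reported above (4.0KB).
static const uint64_t kCubinSize = 4096;

// Returns a mask covering the contiguous low-order bits that are
// identical in all three values.
uint64_t common_low_mask() {
    // Bits set in `diff` are the positions where at least one run differs.
    uint64_t diff = (kAddrs[0] ^ kAddrs[1]) | (kAddrs[0] ^ kAddrs[2]);
    if (diff == 0) return ~0ULL;           // all three values identical
    uint64_t lowest = diff & (~diff + 1);  // isolate the lowest differing bit
    return lowest - 1;                     // every bit below it is shared
}

// Prints each value split into a run-dependent high part and a shared low part.
void report() {
    const uint64_t mask = common_low_mask();
    for (uint64_t a : kAddrs) {
        assert(a > kCubinSize);  // none of them can be a file offset
        std::printf("high=0x%llx low=0x%llx\n",
                    (unsigned long long)(a & ~mask),
                    (unsigned long long)(a & mask));
    }
}
```

Calling `report()` from a `main()` shows that only the high part changes between runs, while the low 27 bits (0x725f400) are the same in all three. That would be consistent with the value being a runtime virtual address whose base differs per process, rather than an offset into the cubin file, but this is only my guess from three samples.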
Am I using nvbit_get_func_addr() incorrectly? If so, what would be a good way to get the address of a device/global function inside a cubin file with NVBit? Any suggestions or insights are welcome!
Thank you so much for your help!
Setup
GPU: A100
Driver Version: 555.42.02
CUDA Version: 12.5
Compute Capability: 8.0