Add dynamic plugin along with counter events, cupti disable support #1148

Conversation
Hi @briancoutinho! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but the CLA is no longer valid and will need to be resubmitted.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
```c
enum KinetoPlugin_ProfileEventType {
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_INVALID = 0,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CPU_OP, // cpu side ops
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_USER_ANNOTATION,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_GPU_USER_ANNOTATION,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_GPU_MEMCPY,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_GPU_MEMSET,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CONCURRENT_KERNEL, // on-device kernels
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_EXTERNAL_CORRELATION,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CUDA_RUNTIME, // host side cuda runtime events
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CUDA_DRIVER, // host side cuda driver events
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CPU_INSTANT_EVENT, // host side point-like events
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_PYTHON_FUNCTION,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_OVERHEAD, // CUPTI induced overhead events
                                             // sampled from its overhead API
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CUDA_SYNC, // synchronization events between
                                              // runtime and kernels
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_GPU_PM_COUNTER, // GPU PM counters
  KINETO_PLUGIN_PROFILE_EVENT_NUM_TYPES
};
```
This part is the most problematic in my opinion. As it is a fixed set, it would be impossible to create a new plugin without contributing to the PyTorch code and changing this file by adding new enums supported by the new plugin.
Then there would exist an enum in the PyTorch code that is used nowhere in the code, as it would be used only by the plugin.
How about a more dynamic approach?
Use an integer here instead of fixed enums. The lower range 0..N would map to the fixed enums and higher values would be assigned dynamically to plugins, as sketched below:
0 .. A-1 : fixed enums
A .. B-1 : 'enums' for the 1st plugin
B .. C-1 : 'enums' for the 2nd plugin
...
During initialization the plugin would get its base address in the enum address space: A for the 1st plugin, B for the 2nd plugin.
In return it would provide a list of strings naming the supported activities, which would be converted in the provided order to its enum space.
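A minimal sketch of that dynamic assignment idea, using hypothetical class and method names (not part of this PR):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical registry that hands each plugin a contiguous block of event
// type IDs above the fixed built-in range.
class EventTypeRegistry {
 public:
  // firstDynamicId would be the first value after the fixed enums
  // (e.g. KINETO_PLUGIN_PROFILE_EVENT_NUM_TYPES).
  explicit EventTypeRegistry(int32_t firstDynamicId) : next_(firstDynamicId) {}

  // The plugin supplies the names of the activities it supports; the registry
  // returns the plugin's base ID and records a name for each assigned value.
  int32_t registerPlugin(const std::vector<std::string>& activityNames) {
    int32_t base = next_;
    for (const auto& name : activityNames) {
      names_[next_++] = name;
    }
    return base; // plugin reports events as base + index into activityNames
  }

  const std::string& nameFor(int32_t id) const { return names_.at(id); }

 private:
  int32_t next_;
  std::unordered_map<int32_t, std::string> names_;
};
```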
If you prefer to keep a fixed list, please add KINETO_PLUGIN_PROFILE_EVENT_TYPE_XPU_RUNTIME here.
Actually, we should try to make the ActivityTypes more generic so we have fewer event types in the list, and every new platform does not have to add its own version of runtime and *PU events. That way new plugins should rarely ever need to add enums.
For now I will update the header file to include all activity types.
I have added all the activity types for completeness here now.
```c
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_GPU_MEMSET,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CONCURRENT_KERNEL, // on-device kernels
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_EXTERNAL_CORRELATION,
  KINETO_PLUGIN_PROFILE_EVENT_TYPE_CUDA_RUNTIME, // host side cuda runtime
```
These events should be more generic in my opinion.
Why is CUDA hardcoded?
What about Intel's XPU, AMD's ROCm, Google's TPU, some custom FPGA, etc.?
@tsocha I completely agree with you. Note that this enum reflects the C++ enums in ActivityType.h.
https://github.com/pytorch/kineto/blob/main/libkineto/include/ActivityType.h#L19-L28
For historical reasons the activities are named after CUDA, but ideally they should be named generically rather than after each GPU's runtime. I don't think we can fix that in this PR, but I would do something like:
```cpp
enum class ActivityType {
  GPU_RUNTIME,
  CUDA_RUNTIME = GPU_RUNTIME,
  XPU_RUNTIME = GPU_RUNTIME,
  MTIA_RUNTIME = GPU_RUNTIME,
  // ...
};
```
cc @sraikund16 does that make sense long term?
@briancoutinho Agreed, this should be easier to scale
cc @sraikund16 @davidberard98 this is ready for review.

@sraikund16 / @aaronenyeshi / @sanrise Gentle nudge, please help with review.

@sraikund16 Please help with review :)
```cpp
// This file handles pure C plugin profiler interface and converts to internal
// profiler interface

class PluginProfilerSession : public IActivityProfilerSession {
```
Do you plan to expose toggleCollectionDynamic functionality, so plugin profilers can react to it?
Not currently since this is not yet part of the IActivityProfilerSession interface.
However, the structures here can be extended as long as we keep the same order
https://github.com/pytorch/kineto/blob/main/libkineto/include/IActivityProfiler.h
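A minimal illustration of that idea, using hypothetical struct and field names: appending new members to a C-style interface struct keeps the offsets of the existing members unchanged, so plugins built against the shorter layout still line up.

```cpp
// Hypothetical example only, not the actual IActivityProfiler layout.
struct PluginSessionApiV1 {
  void (*start)(void* session);
  void (*stop)(void* session);
};

struct PluginSessionApiV2 {
  // Existing fields stay first and in the same order.
  void (*start)(void* session);
  void (*stop)(void* session);
  // New functionality is appended at the end.
  void (*toggleCollectionDynamic)(void* session, bool enable);
};
```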
Hi @malfet, curious if we could get some feedback on this PR. Guessing folks are busy, but is there anyone among the maintainers at Meta who can help import this and get it to the next stage?
```cpp
// Clear error state
dlerror();

void *pHandle = dlopen(libFilePath.c_str(), RTLD_LAZY);
```
Do you plan to add support for Windows?
@australopitek Not planning to in this PR, but it should be an easy addition on top of it. Personally, I don't have a Windows setup to test things right now.
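For reference, a rough sketch of what the Windows counterpart of the dlopen()/dlsym() flow could look like; the function name below and the cast are illustrative assumptions, and the real entry-point signature would come from the plugin interface header:

```cpp
#include <windows.h>
#include <string>

// Sketch only: mirrors the POSIX dlopen()/dlsym() flow using the Win32 API.
void* loadPluginWindows(const std::string& libFilePath) {
  HMODULE handle = LoadLibraryA(libFilePath.c_str());
  if (handle == nullptr) {
    // GetLastError() holds the failure reason, analogous to dlerror().
    return nullptr;
  }
  // Resolve the plugin's registration entry point by name.
  FARPROC sym = GetProcAddress(handle, "KinetoPlugin_register");
  if (sym == nullptr) {
    FreeLibrary(handle); // not a plugin we recognize
    return nullptr;
  }
  return reinterpret_cast<void*>(sym);
}
```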
@australopitek Still waiting to hear back from the maintainers at Meta. Though Kineto is part of the PyTorch Foundation now, the codebase is not GitHub-first from what I know; it needs to be imported by Meta to push through. Hoping the maintainers at Meta get back on this. @australopitek If you describe your use-case for this interface too, that might help make a stronger case. Thanks!

@briancoutinho In response to my comment https://github.com/pytorch/kineto/pull/1148/files#r2454233130, we want to move XpuptiProfiler to an external Intel repo, so it's independent from the Kineto repository. Currently XpuptiProfiler is responsible for various traces (runtime, kernel, memset, etc.).
Taking a look, for some reason I was not getting notifications on this PR.

@sraikund16 has imported this pull request. If you are a Meta employee, you can view this in D85576028.

@briancoutinho has updated the pull request. You must reimport the pull request before landing.
@briancoutinho has updated the pull request. You must reimport the pull request before landing.

@australopitek Updated the start trace API to pass the enabled activity types, this should help configure plugins 👍 Also see DynamicPluginTests.cpp, which verifies this works.
Updates

@sraikund16 Thanks for checking in 👍
```c
  // [in] Enabled activity types.
  KinetoPlugin_ProfileEventType *pEnabledActivityTypes;

  // [in] Max length of pEnabledActivityTypes
  size_t enabledActivityTypesMaxLen;
```
I wonder if KinetoPlugin_ProfilerCreate_Params isn't a better place for these params. Both the Cupti profiler and the Xpupti profiler enable activities in the configure phase, which happens before start: https://github.com/pytorch/kineto/blob/main/libkineto/src/CuptiActivityProfiler.cpp#L1118
Credits to @zli669 for implementing this change, I am just setting this up here for contribution to kineto.
Overview

Adds the capability to dynamically load plugins for Kineto (#1121). The core idea is to have plugin modules that can be made available as a shared object file (.so). Kineto loads all .so object files in a specified path:
These plugins can then register themselves with Kineto, start/stop profiling, and return trace events to Kineto.
Please look at DynamicPluginTest.cpp for example usage of this API.
The PR also enables counter events that can show up in the trace.
Details

Companion changes

First, let's address some simpler changes:
- GPU_PM_COUNTER: a new activity type enabling a stream of performance events to be logged by any plugin in Kineto.
- KINETO_DISABLE_CUPTI: disables using CUPTI on NVIDIA GPUs to support non-CUPTI based profiling modules. This helps avoid conflicts with CUPTI, as NVIDIA support is currently closely coupled with it.
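As a hedged illustration only (it is an assumption here that the KINETO_DISABLE_CUPTI flag is consulted as an environment variable; the PR defines the flag, not this exact check):

```cpp
#include <cstdlib>

// Sketch: skip CUPTI-based profiling when the user opts out via the flag.
bool cuptiDisabledByUser() {
  const char* flag = std::getenv("KINETO_DISABLE_CUPTI");
  return flag != nullptr && flag[0] == '1';
}
```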
The Dynamic plugin interface is a generalization of the IActivityProfiler interface in Kineto. The support is implemented in three parts, recommended for review in this order:
1) Plugin C Interface: libkineto/include/KinetoDynamicPluginInterface.h

The plugin shared object must implement KinetoPlugin_register(), providing function pointers for the trace plugin functions: start(), stop(), create(), destroy(), processEvents(). Kineto calls these function pointers.
To ensure ABI compatibility and avoid C++ compiler mismatches, a pure C interface is used. This interface includes C versions of key structures and enums like ActivityType, ProfileEvent, Flows, etc.
Lastly, to transfer trace data from the plugin to Kineto, an opaque C object called KinetoTraceBuilder is used. The key idea is that the trace builder handles generating events and transferring them to Kineto's activity profiler. This ensures there is NO dynamic memory being passed around between the plugin shared object and Kineto.
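A purely illustrative sketch of the registration idea; the type, parameter, and helper names below are hypothetical, and the real declarations live in KinetoDynamicPluginInterface.h:

```cpp
// Hypothetical illustration only; names and signatures are assumed, not the
// actual interface. The plugin exports a C entry point through which it hands
// Kineto a table of callbacks, avoiding any C++ ABI dependence.
extern "C" {

typedef struct HypotheticalPluginCallbacks {
  void* (*create)(void);          // allocate plugin profiler state
  void (*start)(void* profiler);  // begin collecting events
  void (*stop)(void* profiler);   // stop collecting events
  void (*processEvents)(void* profiler, void* traceBuilder); // emit trace events
  void (*destroy)(void* profiler); // release plugin profiler state
} HypotheticalPluginCallbacks;

// Kineto would resolve an exported registration symbol after loading the .so
// and call it so the plugin can fill in its callbacks.
int KinetoPluginExample_register(HypotheticalPluginCallbacks* callbacks);

}
```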
2) Shim to Internal Plugin Interface: libkineto/src/dynamic_plugin/PluginProfiler.h

The PluginProfiler class implements the IActivityProfiler interface used internally in Kineto. It controls the profiling session and generates session trace data using the function pointers from the shared object.

2.2) Trace Event Builder: libkineto/src/dynamic_plugin/PluginTraceBuilder.h

Implements the TraceEvent builder used by the shared object plugin.
3) Plugin Loader: libkineto/src/dynamic_plugin/PluginLoader.h

Finally, the plugin loader handles the discovery of shared object plugins and dynamically loads them into the address space using standard techniques.
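A rough sketch of the standard technique involved (POSIX dlopen/dlsym); the directory scanning, function name, and symbol name handling here are simplified assumptions, not the PR's exact implementation:

```cpp
#include <dlfcn.h>
#include <filesystem>
#include <string>
#include <vector>

// Scan a directory for shared objects and resolve a registration symbol in
// each one. Failures are skipped; a real loader would log dlerror() output.
std::vector<void*> loadPluginSymbols(const std::string& pluginDir,
                                     const char* symbolName) {
  std::vector<void*> symbols;
  for (const auto& entry : std::filesystem::directory_iterator(pluginDir)) {
    if (entry.path().extension() != ".so") {
      continue;
    }
    dlerror(); // clear any stale error state
    void* handle = dlopen(entry.path().c_str(), RTLD_LAZY);
    if (handle == nullptr) {
      continue; // dlerror() describes why the library failed to load
    }
    void* sym = dlsym(handle, symbolName);
    if (sym == nullptr) {
      dlclose(handle); // not a plugin we recognize
      continue;
    }
    symbols.push_back(sym);
  }
  return symbols;
}
```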
Testing

Unit Test

Added a DynamicPlugTest which basically tests out the PluginProfiler class and the creation/start/stop functions. It also tests the trace event builder. The unit test does not do any dynamic loading though.

Build

Test

End-to-end test with Always On Profiling Plugin

Build PyTorch and import this branch to third_party/kineto. Base version tested on ``
I used a simple test program.
Normal operation uses CUPTI.
Run using plugin:
I can open the trace and find events added by the plugin.