Updates KernelAttributes to avoid possible dangling handles. #957

Andy-Jost · 2025-09-09T22:40:54Z

Previously a KernelAttributes bundle held a reference to an internal handle from a Kernel object. This change introduces a weak reference to ensure that a dangling handle is never used. Testing shows no statistically significant change to property access times.
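A minimal sketch of the weak-reference idea, assuming a hypothetical `Kernel` class that owns an integer `_handle` and a hypothetical `_query` function standing in for the driver attribute query. These names are illustrative and not the cuda.core API; the point is only that the bundle resolves the owner on each access instead of keeping a raw handle.

```python
import weakref


class Kernel:
    """Hypothetical owner of a raw driver handle (stand-in for the real Kernel)."""

    def __init__(self, handle: int):
        self._handle = handle


def _query(handle: int, attribute: str) -> int:
    # Hypothetical stand-in for a driver attribute query.
    return handle * 100 + len(attribute)


class KernelAttributes:
    """Attribute bundle that refers to its Kernel weakly instead of copying the raw handle."""

    def __init__(self, kernel: Kernel):
        self._kernel = weakref.ref(kernel)  # does not extend the Kernel's lifetime

    def _get_handle(self) -> int:
        kernel = self._kernel()
        if kernel is None:
            # The owning Kernel is gone, so its handle must not be used.
            raise RuntimeError("Kernel object has been destroyed")
        return kernel._handle

    def get(self, attribute: str) -> int:
        return _query(self._get_handle(), attribute)
```

With this shape, a bundle that outlives its Kernel fails loudly instead of passing a possibly stale handle to the driver.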

copy-pr-bot · 2025-09-09T22:40:58Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Andy-Jost · 2025-09-09T22:42:50Z

cuda_core/cuda/core/experimental/_module.py


    def _get_cached_attribute(self, device_id: int, attribute: driver.CUfunction_attribute) -> int:
        """Helper function to get a cached attribute or fetch and cache it if not present."""
        if device_id in self._cache and attribute in self._cache[device_id]:


I also see an opportunity to simplify the logic throughout by using pairs as keys:

if (device_id, attribute) in self._cache:

Fixed in 651beb9
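To illustrate the pair-key idea: the nested `{device_id: {attribute: value}}` layout needs two membership tests and an inner dict per device, while a flat dict keyed by `(device_id, attribute)` tuples needs one. A minimal sketch; `AttributeCache` and the `fetch` callable are hypothetical stand-ins, not the code in the PR.

```python
from typing import Any, Callable, Dict, Tuple


class AttributeCache:
    """Hypothetical flat cache keyed by (device_id, attribute) pairs."""

    def __init__(self, fetch: Callable[[int, Any], int]):
        self._fetch = fetch  # hypothetical stand-in for the driver query
        self._cache: Dict[Tuple[int, Any], int] = {}

    def get(self, device_id: int, attribute: Any) -> int:
        key = (device_id, attribute)
        if key in self._cache:  # one membership test instead of two nested ones
            return self._cache[key]
        value = self._fetch(device_id, attribute)
        self._cache[key] = value
        return value
```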

rwgk

Looks good to me.

My suggestion is meant to be a minor but easy optimization.

rwgk · 2025-09-10T04:20:26Z

cuda_core/cuda/core/experimental/_module.py

        """Helper function to get a cached attribute or fetch and cache it if not present."""
-        if device_id in self._cache and attribute in self._cache[device_id]:
-            return self._cache[device_id][attribute]
+        if (device_id, attribute) in self._cache:


WDYT about

    cache_key = (device_id, attribute)  # so this tuple doesn't have to be rebuilt
    result = self._cache.get(cache_key, cache_key)  # the tuple doubles as sentinel; there is only one cache lookup
    if result is not cache_key:
        return result

then all the way below

    self._cache[cache_key] = result
    return result
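Put together, the suggested helper could look roughly like this sketch. The key tuple is built once and reused as the `dict.get` default; because the stored values are ints, an identity check against the fresh tuple separates a hit from a miss with a single lookup. `KernelAttributesCacheSketch` and `fetch` are hypothetical names, not the code that was merged.

```python
from typing import Any, Callable, Dict, Tuple


class KernelAttributesCacheSketch:
    """Hypothetical class showing the single-lookup caching pattern."""

    def __init__(self, fetch: Callable[[int, Any], int]):
        self._fetch_attribute = fetch  # hypothetical stand-in for the driver query
        self._cache: Dict[Tuple[int, Any], int] = {}

    def _get_cached_attribute(self, device_id: int, attribute: Any) -> int:
        cache_key = (device_id, attribute)  # built once, used for both lookup and store
        result = self._cache.get(cache_key, cache_key)  # the key doubles as the miss sentinel
        if result is not cache_key:
            return result  # hit: a single dict lookup
        result = self._fetch_attribute(device_id, attribute)
        self._cache[cache_key] = result
        return result
```

Because the check uses identity rather than truthiness, a cached value of 0 is still treated as a hit.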

Good suggestion, thanks!


Andy-Jost · 2025-09-10T15:13:48Z

/ok to test 1878c2d

rwgk

Looks great to me!

Sorry I missed the full context: How did you discover the bug fixed in this PR? Is there a reasonably easy way to add a test that would have created a dangling handle? — I wouldn't spend more than 10 minutes on that.

rparolin

LGTM.

Question: Is it possible to write a unit test that exercises the weakref?

Andy-Jost · 2025-09-10T16:01:18Z

> Looks great to me!
>
> Sorry I missed the full context: How did you discover the bug fixed in this PR? Is there a reasonably easy way to add a test that would have created a dangling handle? — I wouldn't spend more than 10 minutes on that.

Leo asked me to bundle the attributes for IPC-enabled mempools and pointed to this as a reference implementation. I noticed this issue by inspection when reading the code. There is an example test called test_mempool_attributes_ownership here (sorry I couldn't find a way to link directly to the code block).

Maybe to be more specific, it was this code that set off the alarm bells:

KernelAttributes(self._handle)

When an object (self here) represents resource ownership, slicing off a part of it to share ownership is almost always dangerous.
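A toy illustration of the hazard, using hypothetical `Owner` and `RawBundle` classes and a simulated handle table instead of real CUDA objects: once the owner releases its handle, a bundle that kept only the raw integer cannot tell that the handle is stale. The simulated `query` detects this; a real driver may not.

```python
import itertools

_live_handles = set()  # simulated driver-side table of valid handles
_handle_ids = itertools.count(1)


class Owner:
    """Hypothetical resource owner: creates a handle and is responsible for releasing it."""

    def __init__(self):
        self._handle = next(_handle_ids)
        _live_handles.add(self._handle)

    def close(self) -> None:
        _live_handles.discard(self._handle)  # the handle is invalid from here on


def query(handle: int) -> int:
    # Simulated driver call; stale handles are detected here, which a real driver may not do.
    if handle not in _live_handles:
        raise ValueError(f"stale handle {handle}")
    return handle


class RawBundle:
    """Hypothetical bundle that copies the raw handle, sharing it without sharing ownership."""

    def __init__(self, handle: int):
        self._handle = handle

    def get(self) -> int:
        return query(self._handle)
```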

Andy-Jost · 2025-09-10T16:43:55Z

> LGTM.
>
> Question: Is it possible to write a unit test that exercises the weakref?

I looked into this, and I think it would be difficult right now. The reference management is a bit of a mess. For example, there is a reference cycle: a Kernel refers to its Module, which in turn refers to its Kernels. That makes it difficult to reliably delete a Kernel, which this test would require. If we cleaned up the references overall, it would be easier.
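The difficulty can be reproduced in plain Python with hypothetical `Module` and `Kernel` classes that reference each other. In CPython, dropping the last external reference to a cycle does not free it until the cyclic garbage collector runs, so a weak reference stays alive after `del`. A test could force collection with `gc.collect()`, as sketched here; whether that is enough for the real cuda.core objects is not established.

```python
import gc
import weakref


class Module:
    def __init__(self):
        self.kernels = []


class Kernel:
    def __init__(self, module: Module):
        self.module = module         # Kernel -> Module
        module.kernels.append(self)  # Module -> Kernel, closing the cycle


def kernel_survives_del() -> tuple:
    """Return (alive_after_del, alive_after_collect) for a Kernel inside a cycle."""
    gc_was_enabled = gc.isenabled()
    gc.disable()  # keep automatic collection from running mid-demo
    try:
        kernel = Kernel(Module())
        ref = weakref.ref(kernel)
        del kernel
        alive_after_del = ref() is not None  # still reachable through the cycle
        gc.collect()                         # explicit collection breaks the cycle
        alive_after_collect = ref() is not None
        return alive_after_del, alive_after_collect
    finally:
        if gc_was_enabled:
            gc.enable()
```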

github-actions · 2025-09-10T18:41:14Z

Doc Preview CI
Preview removed because the pull request was closed or merged.

leofang · 2025-09-11T15:09:08Z

Sorry, I did not have a chance to read this PR. It is fine. But there was no bug, and this PR is not a bug fix.

In case it is not clear, this was implemented by design. A Kernel's underlying handle, CUkernel (or CUfunction), is not something that users can create or destroy on their own. Its lifetime is tied to the parent ObjectCode (with the handle CUlibrary or CUmodule). And because we implement it (also by design; there's probably a comment somewhere in the same file) so that a loaded CUlibrary is never unloaded (to mimic how fatbins embedded in a shared library work), it is guaranteed that there is no dangling pointer.

Updates KernelAttributes to avoid possible dangling handles.

7490ad1

Andy-Jost requested review from rwgk and leofang September 9, 2025 22:41

Andy-Jost self-assigned this Sep 9, 2025

Andy-Jost commented Sep 9, 2025


Simplifies the caching logic in KernelAttributes.

651beb9

leofang added the labels enhancement (Any code-related improvements), P1 (Medium priority - Should do), and cuda.core (Everything related to the cuda.core module) Sep 10, 2025

leofang added this to the cuda.core beta 7 milestone Sep 10, 2025

rwgk previously approved these changes Sep 10, 2025


Slight change to caching logic. Fix in rst format to satisfy pre-commit script.

17e4bc5

Andy-Jost dismissed rwgk’s stale review via 17e4bc5 September 10, 2025 15:11

Merge branch 'main' into kernel-attributes-ref-fix

1878c2d


rwgk approved these changes Sep 10, 2025


rparolin approved these changes Sep 10, 2025


Andy-Jost merged commit 6daacba into NVIDIA:main Sep 10, 2025
49 checks passed

Andy-Jost deleted the kernel-attributes-ref-fix branch September 10, 2025 18:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Updates KernelAttributes to avoid possible dangling handles. #957

Updates KernelAttributes to avoid possible dangling handles. #957

Uh oh!

Andy-Jost commented Sep 9, 2025

Uh oh!

copy-pr-bot bot commented Sep 9, 2025

Uh oh!

Andy-Jost Sep 9, 2025

Uh oh!

Andy-Jost Sep 9, 2025

Uh oh!

rwgk left a comment

Uh oh!

rwgk Sep 10, 2025

Uh oh!

Andy-Jost Sep 10, 2025

Uh oh!

Andy-Jost commented Sep 10, 2025

Uh oh!

This comment has been minimized.

rwgk left a comment

Uh oh!

rparolin left a comment •

edited

Loading

Uh oh!

Andy-Jost commented Sep 10, 2025 •

edited

Loading

Uh oh!

Andy-Jost commented Sep 10, 2025

Uh oh!

Uh oh!

github-actions bot commented Sep 10, 2025

Uh oh!

leofang commented Sep 11, 2025

Uh oh!

Uh oh!

Updates KernelAttributes to avoid possible dangling handles. #957

Updates KernelAttributes to avoid possible dangling handles. #957

Uh oh!

Conversation

Andy-Jost commented Sep 9, 2025

Uh oh!

copy-pr-bot bot commented Sep 9, 2025

Uh oh!

Andy-Jost Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

Andy-Jost Sep 9, 2025

Choose a reason for hiding this comment

Uh oh!

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

rwgk Sep 10, 2025

Choose a reason for hiding this comment

Uh oh!

Andy-Jost Sep 10, 2025

Choose a reason for hiding this comment

Uh oh!

Andy-Jost commented Sep 10, 2025

Uh oh!

This comment has been minimized.

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

rparolin left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Andy-Jost commented Sep 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Andy-Jost commented Sep 10, 2025

Uh oh!

Uh oh!

github-actions bot commented Sep 10, 2025

Uh oh!

leofang commented Sep 11, 2025

Uh oh!

Uh oh!

rparolin left a comment •

edited

Loading

Andy-Jost commented Sep 10, 2025 •

edited

Loading