Andy-Jost
Contributor

Previously a KernelAttributes bundle held a reference to an internal handle from a Kernel object. This change introduces a weak reference to ensure that a dangling handle is never used. Testing shows no statistically significant change to property access times.
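
To illustrate the idea in the description, here is a minimal sketch of an attribute bundle that keeps a weak reference to its owner instead of a copy of the owner's handle. The classes below are hypothetical stand-ins written for this illustration (the integer handle and the `handle()` method are assumptions), not the actual cuda.core implementation.

```python
import weakref


class Kernel:
    """Hypothetical stand-in for an object that owns a native handle."""

    def __init__(self, handle: int):
        self._handle = handle

    @property
    def attributes(self) -> "KernelAttributes":
        # Pass the owner rather than the raw handle, so the bundle
        # can tell when the owner has gone away.
        return KernelAttributes(self)


class KernelAttributes:
    """Hypothetical bundle that holds only a weak reference to its owner."""

    def __init__(self, kernel: Kernel):
        self._kernel = weakref.ref(kernel)

    def handle(self) -> int:
        kernel = self._kernel()  # returns None once the owner is collected
        if kernel is None:
            raise RuntimeError("the Kernel that owned this handle no longer exists")
        return kernel._handle
```

With this shape, a bundle that outlives its owner fails with a clear error rather than using a stale handle.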

Contributor

copy-pr-bot bot commented Sep 9, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost Andy-Jost requested review from rwgk and leofang September 9, 2025 22:41
@Andy-Jost Andy-Jost self-assigned this Sep 9, 2025

def _get_cached_attribute(self, device_id: int, attribute: driver.CUfunction_attribute) -> int:
    """Helper function to get a cached attribute or fetch and cache it if not present."""
    if device_id in self._cache and attribute in self._cache[device_id]:
Contributor Author

I also see an opportunity to simplify the logic throughout to use pairs as keys:

if (device_id, attribute) in self._cache:
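
A small sketch comparing the two cache layouts discussed here: a dict of dicts versus a single dict keyed by `(device_id, attribute)` pairs. The function names and the sample values are made up for illustration.

```python
def nested_lookup(cache: dict, device_id: int, attribute: str):
    """Dict-of-dicts layout: two membership tests, then two indexing steps."""
    if device_id in cache and attribute in cache[device_id]:
        return cache[device_id][attribute]
    return None


def flat_lookup(cache: dict, device_id: int, attribute: str):
    """Flat layout: the (device_id, attribute) pair is the key."""
    if (device_id, attribute) in cache:
        return cache[(device_id, attribute)]
    return None


# Illustrative data only; the attribute name and value are placeholders.
nested = {0: {"max_threads": 1024}}
flat = {(0, "max_threads"): 1024}
```

Both lookups return the same values for the same inputs; the flat layout simply removes one level of nesting.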

Contributor Author

Fixed in 651beb9

@leofang leofang added enhancement Any code-related improvements P1 Medium priority - Should do cuda.core Everything related to the cuda.core module labels Sep 10, 2025
@leofang leofang added this to the cuda.core beta 7 milestone Sep 10, 2025
rwgk previously approved these changes Sep 10, 2025
Collaborator

@rwgk rwgk left a comment

Looks good to me.

My suggestion is meant to be a minor but easy optimization.

    """Helper function to get a cached attribute or fetch and cache it if not present."""
-   if device_id in self._cache and attribute in self._cache[device_id]:
-       return self._cache[device_id][attribute]
+   if (device_id, attribute) in self._cache:
Collaborator

WDYT about

        cache_key = (device_id, attribute)  # so this tuple doesn't have to be rebuilt
        result = self._cache.get(cache_key, cache_key)  # the tuple doubles as sentinel; there is only one cache lookup
        if result is not cache_key:
            return result

then all the way below

        self._cache[cache_key] = result
        return result
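
For reference, a self-contained sketch of the pattern suggested above: the key tuple is built once and also passed as the default to `dict.get`, so a miss is detected with an identity check and a hit costs a single lookup. The `AttributeCache` class, the `fetch` callable, and the `fetch_count` counter are hypothetical and exist only for this illustration.

```python
class AttributeCache:
    """Hypothetical cache keyed by (device_id, attribute) pairs."""

    def __init__(self, fetch):
        self._fetch = fetch  # assumed callable standing in for the driver query
        self._cache = {}
        self.fetch_count = 0  # counts how often the slow path runs

    def get(self, device_id: int, attribute: int) -> int:
        cache_key = (device_id, attribute)  # built once, reused below
        # The freshly built tuple doubles as the "missing" sentinel:
        # no stored value can be this exact object.
        result = self._cache.get(cache_key, cache_key)
        if result is not cache_key:
            return result
        result = self._fetch(device_id, attribute)
        self.fetch_count += 1
        self._cache[cache_key] = result
        return result
```

The identity check is safe because the sentinel is created inside the call, so it cannot already be stored as a value in the cache.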

Contributor Author

Good suggestion, thanks!

@Andy-Jost
Contributor Author

/ok to test 1878c2d

Collaborator

@rwgk rwgk left a comment

Looks great to me!

Sorry I missed the full context: How did you discover the bug fixed in this PR? Is there a reasonably easy way to add a test that would have created a dangling handle? — I wouldn't spend more than 10 minutes on that.

Collaborator

@rparolin rparolin left a comment

LGTM.

Question: Is it possible to write a unit test that exercises the weakref?

@Andy-Jost
Contributor Author

Andy-Jost commented Sep 10, 2025

> Looks great to me!
>
> Sorry I missed the full context: How did you discover the bug fixed in this PR? Is there a reasonably easy way to add a test that would have created a dangling handle? I wouldn't spend more than 10 minutes on that.

Leo asked me to bundle the attributes for IPC-enabled mempools and pointed to this as a reference implementation. I noticed this issue by inspection when reading the code. There is an example test called test_mempool_attributes_ownership here (sorry I couldn't find a way to link directly to the code block).

Maybe to be more specific, it was this code that set off the alarm bells:

KernelAttributes(self._handle)

When an object (self here) represents resource ownership, then slicing a part off to share ownership is almost always dangerous.

@Andy-Jost
Contributor Author

> LGTM.
>
> Question: Is it possible to write a unit test that exercises the weakref?

I looked into this and I think it would be difficult right now. The reference management is a bit of a mess. E.g., there is a reference cycle between Kernel, which refers to a Module, which refers to its Kernels. That makes it difficult to accurately delete Kernels as we'd need for this test. If we cleaned up the references overall it would be easier.
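
A rough sketch of why such a cycle gets in the way of a test: with a Kernel and its Module referring to each other, `del` alone does not destroy the Kernel, and only a cyclic garbage collection pass does. The classes here are simplified, hypothetical stand-ins, and the behavior described assumes CPython's reference counting.

```python
import gc
import weakref


class Module:
    """Hypothetical module that keeps a list of its kernels."""

    def __init__(self):
        self.kernels = []


class Kernel:
    """Hypothetical kernel that refers back to its module."""

    def __init__(self, module: Module):
        self.module = module
        module.kernels.append(self)  # closes the cycle Kernel -> Module -> Kernel


def probe_cycle():
    """Return (alive_after_del, alive_after_collect) for a Kernel in a cycle."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-demo
    try:
        kernel = Kernel(Module())
        probe = weakref.ref(kernel)
        del kernel  # the cycle still holds a strong reference
        alive_after_del = probe() is not None
        gc.collect()  # the cycle collector breaks the cycle
        alive_after_collect = probe() is not None
    finally:
        if was_enabled:
            gc.enable()
    return alive_after_del, alive_after_collect
```

A test that needs the Kernel to be destroyed at a known point would have to force a collection like this, or the cycle would have to be removed first.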

@Andy-Jost Andy-Jost merged commit 6daacba into NVIDIA:main Sep 10, 2025
49 checks passed
@Andy-Jost Andy-Jost deleted the kernel-attributes-ref-fix branch September 10, 2025 18:26
@leofang
Member

leofang commented Sep 11, 2025

Sorry I did not have a chance to read this PR. It is fine. But there was no bug, and this PR is not a bug fix.

In case it is not clear, this was implemented by design. A Kernel's underlying handle, CUkernel (or CUfunction), is not something that users can create or destroy out of nowhere. Its lifetime is tied to the parent ObjectCode (with the handle CUlibrary, or CUmodule). And because we implement it (also by design; there's probably a comment somewhere in the same file) so that a loaded CUlibrary is never unloaded (to mimic how fatbins embedded in a shared library work), it is guaranteed that there is no dangling pointer.
