lslusarczyk
Contributor

@lslusarczyk lslusarczyk commented Sep 11, 2025

@lslusarczyk lslusarczyk requested a review from a team as a code owner September 11, 2025 12:30
hProvider->provider_priv, deviceIndex, isAdding);
checkErrorAndSetLastProvider(res, hProvider);
return res;
}
Contributor

add empty line

@@ -802,6 +802,15 @@ static umf_result_t cu_memory_provider_get_allocation_properties_size(
return UMF_RESULT_ERROR_INVALID_ARGUMENT;
}

static umf_result_t
Contributor

add a "default" handler in memory_provider.c (see umfDefaultCtlHandle) and do not modify CUDA provider

Contributor Author

Where exactly in the code do you think the "default" is missing?

"do not modify CUDA provider"

Rejected. umf_memory_provider_ops_t type received a new field and it should be somehow initialized in UMF_CUDA_MEMORY_PROVIDER_OPS as all other fields. Please explain if you still want me to remove this change.

Contributor

Please initialize cuda ops .ext_resident_device_change to NULL.
Next, define the umfDefaultResidentDeviceChange in src/memory_provider.c that returns UMF_RESULT_ERROR_NOT_SUPPORTED and set this handler for each created provider if the ext_resident_device_change is set to NULL (see how this is done for other default handlers like "umfDefaultCloseIPCHandle")
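
The pattern requested above could look roughly like the following. This is a minimal sketch, not the UMF source: `sketch_ops_t` and `sketch_assign_default_ops` are hypothetical stand-ins, the enum values are placeholders, and only the handler name `umfDefaultResidentDeviceChange` and the `ext_resident_device_change` signature are taken from this review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UMF result type; numeric values are placeholders. */
typedef enum {
    UMF_RESULT_SUCCESS = 0,
    UMF_RESULT_ERROR_NOT_SUPPORTED
} umf_result_t;

/* Hypothetical, reduced ops table holding only the new optional entry. */
typedef struct {
    umf_result_t (*ext_resident_device_change)(void *provider,
                                               uint32_t device_index,
                                               bool is_adding);
} sketch_ops_t;

/* Default handler: providers that leave the entry NULL get this one. */
static umf_result_t umfDefaultResidentDeviceChange(void *provider,
                                                   uint32_t device_index,
                                                   bool is_adding) {
    (void)provider;
    (void)device_index;
    (void)is_adding;
    return UMF_RESULT_ERROR_NOT_SUPPORTED;
}

/* Hypothetical helper run once when a provider is created. */
static void sketch_assign_default_ops(sketch_ops_t *ops) {
    assert(ops != NULL);
    if (ops->ext_resident_device_change == NULL) {
        ops->ext_resident_device_change = umfDefaultResidentDeviceChange;
    }
}
```

With this in place, a provider such as the CUDA one only needs to set the field to NULL, and callers never have to check the pointer before dispatching.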

Contributor Author

ok, applied

@@ -1336,6 +1336,15 @@ static umf_result_t trackingGetAllocationPropertiesSize(
p->hUpstream, memory_property_id, size);
}

static umf_result_t trackingResidentDeviceChange(void *provider,
Contributor

remove

Contributor Author

Rejected. umf_memory_provider_ops_t type received a new field and it should be somehow initialized in UMF_TRACKING_MEMORY_PROVIDER_OPS as all other fields. Please explain if you still want me to remove this change.

Contributor Author

applied in the same way as in cuda provider

@@ -162,6 +162,15 @@ static umf_result_t nullGetAllocationPropertiesSize(
return UMF_RESULT_SUCCESS;
}

static umf_result_t nullResidentDeviceChange(void *provider,
Contributor

remove

Contributor Author

@lslusarczyk lslusarczyk Sep 12, 2025

rejected, same reason as above

@bratpiorka bratpiorka requested review from lplewa and ldorau September 11, 2025 12:53
if (!hParams) {
LOG_ERR("Level Zero memory provider params handle is NULL");
return UMF_RESULT_ERROR_INVALID_ARGUMENT;
}

if (deviceCount && !hDevices) {
LOG_ERR("Resident devices array is NULL, but deviceCount is not zero");
if (residentDevicesCount && !residentDevicesIndices) {
Contributor

please also check if residentDevicesCount == 0 but residentDevicesIndices != NULL

Contributor

additionally, should indices be unique?

Contributor Author

ok, added checking for uniqueness
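
The checks discussed in this thread (count and pointer must agree, indices must be unique) can be sketched as below. The helper name `sketch_validate_resident_indices` is hypothetical, the enum values are placeholders, and the range check against `device_count` is an assumption that was not explicitly requested here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UMF result type; numeric values are placeholders. */
typedef enum {
    UMF_RESULT_SUCCESS = 0,
    UMF_RESULT_ERROR_INVALID_ARGUMENT
} umf_result_t;

/* Returns true if `value` occurs among the first `upto` entries. */
static bool seen_before(const uint32_t *indices, uint32_t upto,
                        uint32_t value) {
    assert(indices != NULL);
    for (uint32_t i = 0; i < upto; i++) {
        if (indices[i] == value) {
            return true;
        }
    }
    return false;
}

/* Hypothetical validation helper for the resident-device index array. */
static umf_result_t sketch_validate_resident_indices(const uint32_t *indices,
                                                     uint32_t count,
                                                     uint32_t device_count) {
    /* Reject both mismatches: count != 0 with NULL, and count == 0 with
     * a non-NULL array. */
    if ((count == 0) != (indices == NULL)) {
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    for (uint32_t i = 0; i < count; i++) {
        /* Assumed extra check: each index must refer to a known device. */
        if (indices[i] >= device_count) {
            return UMF_RESULT_ERROR_INVALID_ARGUMENT;
        }
        /* Uniqueness: quadratic, which is fine for a handful of devices. */
        if (seen_before(indices, i, indices[i])) {
            return UMF_RESULT_ERROR_INVALID_ARGUMENT;
        }
    }
    return UMF_RESULT_SUCCESS;
}
```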

@@ -930,6 +987,155 @@ static umf_result_t ze_memory_provider_get_allocation_properties_size(
return UMF_RESULT_ERROR_INVALID_ARGUMENT;
}

struct ze_memory_provider_resident_device_change_data {
bool isAdding;
Contributor

is_adding

change_data->source_memory_provider
->device_handles[change_data->peer_device_index];

// TODO: add assertions to UMF and change it to be an assertion
Contributor

use ASSERT() macros from src/utils/utils_assert.h

Contributor Author

Postponed to a future PR. I need a permanent assertion here, not one that is compiled only in the debug configuration, as ASSERT is.
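
To illustrate the distinction, here is a minimal sketch of an always-on assertion, assuming one were added later. `SKETCH_ASSERT_ALWAYS` and `sketch_checked_device_index` are hypothetical names and are not part of src/utils/utils_assert.h.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical always-on assertion: it does not depend on NDEBUG or on any
 * build configuration, so the check also runs in release builds. */
#define SKETCH_ASSERT_ALWAYS(expr)                                             \
    do {                                                                       \
        if (!(expr)) {                                                         \
            fprintf(stderr, "%s:%d: assertion failed: %s\n", __FILE__,         \
                    __LINE__, #expr);                                          \
            abort();                                                           \
        }                                                                      \
    } while (0)

/* Example use: an index that callers are trusted to keep in range. */
static uint32_t sketch_checked_device_index(uint32_t device_index,
                                            uint32_t device_count) {
    SKETCH_ASSERT_ALWAYS(device_index < device_count);
    return device_index;
}
```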

info->props.base, info->props.base_size);
} else {
result = ZE_RESULT_SUCCESS;
// TODO: currently not implemented call evict here
Contributor

do you plan to add the missing code here?

Contributor Author

Not in this PR.

if (ze_provider->resident_device_count == 0 ||
existing_peer_index == ze_provider->resident_device_count) {
// not found
if (!isAdding) { // impossible for UR, should be an assertion
Contributor

if this should be an assertion, please use the ASSERT() macro from src/utils/utils_assert.h
Also, move the comment to the next line

Contributor Author

@lslusarczyk lslusarczyk Sep 15, 2025

To be changed in a future PR, once UMF has permanent assertions. Comments moved.


static umf_result_t
ze_memory_provider_resident_device_change(void *provider, uint32_t device_index,
bool isAdding) {
Contributor

is_adding

// adding case
if (ze_provider->device_count <=
ze_provider
->resident_device_count) { // impossible for UR, should be an assertion
Contributor

if this should be an assertion, please use the ASSERT() macro from src/utils/utils_assert.h
Also, move the comment to the next line

Contributor Author

Postponed until UMF has permanent assertions; comments moved.

}

if (ze_provider->device_count <=
device_index) { // impossible for UR, should be an assertion
Contributor

.

Contributor Author

applied as above


} else {
// found
if (isAdding) { // impossible for UR, should be an assertion
Contributor

.

Contributor Author

applied as above

LOG_ERR("umfMemoryTrackerIterateAll did not manage to do some change "
"numFailed:%d, numSuccess:%d",
privData.success_changes, privData.failed_changes);
return UMF_RESULT_ERROR_INVALID_ARGUMENT; // probably some other result is better, best just change into assertion
Contributor

you could use UMF_RESULT_ERROR_MEMORY_PROVIDER_SPECIFIC
or change to assert

src/libumf.map Outdated
@@ -54,6 +54,7 @@ UMF_1.0 {
umfMemoryProviderPurgeForce;
umfMemoryProviderPurgeLazy;
umfMemoryProviderPutIPCHandle;
umfMemoryProviderResidentDeviceChange;
Contributor

pls move down in the file (after a comment # Added in UMF_1.0)

Contributor

actually, mea culpa, pls make a new section, we're already past v1.0 - we should now have an # Added in UMF_1.1 section

@@ -60,6 +60,7 @@ EXPORTS
umfMemoryProviderPurgeForce
umfMemoryProviderPurgeLazy
umfMemoryProviderPutIPCHandle
umfMemoryProviderResidentDeviceChange
Contributor

pls move down in the file (after a comment ; Added in UMF_1.0)

Contributor

  1. first of all, wrong function was moved - umfMemoryProviderResidentDeviceChange is the new one
  2. mea culpa, it should be already in ; Added in UMF_1.1 section

@@ -606,3 +619,16 @@ umf_result_t umfMemoryProviderGetAllocationPropertiesSize(
checkErrorAndSetLastProvider(res, hProvider);
return res;
}

umf_result_t
umfMemoryProviderResidentDeviceChange(umf_memory_provider_handle_t hProvider,
Contributor

do we have any tests for this function? including negative tests

Contributor Author

no, success path will be tested by UR tests
failure path does not happen as UR is the only one using this functionality and it never issues wrong parameters

If you think some test case is important and possible to test within UMF please write which case and please guide me where to add it and what other test to use as example.

Contributor

UR is a separate project; we need at least a few simple test cases so that code coverage does not decrease.

Contributor Author

ok, I've added some tests in 88f672b

Contributor

still missing tests for the new functions; even a simple negative test would be better than nothing (you can put them in the generic API tests: test/memoryProviderAPI.cpp)

Comment on lines +326 to +336
/// @brief Adds or removes devices on which allocations should be made
/// resident.
/// @param provider handle to the memory provider
/// @param device_index identifier of device
/// @param is_adding Boolean indicating if peer is to be removed or added
/// @return UMF_RESULT_SUCCESS on success or appropriate error code on
/// failure.
umf_result_t (*ext_resident_device_change)(void *provider,
uint32_t device_index,
bool is_adding);

Contributor

This function should not be in OPS. This is something very specific to the L0 provider. Resident devices are passed to the L0 provider through provider-specific params, so control of them should also be done through a provider-specific API. This is why I think this should be implemented through CTL.

Contributor Author

Rejected. We discussed this on Teams (4 Aug 2025) and agreed to do it as ops (translated from Polish). Piotr: "As for the API you are working on (along the lines of 'give me all the allocated pages'), I would probably suggest making it an API rather than doing it through CTL." Rafał: "Then indeed a dedicated API probably fits better than CTL."

CTL should be for statistics, not a universal, hard-to-read tool for implementing any API. Łukasz (me): "Perhaps you will want to simplify ext_ctl so that it is not a machine with which anything can be implemented, but serves only for statistics. But that is outside my expertise; I leave it for you to think over."

Contributor

@lplewa lplewa Sep 15, 2025

We talked about an API to iterate over all allocations, not about an API to modify some internal settings of the L0 provider.

Contributor

I can see it both ways. My comment during that discussion was about a generic functionality implementable for all providers. This isn't that.
I also remember making a point about CTL being useful for provider/pool-specific functionality.

On the other hand, adding an API function is simpler.

Comment on lines 62 to +72
 /// @brief Set the resident devices in the parameters struct.
 /// @param hParams handle to the parameters of the Level Zero Memory Provider.
-/// @param hDevices array of devices for which the memory should be made resident.
-/// @param deviceCount number of devices for which the memory should be made resident.
+/// @param hDevices array of all devices for which the memory can be made resident.
+/// @param deviceCount number of devices for which the memory can be made resident.
+/// @param residentDevicesIndices array of indices in all devices array to devices for which the memory should be made resident.
+/// @param residentDevicesCount number of items in indices array.
 /// @return UMF_RESULT_SUCCESS on success or appropriate error code on failure.
 umf_result_t umfLevelZeroMemoryProviderParamsSetResidentDevices(
 umf_level_zero_memory_provider_params_handle_t hParams,
-ze_device_handle_t *hDevices, uint32_t deviceCount);
+ze_device_handle_t *hDevices, uint32_t deviceCount,
+uint32_t *residentDevicesIndices, uint32_t residentDevicesCount);
Contributor

This is an API (and ABI) break. We are past the 1.0 release, so you cannot make changes like this.

Contributor Author

Shall I bump some number? Only UR uses this API and I am changing UR right now as well.

Contributor

Even if only UR uses this API, an old UR should still work with the new UMF. And we will not do a 2.0 release right after 1.0.

You cannot break the API. If changes are needed, we have to keep the old function working correctly and add a new one.

Contributor

We can't break backwards compatibility with SYCL / UR. Changing a major version is a significant undertaking that involves updating the components across all the different layers (UMF is nearly the lowest-most component in the stack). We've been burned on this in the past, and it's very disruptive.

In this case, my suggestion would be to simply create a new function that sets the indices.
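
A rough sketch of that suggestion follows, with the old setter left untouched and a second, separate setter for the indices. All names prefixed with `sketch` are hypothetical, the struct layout and enum values are placeholders, and treating "no indices set" as "every device is resident" is an assumption made to keep the old behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UMF result type; numeric values are placeholders. */
typedef enum {
    UMF_RESULT_SUCCESS = 0,
    UMF_RESULT_ERROR_INVALID_ARGUMENT
} umf_result_t;

typedef void *sketch_device_handle_t; /* stand-in for ze_device_handle_t */

typedef struct {
    sketch_device_handle_t *devices;
    uint32_t device_count;
    const uint32_t *resident_indices; /* NULL: all devices are resident */
    uint32_t resident_count;
} sketch_params_t;

/* Existing entry point: same signature and meaning as before the change. */
static umf_result_t
sketchParamsSetResidentDevices(sketch_params_t *params,
                               sketch_device_handle_t *devices,
                               uint32_t device_count) {
    if (params == NULL || (device_count != 0 && devices == NULL)) {
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    params->devices = devices;
    params->device_count = device_count;
    params->resident_indices = NULL;
    params->resident_count = device_count;
    return UMF_RESULT_SUCCESS;
}

/* New, additive entry point: narrows residency to a subset of devices. */
static umf_result_t
sketchParamsSetResidentDeviceIndices(sketch_params_t *params,
                                     const uint32_t *indices, uint32_t count) {
    if (params == NULL || (count == 0) != (indices == NULL)) {
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    for (uint32_t i = 0; i < count; i++) {
        if (indices[i] >= params->device_count) {
            return UMF_RESULT_ERROR_INVALID_ARGUMENT;
        }
    }
    params->resident_indices = indices;
    params->resident_count = count;
    assert(params->resident_count == 0 || params->device_count > 0);
    return UMF_RESULT_SUCCESS;
}
```

An older caller that only ever uses the first function keeps working unchanged, which is the compatibility property being asked for.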

Contributor

@ldorau ldorau left a comment

14 / 16 files reviewed (will be continued)

/// failure.
umf_result_t (*ext_resident_device_change)(void *provider,
uint32_t device_index,
bool is_adding);
Contributor

I'd rather we have two functions - add/remove.
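
One way to read this suggestion is sketched below: two thin entry points over a shared internal routine, so callers never pass a boolean. The provider struct, the bitmask representation, and every `sketch` name are hypothetical; the error rules (adding an already-resident device or removing a non-resident one is rejected) mirror the cases flagged in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the UMF result type; numeric values are placeholders. */
typedef enum {
    UMF_RESULT_SUCCESS = 0,
    UMF_RESULT_ERROR_INVALID_ARGUMENT
} umf_result_t;

/* Hypothetical provider state: one bit per resident device (max 32). */
typedef struct {
    uint32_t resident_mask;
    uint32_t device_count;
} sketch_provider_t;

/* Shared internal routine; the boolean stays private to this file. */
static umf_result_t sketch_resident_change(sketch_provider_t *provider,
                                           uint32_t device_index,
                                           bool is_adding) {
    assert(provider != NULL);
    if (device_index >= provider->device_count || device_index >= 32) {
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    uint32_t bit = UINT32_C(1) << device_index;
    bool is_resident = (provider->resident_mask & bit) != 0;
    if (is_adding == is_resident) {
        /* Adding a resident device or removing a non-resident one. */
        return UMF_RESULT_ERROR_INVALID_ARGUMENT;
    }
    provider->resident_mask ^= bit;
    return UMF_RESULT_SUCCESS;
}

static umf_result_t sketchResidentDeviceAdd(sketch_provider_t *provider,
                                            uint32_t device_index) {
    return sketch_resident_change(provider, device_index, true);
}

static umf_result_t sketchResidentDeviceRemove(sketch_provider_t *provider,
                                               uint32_t device_index) {
    return sketch_resident_change(provider, device_index, false);
}
```

Call sites then read as `sketchResidentDeviceAdd(p, 2)` rather than `change(p, 2, true)`, which is the readability gain of splitting the function.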

6 participants