-
Notifications
You must be signed in to change notification settings - Fork 24
chore: Log device names which we are publishing #260
Conversation
612b4d5 to
48efb54
Compare
pkg/driver/dra_hooks.go
Outdated
| select { | ||
| case devices := <-np.netdb.GetResources(ctx): | ||
| klog.V(4).Infof("Received %d devices", len(devices)) | ||
| klog.V(3).Infof("Got %d devices from inventory: %s", len(devices), getDeviceNames(devices)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was on purpose not logging all devices, since that information can be unbounded, you can use level 7 if you want to log all the existing devices, but we've been beating by this in the past in several places, I still remember a bug in ingress-gce that logged all nodes and it caused major issues at scale
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a valid concern and one that I had myself as well, but the reason why I still went ahead with that was because I also assumed that the number of devices on a node would be bounded to a smaller number.
But given we've had this doubt, I've made it such that we only log an arbitrary subset of device names.
aojea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logging entire list of names is very expensive, so if you still want to add that we need to put it at least at level 7 in a separate statement
48efb54 to
f968db4
Compare
I've made adjustments to not log all device names. So this is what helped identify #263 so there's real value in having even some fixed size of names log (and likely this fixed size of names should serve most cases) |
| klog.V(3).Infof("Got %d devices from inventory: %s", len(devices), formatDeviceNames(devices, 15)) | ||
| devices = filter.FilterDevices(np.celProgram, devices) | ||
| klog.V(4).Infof("After filtering %d devices", len(devices)) | ||
| klog.V(3).Infof("After filtering, publishing %d devices in ResourceSlice(s): %s", len(devices), formatDeviceNames(devices, 15)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have some debug mode which prints all devices? That might be a good middle ground.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you saying debug mode in ADDITION to the subset of devices?
(Usually I've not found debug mode to be helpful, because by the issue we wanted to catch is already missed, so I would want something else as well, besides the debug mode.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's fair, debug mode can sometimes be helpful if we can repro the issue but can't see enough logs as is. Do we not usually expect to see > 15 devices in our inventory? If so then we can go ahead with this, but might want to make it something we can pass as a command line variable so if we need to we can adjust it if it's not too much work to plumb it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thanks! I think 15 should be good for now. As you described, we've not even close to 15 so far. Mostly just 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is always something we can improve on, this seems good as is and adds value.
aojea
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bounded list LGTM
No description provided.