-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix #91958: use mach_timebase_info
to determine process time coefficient on macOS
#92185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @dotnet/area-system-diagnostics-process Issue Detailsnull
|
src/libraries/System.Diagnostics.Process/src/System/Diagnostics/Process.OSX.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs
Outdated
Show resolved
Hide resolved
7b79541
to
7571e06
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall it looks good, but I found few places that could be polished a bit.
Thank you for your contribution @ForNeVeR !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkoritzinsky @AaronRobinsonMSFT In terms of marshaller best practices, do we need to explicitly specify sequential layout for such structs?
public struct mach_timebase_info_data_t | |
[StructLayout(LayoutKind.Sequential)] | |
public struct mach_timebase_info_data_t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, no. Value types in .NET default to sequential layout. However, and this is annoying, there are some Roslyn warnings that are suppressed if one does explicitly mark the type with sequential layout. The reasoning here is historical, but the gist is if Roslyn complains about unreferenced fields, which can happen for types used in interop, then placing StructLayout(LayoutKind.Sequential)
on the type will automatically suppress the warning.
The interop team's general guidance here has been to accept the defaults except where there is annoying friction with C# or where the tooling requires explicit details. This falls into the C# friction bucket, but only if a warning is emitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AaronRobinsonMSFT thank you for a very detailed answer!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see any related warnings in the compilation (and also we seem to have warnings as errors enabled in this part?).
Does this mean this attribute is unnecessary? I am totally okay with adding that if required. Though, yeah, we all know that sequential is the default struct layout 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean this attribute is unnecessary?
Yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to initialize this field lazily, when we need it for the first time. Otherwise this sys-call:
- may be called even if we don't need it (during type initializaiton)
- in theory it may fail and be very hard to handle properly
Since we use the following logic in 4 places:
return new TimeSpan(Convert.ToInt64($ulong * timeBase.numer / timeBase.denom / NanosecondsTo100NanosecondsFactor));
we could introduce a helper method that could take care of everything, something like this:
private static readonly Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase(); | |
private static volatile uint s_timeBase_numer, s_timeBase_denom; | |
private static TimeSpan Map(ulong sysTime) | |
{ | |
uint denom = s_timeBase_denom; | |
if (denom == default) | |
{ | |
Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase(); | |
s_timeBase_denom = denom = timeBase.denom; | |
s_timeBase_numer = timeBase.numer; | |
} | |
uint numer = s_timeBase_numer; | |
return new TimeSpan(Convert.ToInt64(sysTime * numer / denom / NanosecondsTo100NanosecondsFactor)); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, agree with your reasoning and applied a bit modified version of this snippet (the only change is in naming, I like MapTime
a little bit better).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for finding and updating all use cases! 👍
src/libraries/System.Diagnostics.Process/tests/ProcessTests.Unix.cs
Outdated
Show resolved
Hide resolved
Build on CI is now failing due to something being wrong with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have two options:
-
Include the two keys in test resources:
<data name="CantGetAllPids" xml:space="preserve"> <value>Could not get all running Process IDs.</value> </data> <data name="RUsageFailure" xml:space="preserve"> <value>Failed to set or retrieve rusage information. See the error code for OS-specific error information.</value> </data>
in:
<?xml version="1.0" encoding="utf-8"?>
and delete this hard-coded line. -
Include test resources alongside source ones -- separated by semicolon
;
:<StringResourcesPath>$(MSBuildProjectDirectory)\Resources\Strings.resx;$(MSBuildProjectDirectory)\..\src\Resources\Strings.resx</StringResourcesPath>
multiple resources seem to be supported.
Option 1 is probably much cleaner that avoids mixing stuff, but I'll defer to others. cc @ViktorHofer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Option 1 is cleaner while it adds some duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it do the last division first, like here?
https://chromium.googlesource.com/chromium/src/+/refs/tags/58.0.3029.141/base/time/time_mac.cc#43
(I don't know what values it typically returns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this says it gets the 42 from 125/3. Does 125 * ticks go over ulong max?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I suppose you can store 1000 * denom
I'm sure someone will point out that won't necessarily give the exact same result @tannergooding 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, but I was concerned about losing some precision due to doing division first. Though I can't wrap my head around it right now. Will try to do more thorough analysis later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, here are my thoughts.
Typical tick values are pretty large actually. For comparison, let's consider a program that was working on a single-core CPU for a year.
> [TimeSpan]::FromDays(365).Ticks
315360000000000
> [long]::MaxValue
9223372036854775807
(here I consider long
since TimeSpan
takes long
in its ctor, not ulong
)
This means we will overflow on a numer
of 29248
(or if our tick value will get significantly bigger, i.e. we take 29000 years into account, or a CPU with 29k cores).
This is very far from being realistic, but it is also much closer than I expected. So, to be on the safe side, let's do the same as Chromium does: divide by our factor first, to get a hundred times more space, by losing some precision.
We may, of course, also do value / (denom * 100) * numer
which is relatively the same, yet will lose a bit more precision as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also OK with asserting that numer is reasonably small ( say < 1000) as an alternative, since we suspect that will always be true. Or assert that the math comes out almost the same in 128 bits.
If not we should try to quantify the rounding error and how much it matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My current suggestion is to leave this as sysTime / NanosecondsTo100NanosecondsFactor * numer / denom
.
I am not sure how useful an assertion would be in this code. Perhaps it'd be better to add checked
, since we are concerned by overflow? We do not expect this code to be too performance sensitive, so checked
might be good.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not my area, but I"m not sure an exception would be an improvement. I'd be interested in thoughts of a numeric expert like @tannergooding . I'll step aside for area owners to sign off overall...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that an explicit exception might be an improvement (especially since it would be thrown from a member property such as TotalProcessorTime
and not on class init).
But of course, I am ready to listen to other suggestions.
861ef04
to
d81a801
Compare
…efficient on macOS
080428d
to
1a57499
Compare
1a57499
to
ce50c65
Compare
I believe that all the existing feedback is addressed, so I'm waiting for further feedback. Thank you so much for your time, folks. |
@dotnet/area-system-diagnostics-process please review |
Ping @dotnet/area-system-diagnostics-process |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you for your fix @ForNeVeR !
And apologies for the delay (I was on a parental leave).
<data name="Argv_IncludeDoubleQuote" xml:space="preserve"> | ||
<value>The argv[0] argument cannot include a double quote.</value> | ||
</data> | ||
<data name="CantGetAllPids" xml:space="preserve"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fix I've suggested in #92185 (comment) should just work, but I can take care of that in a separate PR to get the fix merged right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for help! ❤️ |
/backport to release/8.0-staging |
Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/8385573189 |
Backporting this fix to 8.0 since this was reported again in #98121. |
@jeffhandley backporting to release/8.0-staging failed, the patch most likely resulted in conflicts: $ git am --3way --ignore-whitespace --keep-non-patch changes.patch
Applying: Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
Using index info to reconstruct a base tree...
M src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
CONFLICT (content): Merge conflict in src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128 Please backport manually! |
@jeffhandley an error occurred while backporting to release/8.0-staging, please check the run log for details! Error: git am failed, most likely due to a merge conflict. |
Closes #91958.
I have compared the results of the process timing functions to the results of
ps
on my M2 MacBook device, and they seem to work properly after the changes.The new tests were properly failing before the change (since we were returning values 42 times lower than the native results), and are, of course, green after the changes.