@ForNeVeR ForNeVeR commented Sep 16, 2023

Closes #91958.

I have compared the results of the process timing functions to the results of ps on my M2 MacBook device, and they seem to work properly after the changes.

The new tests were properly failing before the change (since we were returning values 42 times lower than the native results), and are, of course, green after the changes.

@ghost ghost added area-System.Diagnostics.Process community-contribution Indicates that the PR has been added by a community member labels Sep 16, 2023
ghost commented Sep 16, 2023

Tagging subscribers to this area: @dotnet/area-system-diagnostics-process
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: ForNeVeR
Assignees: -
Labels:

area-System.Diagnostics.Process

Milestone: -

Member

@adamsitnik adamsitnik left a comment

Overall it looks good, but I found a few places that could be polished a bit.

Thank you for your contribution @ForNeVeR !

Member

@jkoritzinsky @AaronRobinsonMSFT In terms of marshaller best practices, do we need to explicitly specify sequential layout for such structs?

Suggested change
- public struct mach_timebase_info_data_t
+ [StructLayout(LayoutKind.Sequential)]
+ public struct mach_timebase_info_data_t

Member

Technically, no. Value types in .NET default to sequential layout. However, and this is annoying, there are some Roslyn warnings that are suppressed if one does explicitly mark the type with sequential layout. The reasoning here is historical, but the gist is if Roslyn complains about unreferenced fields, which can happen for types used in interop, then placing StructLayout(LayoutKind.Sequential) on the type will automatically suppress the warning.

The interop team's general guidance here has been to accept the defaults except where there is annoying friction with C# or where the tooling requires explicit details. This falls into the C# friction bucket, but only if a warning is emitted.
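
To illustrate the point about defaults, here is a minimal sketch. It assumes a stand-in struct with the same two `uint` fields as `mach_timebase_info_data_t`; the names `TimebaseDefault`, `TimebaseExplicit` and `LayoutCheck` are hypothetical and do not come from the PR. Since the C# compiler emits sequential layout for structs, both declarations should report the same layout and the same marshalled size:

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in: no explicit attribute, relies on the C# default.
public struct TimebaseDefault
{
    public uint numer;
    public uint denom;
}

// Hypothetical stand-in: same fields, layout spelled out explicitly.
[StructLayout(LayoutKind.Sequential)]
public struct TimebaseExplicit
{
    public uint numer;
    public uint denom;
}

public static class LayoutCheck
{
    // True when both structs are sequential and marshal to the same size.
    public static bool LayoutsMatch() =>
        typeof(TimebaseDefault).IsLayoutSequential
        && typeof(TimebaseExplicit).IsLayoutSequential
        && Marshal.SizeOf<TimebaseDefault>() == Marshal.SizeOf<TimebaseExplicit>();
}
```

The fields here are public, so the unreferenced-field warning mentioned above would not come up in this sketch either way.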

Member

@AaronRobinsonMSFT thank you for a very detailed answer!

Contributor Author

I can't see any related warnings in the compilation (and also we seem to have warnings as errors enabled in this part?).

Does this mean this attribute is unnecessary? I am totally okay with adding that if required. Though, yeah, we all know that sequential is the default struct layout 😅

Member

> Does this mean this attribute is unnecessary?

Yes.

Member

It would be better to initialize this field lazily, when we need it for the first time. Otherwise this sys-call:

  • may be called even if we don't need it (during type initialization)
  • in theory it may fail and be very hard to handle properly

Since we use the following logic in 4 places:

return new TimeSpan(Convert.ToInt64($ulong * timeBase.numer / timeBase.denom / NanosecondsTo100NanosecondsFactor));

we could introduce a helper method that could take care of everything, something like this:

Suggested change
- private static readonly Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase();
+ private static volatile uint s_timeBase_numer, s_timeBase_denom;
+ private static TimeSpan Map(ulong sysTime)
+ {
+     uint denom = s_timeBase_denom;
+     if (denom == default)
+     {
+         Interop.libSystem.mach_timebase_info_data_t timeBase = GetTimeBase();
+         s_timeBase_denom = denom = timeBase.denom;
+         s_timeBase_numer = timeBase.numer;
+     }
+     uint numer = s_timeBase_numer;
+     return new TimeSpan(Convert.ToInt64(sysTime * numer / denom / NanosecondsTo100NanosecondsFactor));
+ }
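
For readers who want to try the lazy-initialization idea outside the runtime, here is a self-contained sketch. It is not the code from the PR: `GetTimeBase` is a hypothetical stub replacing the real `mach_timebase_info` call, the 125/3 ratio is an assumed example value, and `QueryCount` exists only to show that the query runs once. The sketch also stores `numer` before `denom`, a choice made here so that a thread observing a non-zero `denom` also observes `numer`:

```csharp
using System;
using System.Threading;

public static class LazyTimebase
{
    private const ulong NanosecondsTo100NanosecondsFactor = 100;

    private static volatile uint s_numer, s_denom;

    // Demonstration only: how many times the stubbed query has run.
    public static int QueryCount;

    // Hypothetical stub standing in for the native timebase query.
    // 125/3 is an assumed example ratio, not a value taken from the PR.
    private static (uint numer, uint denom) GetTimeBase()
    {
        Interlocked.Increment(ref QueryCount);
        return (125, 3);
    }

    public static TimeSpan MapTime(ulong sysTime)
    {
        uint denom = s_denom;
        if (denom == 0)
        {
            (uint n, uint d) = GetTimeBase();
            // Publish numer first; denom doubles as the "initialized" flag.
            s_numer = n;
            s_denom = denom = d;
        }
        uint numer = s_numer;
        return new TimeSpan(Convert.ToInt64(sysTime * numer / denom / NanosecondsTo100NanosecondsFactor));
    }
}
```

With the assumed ratio, `LazyTimebase.MapTime(3_000_000)` yields 1,250,000 ticks (125 ms), and repeated calls on one thread leave `QueryCount` at 1.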

Contributor Author

Yeah, I agree with your reasoning and applied a slightly modified version of this snippet (the only change is in naming; I like MapTime a little bit better).

Member

thank you for finding and updating all use cases! 👍

@adamsitnik adamsitnik added this to the 9.0.0 milestone Sep 22, 2023
@ForNeVeR
Contributor Author

Build on CI is now failing due to something being wrong with SR on Windows. Have I broken that? I need help with resources, I don't think my changes to StringResourcesPath were right.

Member

We have two options:

  1. Include the two keys in test resources:

      <data name="CantGetAllPids" xml:space="preserve">
        <value>Could not get all running Process IDs.</value>
      </data>
      <data name="RUsageFailure" xml:space="preserve">
        <value>Failed to set or retrieve rusage information. See the error code for OS-specific error information.</value>
      </data>

    in:


    and delete this hard-coded line.

  2. Include test resources alongside source ones -- separated by semicolon ;:

    	 <StringResourcesPath>$(MSBuildProjectDirectory)\Resources\Strings.resx;$(MSBuildProjectDirectory)\..\src\Resources\Strings.resx</StringResourcesPath>

    multiple resources seem to be supported.

Option 1 is probably much cleaner, since it avoids mixing stuff, but I'll defer to others. cc @ViktorHofer

Member

Agreed. Option 1 is cleaner, although it adds some duplication.

Member

Should it do the last division first, like here?

https://chromium.googlesource.com/chromium/src/+/refs/tags/58.0.3029.141/base/time/time_mac.cc#43

(I don't know what values it typically returns)

Member

Also I suppose you can store 1000 * denom

I'm sure someone will point out that won't necessarily give the exact same result @tannergooding 🙂

Contributor Author

That's a good point, but I was concerned about losing some precision due to doing division first. Though I can't wrap my head around it right now. Will try to do more thorough analysis later today.

Contributor Author

@ForNeVeR ForNeVeR Sep 25, 2023

Ok, here are my thoughts.

Typical tick values are pretty large actually. For comparison, let's consider a program that was working on a single-core CPU for a year.

> [TimeSpan]::FromDays(365).Ticks
315360000000000
> [long]::MaxValue
9223372036854775807

(here I consider long since TimeSpan takes long in its ctor, not ulong)

This means we will overflow at a numer of 29248 (or earlier, if our tick value gets significantly bigger, e.g. if we take 29000 years into account, or a CPU with 29k cores).

This is very far from being realistic, but it is also much closer than I expected. So, to be on the safe side, let's do the same as Chromium does: divide by our factor first, to get a hundred times more space, by losing some precision.

We may, of course, also do value / (denom * 100) * numer, which is roughly the same, yet will lose a bit more precision as well.
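
The arithmetic above can be checked with a short sketch. The class and method names (`TickMath`, `MultiplyFirst`, `DivideFirst`, `FirstOverflowingNumer`) are hypothetical and used only for illustration, and the 125/3 ratio in the note below is an assumed example rather than a value from this thread:

```csharp
using System;

public static class TickMath
{
    public const ulong Factor = 100; // nanoseconds per TimeSpan tick

    // Multiply first: most precise, but the intermediate product overflows soonest.
    public static ulong MultiplyFirst(ulong value, uint numer, uint denom) =>
        value * numer / denom / Factor;

    // Divide by the factor first: about 100x more headroom, slightly less precision.
    public static ulong DivideFirst(ulong value, uint numer, uint denom) =>
        value / Factor * numer / denom;

    // Smallest numer for which value * numer no longer fits in a long.
    public static long FirstOverflowingNumer(long value) =>
        long.MaxValue / value + 1;
}
```

`TickMath.FirstOverflowingNumer(TimeSpan.FromDays(365).Ticks)` returns 29248, matching the figure above. For an input of 1,000,000,099 with an assumed ratio of 125/3, `MultiplyFirst` gives 416,666,707 and `DivideFirst` gives 416,666,666, so dividing first costs 41 ticks (about 4 microseconds) in that case.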

Member

@danmoseley danmoseley Sep 25, 2023

I'm also OK with asserting that numer is reasonably small (say, < 1000) as an alternative, since we suspect that will always be true. Or assert that the math comes out almost the same in 128 bits.

If not, we should try to quantify the rounding error and how much it matters.
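
One way to quantify the rounding error is to compare the divide-first result with a reference computed in 128 bits. The sketch below assumes .NET 7 or later for `UInt128`; the `RoundingError` class and its members are hypothetical names, not part of the PR:

```csharp
using System;

public static class RoundingError
{
    private const ulong Factor = 100; // nanoseconds per TimeSpan tick

    // Reference result: the product is formed in 128 bits, so it cannot overflow.
    public static ulong Exact(ulong value, uint numer, uint denom) =>
        (ulong)((UInt128)value * numer / denom / Factor);

    // The ordering under discussion: divide by the factor before multiplying.
    public static ulong DivideFirst(ulong value, uint numer, uint denom) =>
        value / Factor * numer / denom;

    // Ticks lost by dividing first, relative to the 128-bit reference.
    public static ulong Error(ulong value, uint numer, uint denom) =>
        Exact(value, numer, denom) - DivideFirst(value, numer, denom);
}
```

Because dividing first discards fewer than 100 input units before scaling, the loss stays on the order of numer / denom ticks. With an assumed ratio of 125/3, `RoundingError.Error(1_000_000_099, 125, 3)` is 41 ticks.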

Contributor Author

My current suggestion is to leave this as sysTime / NanosecondsTo100NanosecondsFactor * numer / denom.

I am not sure how useful an assertion would be in this code. Perhaps it'd be better to add checked, since we are concerned by overflow? We do not expect this code to be too performance sensitive, so checked might be good.
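
As a rough sketch of what the `checked` variant could look like (hypothetical names, with numer and denom passed as parameters to keep the example self-contained; this is not the merged code):

```csharp
using System;

public static class CheckedMapping
{
    private const ulong Factor = 100; // nanoseconds per TimeSpan tick

    public static TimeSpan MapTime(ulong sysTime, uint numer, uint denom)
    {
        // checked turns a silent wrap-around in the multiplication
        // into an OverflowException.
        ulong ticks = checked(sysTime / Factor * numer) / denom;

        // Convert.ToInt64 also throws OverflowException above long.MaxValue.
        return new TimeSpan(Convert.ToInt64(ticks));
    }
}
```

With these definitions, `CheckedMapping.MapTime(3_000_000, 125, 3)` returns 1,250,000 ticks, while `CheckedMapping.MapTime(ulong.MaxValue, 1000, 1)` throws `OverflowException` instead of returning a wrapped value.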

What do you think?

Member

Not my area, but I'm not sure an exception would be an improvement. I'd be interested in the thoughts of a numeric expert like @tannergooding. I'll step aside for area owners to sign off overall...

Contributor Author

I would say that an explicit exception might be an improvement (especially since it would be thrown from a member property such as TotalProcessorTime and not on class init).

But of course, I am ready to listen to other suggestions.

@ForNeVeR ForNeVeR force-pushed the bugfix/91958.macos-cpu-time branch from 861ef04 to d81a801 Compare September 25, 2023 20:21
@ForNeVeR ForNeVeR force-pushed the bugfix/91958.macos-cpu-time branch 2 times, most recently from 080428d to 1a57499 Compare September 25, 2023 20:29
@ForNeVeR
Contributor Author

I believe that all the existing feedback is addressed, so I'm waiting for further feedback. Thank you so much for your time, folks.

@marek-safar
Contributor

@dotnet/area-system-diagnostics-process please review

@EgorBo
Member

EgorBo commented Oct 10, 2023

Ping @dotnet/area-system-diagnostics-process

Member

@adamsitnik adamsitnik left a comment

LGTM, thank you for your fix @ForNeVeR !

And apologies for the delay (I was on parental leave).

<data name="Argv_IncludeDoubleQuote" xml:space="preserve">
<value>The argv[0] argument cannot include a double quote.</value>
</data>
<data name="CantGetAllPids" xml:space="preserve">
Member

The fix I've suggested in #92185 (comment) should just work, but I can take care of that in a separate PR to get the fix merged right now.

@adamsitnik adamsitnik merged commit fb74d89 into dotnet:main Oct 16, 2023
@ForNeVeR
Contributor Author

Thank you so much for help! ❤️

@ForNeVeR ForNeVeR deleted the bugfix/91958.macos-cpu-time branch October 16, 2023 18:34
@ghost ghost locked as resolved and limited conversation to collaborators Nov 15, 2023
@jeffhandley
Member

/backport to release/8.0-staging

@github-actions github-actions bot unlocked this conversation Mar 22, 2024
Contributor

Started backporting to release/8.0-staging: https://github.com/dotnet/runtime/actions/runs/8385573189

@jeffhandley
Member

Backporting this fix to 8.0 since this was reported again in #98121.

Contributor

@jeffhandley backporting to release/8.0-staging failed, the patch most likely resulted in conflicts:

$ git am --3way --ignore-whitespace --keep-non-patch changes.patch

Applying: Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
Using index info to reconstruct a base tree...
M	src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
Falling back to patching base and 3-way merge...
Auto-merging src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
CONFLICT (content): Merge conflict in src/libraries/Common/src/Interop/OSX/Interop.Libraries.cs
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Fix #91958: use mach_timebase_info to determine process time coefficient on macOS
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

Contributor

@jeffhandley an error occurred while backporting to release/8.0-staging, please check the run log for details!

Error: git am failed, most likely due to a merge conflict.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 22, 2024
Labels
area-System.Diagnostics.Process community-contribution Indicates that the PR has been added by a community member os-mac-os-x macOS aka OSX
Development

Successfully merging this pull request may close these issues.

Process::TotalProcessorTime is about 42 times lower than expected on ARM64 Mac
10 participants