birojnayak
Contributor

This PR improves cold-start performance during certificate loading (#96740). On multiple distros (checked on Amazon Linux 2023 and RHEL 9.3), the same file (in this case /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem) is processed twice.

[ec2-user@ip-172-31-31-127 ~]$ openssl version -d
OPENSSLDIR: "/etc/pki/tls"
**File Processing**
[ec2-user@ip-172-31-31-127 ~]$ ls -la /etc/pki/tls/cert.pem 
lrwxrwxrwx. 1 root root 49 Aug 29 17:21 /etc/pki/tls/cert.pem -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
**Directory Processing**
[ec2-user@ip-172-31-31-127 ~]$ ls -la /etc/pki/tls/certs
total 0
drwxr-xr-x. 2 root root  54 Nov  8 07:41 .
drwxr-xr-x. 5 root root 126 Nov  8 07:41 ..
lrwxrwxrwx. 1 root root  49 Aug 29 17:21 ca-bundle.crt -> /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem
lrwxrwxrwx. 1 root root  55 Aug 29 17:21 ca-bundle.trust.crt -> /etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt

I tested the method with the above fix and saw the total time drop from 103 ms to 65 ms.

@ghost ghost added area-System.Security community-contribution Indicates that the PR has been added by a community member labels Jan 21, 2024
ghost commented Jan 21, 2024

Tagging subscribers to this area: @dotnet/area-system-security, @bartonjs, @vcsjones
See info in area-owners.md if you want to be subscribed.

@tmds
Member

tmds commented Jan 21, 2024

> the same file (in this case /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem) is getting re-processed twice.

The reason for this is that the file we get from OpenSSL (through GetX509RootStoreFile) is also part of the directory we get from OpenSSL (through GetX509RootStorePath).

> I see the total time reduced from 103ms to 65ms..

Nice!

Member

@tmds tmds Jan 21, 2024

This doesn't end up resolving the final path for links.
(And if we use strings, we should specify a StringComparer on the HashSet.)

Rather than storing the path strings, we can extend the TryStatFile method that gets called above so it returns as an out argument a tuple with the long Ino and long Dev from struct FileStatus.
This tuple can then be stored in the processedFiles HashSet.
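
To illustrate the suggestion above, here is a minimal sketch of de-duplicating by file identity rather than by path string. The `fakeStat` table and its inode/device numbers are made up for illustration; in the runtime the pair would come from the stat call behind `TryStatFile`, which this sketch does not reproduce.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a stat call: maps each path to the (Ino, Dev)
// pair of the file it finally resolves to. The numbers are invented.
var fakeStat = new Dictionary<string, (long Ino, long Dev)>(StringComparer.Ordinal)
{
    ["/etc/pki/tls/cert.pem"] = (1001, 64768),
    ["/etc/pki/tls/certs/ca-bundle.crt"] = (1001, 64768), // same target as cert.pem
    ["/etc/pki/tls/certs/ca-bundle.trust.crt"] = (1002, 64768),
};

// Paths in the order a loader might visit them: the bundle file first,
// then the entries of the certs directory.
string[] candidates =
{
    "/etc/pki/tls/cert.pem",
    "/etc/pki/tls/certs/ca-bundle.crt",
    "/etc/pki/tls/certs/ca-bundle.trust.crt",
};

var processedFiles = new HashSet<(long Ino, long Dev)>();
var loaded = new List<string>();

foreach (string path in candidates)
{
    // A failed lookup plays the role of a failed stat.
    if (!fakeStat.TryGetValue(path, out (long Ino, long Dev) fileId))
    {
        continue;
    }

    // HashSet.Add returns false when the identity was already recorded,
    // so two symlinks to one file are only loaded once.
    if (!processedFiles.Add(fileId))
    {
        continue;
    }

    loaded.Add(path);
}

Console.WriteLine(string.Join(Environment.NewLine, loaded));
```

Because the key is the identity of the resolved file, it does not matter how many differently named links point at the same bundle, and no string comparison rules are involved.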

@tmds
Member

tmds commented Jan 21, 2024

cc @adamsitnik

Member

You can remove the skipStat argument from this method. We no longer want to ever skip the stat.

And the changes to this function can be something like:

if (!TryStatFile(file, out lastModified, out (long, long) fileId) ||
    !processedFiles.Add(fileId))
{
    return false;
}

We don't need to wait till the end of the function to add the file since next time we try to process it, we expect the same result.

Contributor Author

@birojnayak birojnayak Jan 22, 2024

@tmds I am going to remove the skipStat argument in the next commit as suggested. But we can't return false if processedFiles already has an entry: that would falsely signal that the directory read was unsuccessful, which would cause all certs to be read from /etc/ssl/certs (which we don't want, as I believe that's the fallback plan).

Rather, I will add the entry to processedFiles if readdata is true. The second time, if processedFiles has an entry, it will return true (basically letting the caller know that it was a success) without re-processing it. Hope that clarifies; let me know if you think otherwise.
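
The return-value behaviour described in this comment can be sketched as follows. This is an illustration only, not the merged code: `TryLoadFile`, `parseCount`, and the identity values are hypothetical, and the parse step is replaced by a counter.

```csharp
using System;
using System.Collections.Generic;

var processedFiles = new HashSet<(long Ino, long Dev)>();
int parseCount = 0; // counts how often the (simulated) parse actually runs

// Hypothetical helper: returns true when the file yielded data, whether it
// was parsed on this call or on an earlier one.
bool TryLoadFile((long Ino, long Dev) fileId)
{
    if (processedFiles.Contains(fileId))
    {
        // Already loaded: report success so the caller does not treat the
        // store as empty and fall back to another location.
        return true;
    }

    parseCount++;         // stands in for reading and parsing the file
    bool readData = true; // assume the file contained certificates

    if (readData)
    {
        processedFiles.Add(fileId);
    }

    return readData;
}

bool first = TryLoadFile((1001, 64768));
bool second = TryLoadFile((1001, 64768)); // same identity, e.g. via a symlink

Console.WriteLine($"first={first}, second={second}, parses={parseCount}");
```

Returning false for the duplicate would be indistinguishable from a failed read, which is the situation the comment wants to avoid.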

Member

> this will trigger a false positive indicating directory reading is unsuccessful, which will cause to read all certs from /etc/ssl/certs (which we don't want as I believe that's the fallback plan).

That's right, if we return false, it messes up `hasStoreData`.

Member

@tmds tmds Jan 22, 2024

Let's avoid adding new functions, and keep using the return-bool-for-success pattern. The caller can ignore the out arguments it doesn't need (using out _).
We'd have:

private static bool TryStatFile(string path, out DateTime lastModified, out (long, long) fileId)
   => CallStat(...
private static bool TryStatDirectory(string path, out DateTime lastModified)
   => CallStat(...
private static bool CallStat(string path, out DateTime lastModified, out (long, long) fileId)

Contributor Author

Working on it in the next commit. Thank you for the feedback.

Member

nit: keep the name TryStat.

Member

nit: only keep 1 TryStatFile and update the caller that ignores fileId by passing it out _ as the last argument.

Member

@tmds tmds left a comment

LGTM, thank you!

@birojnayak birojnayak force-pushed the cert-load-fix-forAL2023 branch from c8ed96a to 3f0980a Compare January 22, 2024 21:40
@birojnayak birojnayak changed the base branch from release/8.0 to main January 22, 2024 21:40
@birojnayak
Contributor Author

Rebased and re-targeted to main.

@birojnayak
Contributor Author

We would like this to be in the .NET 8.0 servicing release, as it will help on all recent RHEL-based distros and potentially more.

@birojnayak
Contributor Author

@tmds @vcsjones @jeffhandley what else needs to be done to get this merged and backported to .NET 8?

// In order to maintain "finalization-free" the GetNativeCollections method would need to
// DangerousAddRef, and the callers would need to DangerousRelease, adding more interlocked operations
// on every call.

Member

The blank line shouldn't have been removed, since the comment isn't about the following code, it's about code that's not present; making it a separate logical code paragraph. But it's not worth spinning on.

Comment on lines 368 to 376

-        private static bool TryStatFile(string path, out DateTime lastModified)
-            => TryStat(path, Interop.Sys.FileTypes.S_IFREG, out lastModified);
-
         private static bool TryStatDirectory(string path, out DateTime lastModified)
-            => TryStat(path, Interop.Sys.FileTypes.S_IFDIR, out lastModified);
+            => TryStat(path, Interop.Sys.FileTypes.S_IFDIR, out lastModified, out _);

-        private static bool TryStat(string path, int fileType, out DateTime lastModified)
+        private static bool TryStatFile(string path, out DateTime lastModified, out (long, long) fileId)
+            => TryStat(path, Interop.Sys.FileTypes.S_IFREG, out lastModified, out fileId);
+
+        private static bool TryStat(string path, int fileType, out DateTime lastModified, out (long, long) fileId)
         {
Member

Ideally the TryStatFile/TryStatDirectory order would have been maintained so the diff showed that it was really just adding the out. But, again, it's not worth spinning on.

@bartonjs bartonjs merged commit 18b7575 into dotnet:main Jan 26, 2024
@bartonjs
Member

> what else needs to be done to get ... back ported to .NET 8 ?

It won't be backported to .NET 8. Basically, the only things that get backported to LTS releases are:

  • .NET got broken by an OS change
  • A significant bug in a v1 feature (fix it early so we don't have to deal with bug-compatibility forever).
  • A functional regression from the previous release.
  • A significant perf regression from the previous release.

The loader logic hasn't really changed in the last several releases, so I don't think this is addressing a regression. So while a 50% startup reduction for this scenario is good, it's not the sort of thing we do in servicing.

If I'm wrong, and you have numbers that show that the same code was <=80ms to start in .NET 6, but is now over 100ms in .NET 8... then I can submit that as the justification, but there's no guarantee that the servicing review process would take it.

@birojnayak
Contributor Author

birojnayak commented Jan 26, 2024

@bartonjs I would request the team to consider it, as .NET 8 is going to stay for long.

@jkotas
Member

jkotas commented Jan 27, 2024

@birojnayak Our customers have a strong preference for stability and minimal churn in servicing. We have our servicing bar documented at https://dotnet.microsoft.com/en-us/platform/support/policy/dotnet-core#servicing. Our backport PRs require written justification that meets this bar.

This issue has been present for a number of years. To justify backporting the fix, we would need to explain what changed recently that makes this issue need backporting now, when we have lived with it for many years. It is not a recent regression. If you can help us write a valid justification, we can try to get it approved for backport.

(".NET 8 is going to stay for long." is not a valid justification for a backport.)

@jkotas
Member

jkotas commented Jan 27, 2024

The linked PR describes and implements a workaround in AWS lambda: aws/aws-lambda-dotnet#1661

@birojnayak
Contributor Author

@jkotas I will leave it to you folks to decide whether your team wants to backport to .NET 8.0 (we have solved it internally for AWS Lambda via the solution I suggested to the team). As long as it is coming in .NET 9.0, it's going to help everyone.

@adamsitnik adamsitnik added the tenet-performance Performance related issue label Jan 29, 2024
@sebastienros
Member

FYI this is how it improves startup on Json Https aspnet apps

image

@birojnayak
Contributor Author

this is awesome @sebastienros
