feat: finish support for path conflicts with 'prefer' #204

Merged
niemeyer merged 22 commits into canonical:main from letFunny:follow-up-prefer
Apr 29, 2025
Conversation

@letFunny
Collaborator

  • Have you signed the CLA?

@letFunny letFunny added the Priority Look at me first label Feb 12, 2025

github-actions bot commented Feb 12, 2025

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| BASE | 8.480 ± 0.033 | 8.422 | 8.535 | 1.00 ± 0.00 |
| HEAD | 8.462 ± 0.021 | 8.437 | 8.511 | 1.00 |

Collaborator Author

@letFunny letFunny left a comment

Some comments for the offline review.

if oldInfo.Kind == GlobPath && (newInfo.Kind == GlobPath || newInfo.Kind == CopyPath) {
if new.Package == old.Package {
continue
if !strdist.GlobPath(newPath, oldPath) {

Collaborator Author

@letFunny letFunny Feb 12, 2025

[Note for reviewer]: The 10% speedup you can see is because of moving strdist.GlobPath to execute before looking at the package. I have changed main to do this check before checking if the package is the same and I can reproduce the speedup.

Contributor

Can we please take any refactorings out of this PR so we can just agree and merge the core logic? We can then improve things further as follow-ups, and discuss them with a bit more clarity.

Collaborator Author

My comment above could have been better. This was not moved for performance reasons; we are only moving this line here because the path is the same for all slices, so it is wasteful to redo the computation each time when nothing is changing.

The reason for having a comment is that I was puzzled to see that the CI reported the performance had increased and I wanted to be sure what the reason was, which is why I investigated it and added a comment here in case anyone also wondered about it.
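
To illustrate the point about redoing the computation for every slice, here is a minimal sketch that counts matcher calls with and without hoisting the check out of the per-slice loop. It is not the code in this PR: `countMatchCalls` and the slice names are hypothetical, and the standard library's `path.Match` stands in for `strdist.GlobPath`.

```go
package main

import (
	"fmt"
	"path"
)

// countMatchCalls reports how many times the stand-in glob matcher runs
// while checking one pair of paths that is shared by every slice.
func countMatchCalls(glob, name string, slices []string, hoist bool) (calls int, matched bool) {
	match := func() bool {
		calls++
		ok, _ := path.Match(glob, name) // error ignored: the patterns used here are well-formed
		return ok
	}
	if hoist {
		// The two paths do not change per slice, so match once up front.
		if !match() {
			return calls, false
		}
		for range slices {
			matched = true // per-slice work (e.g. comparing packages) would go here
		}
		return calls, matched
	}
	for range slices {
		// Same answer every iteration, recomputed each time.
		if match() {
			matched = true
		}
	}
	return calls, matched
}

func main() {
	slices := []string{"pkg1_a", "pkg1_b", "pkg2_a"}
	perSlice, _ := countMatchCalls("/fo?", "/foo", slices, false)
	hoisted, _ := countMatchCalls("/fo?", "/foo", slices, true)
	fmt.Println(perSlice, hoisted) // 3 1
}
```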

Contributor

Sorry for being unclear. The comment wasn't about the logic above specifically, but about the logic below. I was trying to understand what we are trying to fix/improve with the changes below, but I believe this is related to the comment already in discussion at the end of the PR. If not, let's please talk about it.

}
toCheck := []*Slice{new}
_, hasPrefers := prefers[preferKey{preferSource, newPath, ""}]
if hasPrefers {

Collaborator Author

@letFunny letFunny Feb 12, 2025

[Note for reviewer]: The change here is that we now have the "head" of the chain in paths but we need to check each glob against all of the members. For example: if pkgA -> pkgB -> pkgC for /foo then paths has pkgA. If the glob is /fo? we need to check all three packages.

In reality we can short-circuit this because, with the current logic, if there are two packages then it is impossible for the glob not to conflict. This is because if the files are extracted from the packages then the glob either matches one package or the other; and if the files are not extracted then the glob will also conflict naturally. However, we thought it was better to have a more general case here that will survive changes in the logic, and the extra cost is only two if statements executed.

/path:
`,
},
relerror: `slices mypkg1_myslice and mypkg2_myslice conflict on /\*\* and /path`,

Contributor

I seem to be missing the underlying issue here, or at least the test case isn't very clear about what's the real problem. If we select both mypkg1 and mypkg2, won't we end up with the content of /** from mypkg1, but /path from mypkg2? Isn't that exactly what one is trying to achieve in this scenario? It's actually nice to have this test case, because it's a tricky scenario, but the fact it works as intended would be a nice bonus at this stage.

Collaborator Author

I don't think that is the expected outcome. /** conflicts with /path on the other package and there is no prefer relationship between them (there could never be, as we don't allow prefer with wildcards). This test case is ensuring that conflicts with globs work; the extra /path is just there to make sure the prefer resolution interacts properly with glob conflicts.

Contributor

Sure, but the case here is a bit different: the only conflict we have between mypkg1 and mypkg2 is /path, for which we actually have an explicit prefer relationship, right? In other words, yes, we do have a conflict, but the one conflict we have was explicit.

So a few questions:

  1. Is this PR adding extra logic to handle this case especially?

  2. What would actually happen if this logic was taken out? What would be the behavior?

  3. After removing said logic (if any), is there a case that is not clear as the described case and is still an issue?

Collaborator Author

I get your point now. So yes, we are already handling the glob vs regular path case by default, the difference here is that it is part of a prefer chain. To answer your questions:

  1. Yes indeed, the extra logic that uses the toCheck list to check for all packages in the prefers relationship.
  2. This is very implementation dependent, but if the logic was taken out and we reverted the toCheck list, we would only be testing the glob against one package. In this case we could check /** against /path from mypkg1, which won't be a conflict. If we correctly check all packages in the chain we will find out about mypkg2, which does conflict.

Does that make sense?

Collaborator Author

Discussed offline: for this example this should not be a conflict, and we should change the spec to reflect it.

@letFunny letFunny requested a review from niemeyer March 11, 2025 16:26
Contributor

@niemeyer niemeyer left a comment

Thanks. Seems to be almost there. A few more comments below. I'm hoping that we can simplify this a bit further, and then get it in, even if we need further PRs.

// the path.
for skey, source := range prefers {
if skey.side != preferSource || skey.pkg == "" {
// preferTarget packages have the path by construction.

Contributor

It is not clear what this means, even more when the earlier comment right above already says what this logic is supposed to be doing. preferSource and preferTarget get the path from the same variable at the same moment.

Collaborator Author

I will remove the comment, it is probably misplaced anyway.

Collaborator Author

I recalled what I was trying to say. If a package is recorded as preferTarget then we know it has the path and we don't have to validate it because of how it is done in prefers().

Contributor

This explanation is itself a bit confusing, because this is a map from a package to a package, in both directions. What we can say is that the source package of a prefers definitely has the path, and the target package may have it or not. The map contains both the source package and the target package, both as key and as a value, due to the bidirectional tracking.

Collaborator Author

Okay, I think I get what you mean now. You are correct, we know the "source package" definitely contains the path and we need to check that the "targets" also contain it to avoid wasted work. We can either take the preferSource and the value in the map, which is what I have done, or we can take the preferTarget and the key.pkg value. Both contain the same information because it is bi-directional.

I have chosen to use only the preferSource(s) but I could have done the same with preferTarget(s). It is equivalent to following the arrows in one direction or in the other.

Collaborator Author

@letFunny letFunny Apr 1, 2025

This is only about the comment, the action item is to drop it because it is confusing, the logic is fine.

// Check for invalid prefer relationships where the package does not have
// the path.
for skey, source := range prefers {
if skey.side != preferSource || skey.pkg == "" {

Contributor

We don't need to test it if pkg == "". This is a sample package, which will necessarily also be present in a well-defined relationship.

Collaborator Author

Isn't this duplicated because it will also be a target? I wanted to prevent processing the same package twice.

Contributor

Not sure I get the point still. My comment says "we don't need to process", and yours says "but won't we process it twice"...? The suggestion is to not process it.

Contributor

@niemeyer niemeyer Mar 31, 2025

Sorry, I'm confused by the comment below. This will probably be solved by the next point in discussion already.

Collaborator Author

Let's discuss in the comment below, all I am saying is that the iteration already covers all the packages that need to be tested, the sample one recorded with skey.pkg == "" doesn't have to be tested because it contains the path by construction:

	for _, pkg := range r.Packages {
		for _, slice := range pkg.Slices {
			for path, info := range slice.Contents {
				...
				// Sample package that requires this path to be in a prefer relationship.
				prefers[preferKey{preferSource, path, ""}] = pkg.Name
			}
		}
	}

Contributor

@niemeyer niemeyer left a comment

After sitting on this for a while and re-reading the algorithm multiple times, I can't avoid feeling bad about the direction this is going. We're doing a lot of repeated work, reevaluating the same paths (e.g. A prefers B prefers C, we check A=>B=>C, then B=>C, etc), and storing maps that look like each other (prefers vs. pathToSlice).

We need to clean this up so this logic remains maintainable. I have a suggestion for how to do this. Let's speak next week when you're back.

Remove optimizations from the conflict algorithm in preparation for
adding support for prefer.
@letFunny letFunny added the Blocked Waiting for something external label Apr 22, 2025
Collaborator Author

letFunny commented Apr 22, 2025

As discussed offline with @niemeyer, we have taken a step back to further simplify the logic before finalizing the support. As a result, this PR now depends on #216.

@letFunny letFunny removed the Blocked Waiting for something external label Apr 23, 2025
@letFunny letFunny requested a review from niemeyer April 23, 2025 17:27
Contributor

@niemeyer niemeyer left a comment

A couple of simple issues only. Logic looks a lot simpler and nicer now, and will look even simpler with one of the suggestions below.

Contributor

@niemeyer niemeyer left a comment

Thanks, Alberto.

This is the last PR of the series, and as we discussed this took a little while to get here, but the simplification is great, even more when this is being done while adding a new feature that was before quite convoluted. Thanks for keeping it up.

@niemeyer niemeyer merged commit ab6df4d into canonical:main Apr 29, 2025
18 checks passed
@letFunny letFunny deleted the follow-up-prefer branch April 30, 2025 08:07