feat: finish support for path conflicts with 'prefer' #204
niemeyer merged 22 commits into canonical:main from
Conversation
letFunny
commented
Feb 12, 2025
- Have you signed the CLA?
letFunny
left a comment
Some comments for the offline review.
internal/setup/setup.go
Outdated
    if oldInfo.Kind == GlobPath && (newInfo.Kind == GlobPath || newInfo.Kind == CopyPath) {
        if new.Package == old.Package {
            continue
        if !strdist.GlobPath(newPath, oldPath) {
[Note for reviewer]: The 10% speedup you can see comes from running strdist.GlobPath before looking at the package. I changed main to do this check before checking whether the package is the same, and I can reproduce the speedup.
Can we please take any refactorings out of this PR so we can just agree on and merge the core logic? We can then improve things further as follow-ups, and discuss them with a bit more clarity.
My comment above could have been better. The line was not moved for performance reasons. We are only moving it here because the path is the same for all slices, so it is wasteful to redo the computation each time when nothing changes.
The reason for having a comment is that I was puzzled to see that the CI reported the performance had increased and I wanted to be sure what the reason was, which is why I investigated it and added a comment here in case anyone also wondered about it.
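[Illustration]: A minimal sketch of the reordering discussed above, not the code from this PR. It assumes a hypothetical `slice` type and uses `path.Match` from the standard library as a stand-in for `strdist.GlobPath` (which compares two globs, whereas `path.Match` compares a pattern with a literal path). The point is only that the path comparison does not depend on the slice pair, so it can run once before the package check.

```go
package main

import (
	"fmt"
	"path"
)

// slice is a hypothetical stand-in for a slice that declares a path.
type slice struct {
	Package string
	Name    string
}

// globCalls counts how often the stand-in glob comparison runs.
var globCalls int

// globMatch is a stand-in for strdist.GlobPath. It uses path.Match, which
// matches a pattern against a literal path rather than comparing two globs.
func globMatch(pattern, name string) bool {
	globCalls++
	ok, err := path.Match(pattern, name)
	return err == nil && ok
}

// conflicts pairs up slices from different packages for two overlapping paths.
func conflicts(globPath, otherPath string, globSlices, otherSlices []slice) [][2]slice {
	// Both paths are fixed for every slice pair, so compare them only once.
	if !globMatch(globPath, otherPath) {
		return nil
	}
	var pairs [][2]slice
	for _, old := range globSlices {
		for _, cur := range otherSlices {
			if cur.Package == old.Package {
				continue // Same package, so this pair is skipped.
			}
			pairs = append(pairs, [2]slice{old, cur})
		}
	}
	return pairs
}

func main() {
	globSlices := []slice{{"pkgA", "s1"}, {"pkgB", "s1"}}
	otherSlices := []slice{{"pkgA", "s2"}, {"pkgC", "s1"}}
	pairs := conflicts("/fo?", "/foo", globSlices, otherSlices)
	fmt.Println(len(pairs), "conflicting pairs,", globCalls, "glob comparison")
}
```

With the comparison inside the inner loop it would run once per slice pair instead of once per path pair.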
Sorry for being unclear. The comment wasn't about the logic above specifically, but about the logic below. I was trying to understand what we are trying to fix/improve with the changes below, but I believe this is related to the comment already in discussion at the end of the PR. If not, let's please talk about it.
internal/setup/setup.go
Outdated
    }
    toCheck := []*Slice{new}
    _, hasPrefers := prefers[preferKey{preferSource, newPath, ""}]
    if hasPrefers {
[Note for reviewer]: The change here is that we now have the "head" of the chain in paths but we need to check each glob against all of the members. For example: if pkgA -> pkgB -> pkgC for /foo then paths has pkgA. If the glob is /fo? we need to check all three packages.
In reality we could short-circuit this, because with the current logic it is impossible for the glob not to conflict when there are two packages. If the files are extracted from the packages, the glob matches either one package or the other; if the files are not extracted, the glob conflicts naturally. However, we thought it was better to handle the general case here so that it survives changes in the logic, and the extra cost is only two if statements.
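[Illustration]: A minimal sketch of the idea in this note, under stated assumptions. The prefer chain is modeled as a hypothetical `map[string]string` from a package to the package it prefers, `path.Match` stands in for the glob comparison, and "conflict" here simply means "another package on the chain provides a path the glob matches". The real `toCheck` logic works on slices, and the exact conflict rule is still under discussion in this thread.

```go
package main

import (
	"fmt"
	"path"
)

// chainMembers walks a hypothetical prefer chain starting at head and
// returns every package on it, head first. It stops on a cycle.
func chainMembers(prefers map[string]string, head string) []string {
	var members []string
	seen := make(map[string]bool)
	for pkg := head; pkg != "" && !seen[pkg]; pkg = prefers[pkg] {
		seen[pkg] = true
		members = append(members, pkg)
	}
	return members
}

// conflictingPackages lists the chain members, other than the package that
// owns the glob, whose path is matched by the glob.
func conflictingPackages(globPkg, glob, p string, prefers map[string]string, head string) []string {
	ok, err := path.Match(glob, p)
	if err != nil || !ok {
		return nil
	}
	var out []string
	for _, pkg := range chainMembers(prefers, head) {
		if pkg == globPkg {
			continue // The glob's own package is not checked against itself.
		}
		out = append(out, pkg)
	}
	return out
}

func main() {
	// pkgA -> pkgB -> pkgC for /foo; only the head (pkgA) is recorded in paths.
	prefers := map[string]string{"pkgA": "pkgB", "pkgB": "pkgC"}
	fmt.Println(conflictingPackages("pkgA", "/fo?", "/foo", prefers, "pkgA"))
}
```

Checking only the head would compare pkgA with itself and report nothing, which is why the whole chain is walked in this sketch.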
internal/setup/setup_test.go
Outdated
    /path:
    `,
    },
    relerror: `slices mypkg1_myslice and mypkg2_myslice conflict on /\*\* and /path`,
I seem to be missing the underlying issue here, or at least the test case isn't very clear about what's the real problem. If we select both mypkg1 and mypkg2, won't we end up with the content of /** from mypkg1, but /path from mypkg2? Isn't that exactly what one is trying to achieve in this scenario? It's actually nice to have this test case, because it's a tricky scenario, but the fact it works as intended would be a nice bonus at this stage.
I don't think that is the expected outcome. /** conflicts with /path in the other package and there is no prefer relationship between them (there could never be, as we don't allow prefer with wildcards). This test case ensures that conflicts with globs work; the extra /path is just there to make sure the prefer resolution interacts properly with glob conflicts.
Sure, but the case here is a bit different: the only conflict we have between mypkg1 and mypkg2 is /path, for which we actually have an explicit prefer relationship, right? In other words, yes, we do have a conflict, but the one conflict we have was explicit.
So a few questions:
- Is this PR adding extra logic to handle this case especially?
- What would actually happen if this logic was taken out? What would be the behavior?
- After removing said logic (if any), is there a case that is not as clear as the described case and is still an issue?
I get your point now. So yes, we are already handling the glob vs regular path case by default, the difference here is that it is part of a prefer chain. To answer your questions:
- Yes indeed, the extra logic is the one that uses the toCheck list to check all packages in the prefers relationship.
- This is very implementation dependent, but if the logic was taken out and we reverted the toCheck list, we would only be testing the glob against one package. In this case we could check /** against /path from mypkg1, which won't be a conflict. If we correctly check all packages in the chain we will find out about mypkg2, which does conflict.
Does that make sense?
Discussed offline: for this example this should not be a conflict, and we should change the spec to reflect it.
niemeyer
left a comment
Thanks. Seems to be almost there. A few more comments below. I'm hoping that we can simplify this a bit further, and then get it in, even if we need further PRs.
internal/setup/setup.go
Outdated
    // the path.
    for skey, source := range prefers {
        if skey.side != preferSource || skey.pkg == "" {
            // preferTarget packages have the path by construction.
It is not clear what this means, especially since the earlier comment right above already says what this logic is supposed to be doing. preferSource and preferTarget get the path from the same variable at the same moment.
I will remove the comment, it is probably misplaced anyway.
I recalled what I was trying to say. If a package is recorded as preferTarget then we know it has the path, and we don't have to validate it because of how it is done in prefers().
This explanation is itself a bit confusing, because this is a map from a package to a package, in both directions. What we can say is that the source package of a prefers definitely has the path, and the target package may have it or not. The map contains both the source package and the target package, both as key and as a value, due to the bidirectional tracking.
Okay, I think I get what you mean now. You are correct: we know the "source package" definitely contains the path, and we need to check that the "targets" also contain it to avoid wasted work. We can either take the preferSource key and the value in the map, which is what I have done, or we can take the preferTarget key and its key.pkg value. Both contain the same information because the map is bidirectional.
I have chosen to use only the preferSource(s) but I could have done the same with preferTarget(s). It is equivalent to following the arrows in one direction or in the other.
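[Illustration]: A minimal sketch of the bidirectional tracking described above, assuming a `preferKey` with fields `side`, `path` and `pkg` (the `path` field name and the helper functions `addPrefer` and `edges` are hypothetical). It only shows that reading the source side or the target side of such a map yields the same set of relationships; it is not the code in setup.go.

```go
package main

import "fmt"

type preferSide int

const (
	preferSource preferSide = iota
	preferTarget
)

// preferKey follows the key shape quoted in this thread: side, path, package.
type preferKey struct {
	side preferSide
	path string
	pkg  string
}

// addPrefer records "source prefers target for path" in both directions.
func addPrefer(prefers map[preferKey]string, path, source, target string) {
	prefers[preferKey{preferSource, path, source}] = target
	prefers[preferKey{preferTarget, path, target}] = source
}

// edges collects {source, target} pairs by reading only one side of the map.
func edges(prefers map[preferKey]string, side preferSide) map[[2]string]bool {
	out := make(map[[2]string]bool)
	for key, value := range prefers {
		if key.side != side || key.pkg == "" {
			continue // Skip the other side and entries without a package.
		}
		if side == preferSource {
			out[[2]string{key.pkg, value}] = true
		} else {
			out[[2]string{value, key.pkg}] = true
		}
	}
	return out
}

func main() {
	prefers := make(map[preferKey]string)
	addPrefer(prefers, "/foo", "pkgA", "pkgB")
	addPrefer(prefers, "/foo", "pkgB", "pkgC")

	fromSources := edges(prefers, preferSource)
	fromTargets := edges(prefers, preferTarget)
	fmt.Println(len(fromSources), len(fromTargets))
	fmt.Println(fromTargets[[2]string{"pkgA", "pkgB"}], fromSources[[2]string{"pkgB", "pkgC"}])
}
```

Either direction recovers both arrows, so iterating over only one side avoids visiting each relationship twice.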
This is only about the comment. The action item is to drop it because it is confusing; the logic is fine.
internal/setup/setup.go
Outdated
    // Check for invalid prefer relationships where the package does not have
    // the path.
    for skey, source := range prefers {
        if skey.side != preferSource || skey.pkg == "" {
We don't need to test it if pkg == "". This is a sample package, which will necessarily also be present in a well-defined relationship.
Isn't this duplicated because it will also be a target? I wanted to prevent processing the same package twice.
Not sure I get the point still. My comment says "we don't need to process", and yours says "but won't we process it twice"...? The suggestion is to not process it.
Sorry, I'm confused by the comment below. This will probably be solved by the next point already in discussion.
Let's discuss in the comment below. All I am saying is that the iteration already covers all the packages that need to be tested; the sample one recorded with skey.pkg == "" doesn't have to be tested because it contains the path by construction:
    for _, pkg := range r.Packages {
        for _, slice := range pkg.Slices {
            for path, info := range slice.Contents {
                ...
                // Sample package that requires this path to be in a prefer relationship.
                prefers[preferKey{preferSource, path, ""}] = pkg.Name
            }
        }
    }
niemeyer
left a comment
After sitting on this for a while and re-reading the algorithm multiple times, I can't avoid feeling bad about the direction this is going. We're doing a lot of repeated work, reevaluating the same paths (e.g. A prefers B prefers C, we check A=>B=>C, then B=>C, etc), and storing maps that look like each other (prefers vs. pathToSlice).
We need to clean this up so this logic remains maintainable. I have a suggestion for how to do this. Let's speak next week when you're back.
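[Illustration]: A minimal sketch of the repeated work mentioned above, using a hypothetical `map[string]string` from a package to the package it prefers. It only counts visits when every package re-walks its own chain versus when already visited packages are skipped. It is not the cleanup that was proposed or adopted, which this thread does not spell out.

```go
package main

import "fmt"

// naiveSteps re-walks the chain from every package that prefers another one
// and counts how many packages are visited in total.
func naiveSteps(prefers map[string]string) int {
	steps := 0
	for start := range prefers {
		seen := make(map[string]bool) // Per-walk guard against cycles.
		for pkg := start; pkg != "" && !seen[pkg]; pkg = prefers[pkg] {
			seen[pkg] = true
			steps++
		}
	}
	return steps
}

// sharedSteps shares one visited set across walks, so each package is
// visited at most once regardless of where the walk starts.
func sharedSteps(prefers map[string]string) int {
	steps := 0
	seen := make(map[string]bool)
	for start := range prefers {
		for pkg := start; pkg != "" && !seen[pkg]; pkg = prefers[pkg] {
			seen[pkg] = true
			steps++
		}
	}
	return steps
}

func main() {
	// A prefers B prefers C.
	prefers := map[string]string{"A": "B", "B": "C"}
	fmt.Println("naive:", naiveSteps(prefers), "shared:", sharedSteps(prefers))
}
```

For the three-package chain, the naive walk visits five packages (A, B, C and then B, C again) while the shared walk visits three.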
Remove optimizations from the conflict algorithm in preparation for adding support for prefer.
niemeyer
left a comment
A couple of simple issues only. Logic looks a lot simpler and nicer now, and will look even simpler with one of the suggestions below.
niemeyer
left a comment
Thanks, Alberto.
This is the last PR of the series and, as we discussed, it took a little while to get here, but the simplification is great, even more so because it was done while adding a new feature that was quite convoluted before. Thanks for keeping it up.