Changes the output label of CC to min of original ID#320
Changes the output label of CC to min of original ID#320SemyonSinchenko merged 3 commits intographframes:masterfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## master #320 +/- ##
==========================================
+ Coverage 91.17% 91.23% +0.05%
==========================================
Files 18 18
Lines 873 878 +5
Branches 73 73
==========================================
+ Hits 796 801 +5
Misses 77 77
Continue to review full report at Codecov.
|
|
@mengxr @felixcheung any thoughts on this one. |
|
This is very helpful, any updates? |
|
Hey @rjurney, Not sure of your status or involvement in this project but is there a chance you could push this through? 🤞 AFAICT this would unlock consistent |
|
@followingell I like this, but my machine is presently in an unhappy state as it relates to this project as I dip my toes in Spark proper waters. What testing have you done? |
Unfortunately none. I don't have access at the moment to a machine where I could test this without issues. |
|
Okay, give me a few days and I’ll test it. |
|
We have a lot of pending PRs related to Connected Components. I'm going to push this one forward after merging #552 |
|
Sounds good. I like this PR. |
|
I will try to resolve conflicts. |
|
@architch Thanks for resolving conflicts. You can use an existing pre-commit hook to fix style (or just run |
|
Thanks for the contribution, @architch ! |
|
@architch I want to notify you that after some discussions we made a decision to keep as the default the behavior before your PR: #632 To use min vertex label as component label you will need to explicitly modify the parameter. You can use the config Sorry about it, but do a breaking change will require too much work from downstream projects. I think for users it won't be hard to set config or argument to |
|
This seems like a happy medium :) |
|
Thanks @SemyonSinchenko for the info. It makes sense to make it configurable. |
This PR changes the label to minimum of the vertex ID belonging to a component.
The assigned label of output of Connected Component is currently a random number which is not consistent. If some delta edges are added to the graph, the output label of unaffected components also changes.
Output of friends graph example:
