Pregel execution resulting in duplicate ID rows #668
-
|
Howdy folks - I recently started using GraphFrames for a project, and in general I've been quite pleased with it. However, I ran into a problem I can't find any documentation on, so I'm curious if y'all can help. I initialized a graph with (made-up numbers) 10000 vertices and 10000 edges, and verified that with count operations. I defined a Pregel algorithm that runs for 25 iterations. I was expecting to see the output of [...]. This is on EMR Serverless with GraphFrames 0.8.4 and Spark 3.5.4. I don't run into this behavior with local unit tests. Hopefully this is an appropriate spot for this question! I appreciate all of the work on this project. |
-
|
@SauronShepherd does this bug interest you? |
-
|
@confusedamazonian can you share your Pregel code? Just curious if some side effect could create a node. What is the algorithm? |
-
|
@confusedamazonian Hello! Can you try with the latest GF (0.9.2)? And if the bug is there, could you give us a minimal reproducing example? Thanks! |
-
|
I have to eat crow - my restriction of no null or duplicate values in the ID column was insufficient, so a small number of duplicates (~10) snuck in, and that appears to be responsible for this behavior. Removing those duplicates results in the number of rows output by Pregel matching the number of rows in the vertices DF. Sorry to waste everyone's time; glad to see it's a me issue and not a software issue at least. |
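For anyone landing here with the same symptom, a pre-flight check on the vertex IDs might look roughly like the sketch below. It is plain Python over a list of dicts rather than a Spark DataFrame, so it only illustrates the idea; the helper name `check_vertex_ids` and the sample rows are made up for this example and are not part of GraphFrames.

```python
from collections import Counter


def check_vertex_ids(rows, id_key="id"):
    """Return (null_count, duplicates) for a list of vertex rows (dicts).

    Hypothetical helper for illustration only, not a GraphFrames API.
    """
    ids = [row.get(id_key) for row in rows]
    # Rows whose ID is missing or None.
    null_count = sum(1 for i in ids if i is None)
    # IDs that appear more than once, mapped to their occurrence count.
    counts = Counter(i for i in ids if i is not None)
    duplicates = {i: n for i, n in counts.items() if n > 1}
    return null_count, duplicates


# Made-up sample data: "b" appears twice and one row has no ID.
vertices = [
    {"id": "a"},
    {"id": "b"},
    {"id": "b"},
    {"id": None},
    {"id": "c"},
]

null_count, duplicates = check_vertex_ids(vertices)
print(null_count, duplicates)
```

On a real vertices DataFrame, the same check could presumably be done with `groupBy("id").count()` filtered to counts above 1, and the duplicates dropped with `dropDuplicates(["id"])` before building the graph.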
-
|
That's okay! We do provide support to isolate problems :) |