
Conversation

@joelrobin18
Contributor

What changes were proposed in this pull request?

This PR adds a new API to Pregel for optimizing triplet memory consumption:

// Scala API
pregel
  .requiredSrcColumns("col1", "col2")
  .requiredDstColumns("col3")
  .run()

// Python API (Classic & Connect)
pregel \
  .requiredSrcColumns("col1", "col2") \
  .requiredDstColumns("col3") \
  .run()

Why are the changes needed?

When constructing triplets in Pregel, we currently select all source and destination vertex columns (*), creating large intermediate DataFrames in memory. This is especially problematic for algorithms with large per-vertex state (e.g., cycle detection with stored sequences, future random walks).

This optimization allows users to explicitly specify which columns are needed, significantly reducing memory pressure for large-scale graph processing.
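The effect of column pruning on triplet size can be sketched with a small pure-Python model. This is only an illustration under stated assumptions, not the GraphFrames implementation: `project`, `build_triplets`, and `payload_size` are hypothetical helpers, vertex rows are modeled as plain dicts, and the `path` column stands in for a large per-vertex state. Passing `None` for a column list models the current select-all (`*`) behavior.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def project(row: Row, cols: Optional[Sequence[str]]) -> Row:
    """Keep only the requested columns; None models selecting all columns (*)."""
    if cols is None:
        return dict(row)
    return {name: row[name] for name in cols}


def build_triplets(
    vertices: Dict[int, Row],
    edges: List[Tuple[int, int]],
    src_cols: Optional[Sequence[str]] = None,
    dst_cols: Optional[Sequence[str]] = None,
) -> List[Dict[str, Row]]:
    """Attach the (projected) source and destination rows to every edge."""
    return [
        {"src": project(vertices[s], src_cols), "dst": project(vertices[d], dst_cols)}
        for s, d in edges
    ]


def payload_size(triplets: List[Dict[str, Row]]) -> int:
    """Count stored values: a list counts by its length, a scalar counts as 1."""
    total = 0
    for triplet in triplets:
        for side in ("src", "dst"):
            for value in triplet[side].values():
                total += len(value) if isinstance(value, list) else 1
    return total


# Hypothetical data: "path" plays the role of a large per-vertex state column.
vertices = {
    1: {"id": 1, "rank": 0.5, "path": list(range(1000))},
    2: {"id": 2, "rank": 0.25, "path": list(range(1000))},
}
edges = [(1, 2), (2, 1)]

full = build_triplets(vertices, edges)  # all columns on both sides
pruned = build_triplets(vertices, edges, src_cols=["id", "rank"], dst_cols=["id"])

print(payload_size(full), payload_size(pruned))
```

In this toy model the pruned triplets carry only the columns the message logic would read, so the heavy `path` column never reaches the intermediate result. Whether identifier columns have to be listed explicitly in the real API is not stated in this description; the sketch lists `id` only for clarity.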

@joelrobin18 joelrobin18 force-pushed the feat/pregel-required-columns branch from a82cd81 to be302db on December 27, 2025 17:54
Signed-off-by: joelrobin18 <[email protected]>
@codecov-commenter
Codecov Report

❌ Patch coverage is 75.00000% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.12%. Comparing base (e24f15e) to head (c3fd809).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
...park/sql/graphframes/GraphFramesConnectUtils.scala 0.00% 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #763      +/-   ##
==========================================
- Coverage   84.49%   84.12%   -0.38%     
==========================================
  Files          66       66              
  Lines        3179     3262      +83     
  Branches      387      376      -11     
==========================================
+ Hits         2686     2744      +58     
- Misses        493      518      +25     

Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

Thanks a lot, @joelrobin18! Looks very cool! I left two comments, and I would also like to ask you to add a few words about the new API to https://github.com/graphframes/graphframes/blob/main/docs/src/04-user-guide/10-pregel.md

Thanks!

Signed-off-by: joelrobin18 <[email protected]>
Collaborator

@SemyonSinchenko SemyonSinchenko left a comment

LGTM! Nice work, thanks a lot, @joelrobin18!

@SemyonSinchenko SemyonSinchenko linked an issue Dec 30, 2025 that may be closed by this pull request
7 tasks
@SemyonSinchenko SemyonSinchenko merged commit cb1a1d2 into graphframes:main Dec 30, 2025
5 checks passed
Development

Successfully merging this pull request may close these issues.

feat: selections for Pregel on the step of triplets generation

3 participants