feat: SparkConnect support #506
Merged
rjurney merged 32 commits into graphframes:master from SemyonSinchenko:447-spark-connect on Mar 17, 2025
Commits
d1972fc Merge remote-tracking branch 'refs/remotes/origin/master' (SemyonSinchenko)
c158815 wip (SemyonSinchenko)
ea11df6 wip (SemyonSinchenko)
0ddd5bd wip (SemyonSinchenko)
da7eccc wip (SemyonSinchenko)
fb784a3 The first working version (SemyonSinchenko)
9f8905f Merge remote-tracking branch 'refs/remotes/graphframes/master' (SemyonSinchenko)
d58ed2a Merge remote-tracking branch 'refs/remotes/graphframes/master' (SemyonSinchenko)
20f7575 WIP (SemyonSinchenko)
eee7b7b Working version? (SemyonSinchenko)
130b12e Merge remote-tracking branch 'refs/remotes/graphframes/master' (SemyonSinchenko)
7e325aa Fix tests (SemyonSinchenko)
c47a57e Fix tests (SemyonSinchenko)
fc8ebae Fix CI typo (SemyonSinchenko)
f13c754 Fix typo in CI (SemyonSinchenko)
a21b5aa Fix wget's verbose + GHA bug (SemyonSinchenko)
f4c91d6 Stop connect server (SemyonSinchenko)
e4d75f7 An attempt to fix a bug in GHA with a non-stopping tests (SemyonSinchenko)
0950cfd Maybe https://github.com/grpc/grpc/issues/38290? (SemyonSinchenko)
8cb430c Fix broken stop-cript (SemyonSinchenko)
19c1934 Ignore errors in clean-up (SemyonSinchenko)
1eef323 Verbosity in ci tests (SemyonSinchenko)
7fd1f23 Merge main (SemyonSinchenko)
cc60bcb Typo (SemyonSinchenko)
5528d65 Fix merge-artifacts (SemyonSinchenko)
f88e19a Fix merge artifacts (SemyonSinchenko)
59897fb Apply pre-commit rules (SemyonSinchenko)
97054b0 Add the missing method (SemyonSinchenko)
90a326f Restore accidently deleted part of CI (SemyonSinchenko)
c8bcf43 Typo (SemyonSinchenko)
9d7f714 Fixes from comments (SemyonSinchenko)
5a91659 Pin the pyspark version <4.0 and re-generate lock (SemyonSinchenko)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,12 @@ | ||
| version: v2 | ||
| managed: | ||
|   enabled: true | ||
| | ||
| plugins: | ||
|   # Python API | ||
|   - remote: buf.build/grpc/python:v1.64.2 | ||
|     out: python/graphframes/connect/proto | ||
|   - remote: buf.build/protocolbuffers/python:v27.1 | ||
|     out: python/graphframes/connect/proto | ||
|   - remote: buf.build/protocolbuffers/pyi | ||
|     out: python/graphframes/connect/proto | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,3 @@ | ||
| version: v2 | ||
| modules: | ||
|   - path: graphframes-connect/src/main/protobuf | ||
137 changes: 137 additions & 0 deletions
graphframes-connect/src/main/protobuf/graphframes.proto
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,137 @@ | ||
| syntax = 'proto3'; | ||
| | ||
| package org.graphframes.connect.proto; | ||
| | ||
| option java_multiple_files = true; | ||
| option java_package = "org.graphframes.connect.proto"; | ||
| option java_generate_equals_and_hash = true; | ||
| option optimize_for=SPEED; | ||
| | ||
| | ||
| message GraphFramesAPI { | ||
|   bytes vertices = 1; | ||
|   bytes edges = 2; | ||
|   oneof method { | ||
|     AggregateMessages aggregate_messages = 3; | ||
|     BFS bfs = 4; | ||
|     ConnectedComponents connected_components = 5; | ||
|     DropIsolatedVertices drop_isolated_vertices = 6; | ||
|     FilterEdges filter_edges = 7; | ||
|     FilterVertices filter_vertices = 8; | ||
|     Find find = 9; | ||
|     LabelPropagation label_propagation = 10; | ||
|     PageRank page_rank = 11; | ||
|     ParallelPersonalizedPageRank parallel_personalized_page_rank = 12; | ||
|     PowerIterationClustering power_iteration_clustering = 13; | ||
|     Pregel pregel = 14; | ||
|     ShortestPaths shortest_paths = 15; | ||
|     StronglyConnectedComponents strongly_connected_components = 16; | ||
|     SVDPlusPlus svd_plus_plus = 17; | ||
|     TriangleCount triangle_count = 18; | ||
|     Triplets triplets = 19; | ||
|   } | ||
| } | ||
| | ||
| message ColumnOrExpression { | ||
|   oneof col_or_expr { | ||
|     bytes col = 1; | ||
|     string expr = 2; | ||
|   } | ||
| } | ||
| | ||
| message StringOrLongID { | ||
|   oneof id { | ||
|     int64 long_id = 1; | ||
|     string string_id = 2; | ||
|   } | ||
| } | ||
| | ||
| message AggregateMessages { | ||
|   ColumnOrExpression agg_col = 1; | ||
|   optional ColumnOrExpression send_to_src = 2; | ||
|   optional ColumnOrExpression send_to_dst = 3; | ||
| } | ||
| | ||
| message BFS { | ||
|   ColumnOrExpression from_expr = 1; | ||
|   ColumnOrExpression to_expr = 2; | ||
|   ColumnOrExpression edge_filter = 3; | ||
|   int32 max_path_length = 4; | ||
| } | ||
| | ||
| message ConnectedComponents { | ||
|   string algorithm = 1; | ||
|   int32 checkpoint_interval = 2; | ||
|   int32 broadcast_threshold = 3; | ||
| } | ||
| | ||
| message DropIsolatedVertices {} | ||
| | ||
| message FilterEdges { | ||
|   ColumnOrExpression condition = 1; | ||
| } | ||
| | ||
| message FilterVertices { | ||
|   ColumnOrExpression condition = 2; | ||
| } | ||
| | ||
| message Find { | ||
|   string pattern = 1; | ||
| } | ||
| | ||
| message LabelPropagation { | ||
|   int32 max_iter = 1; | ||
| } | ||
| | ||
| message PageRank { | ||
|   double reset_probability = 1; | ||
|   optional StringOrLongID source_id = 2; | ||
|   optional int32 max_iter = 3; | ||
|   optional double tol = 4; | ||
| } | ||
| | ||
| message ParallelPersonalizedPageRank { | ||
|   double reset_probability = 1; | ||
|   repeated StringOrLongID source_ids = 2; | ||
|   int32 max_iter = 3; | ||
| } | ||
| | ||
| message PowerIterationClustering { | ||
|   int32 k = 1; | ||
|   int32 max_iter = 2; | ||
|   optional string weight_col = 3; | ||
| } | ||
| | ||
| message Pregel { | ||
|   ColumnOrExpression agg_msgs = 1; | ||
|   repeated ColumnOrExpression send_msg_to_dst = 2; | ||
|   repeated ColumnOrExpression send_msg_to_src = 3; | ||
|   int32 checkpoint_interval = 4; | ||
|   int32 max_iter = 5; | ||
|   string additional_col_name = 6; | ||
|   ColumnOrExpression additional_col_initial = 7; | ||
|   ColumnOrExpression additional_col_upd = 8; | ||
| } | ||
| | ||
| message ShortestPaths { | ||
|   repeated StringOrLongID landmarks = 1; | ||
| } | ||
| | ||
| message StronglyConnectedComponents { | ||
|   int32 max_iter = 1; | ||
| } | ||
| | ||
| message SVDPlusPlus { | ||
|   int32 rank = 1; | ||
|   int32 max_iter = 2; | ||
|   double min_value = 3; | ||
|   double max_value = 4; | ||
|   double gamma1 = 5; | ||
|   double gamma2 = 6; | ||
|   double gamma6 = 7; | ||
|   double gamma7 = 8; | ||
| } | ||
| | ||
| message TriangleCount {} | ||
| | ||
| message Triplets {} | ||
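
The `StringOrLongID` message in `graphframes.proto` lets a vertex ID travel as either an `int64` or a `string`. As a rough illustration of that `oneof`, here is a minimal Python sketch that uses a plain dataclass as a stand-in for the generated protobuf class. The dataclass, the `which` method, and the `make_id` helper are hypothetical names for illustration only and are not part of this PR. Unlike a real proto3 `oneof`, which may be left unset, this stand-in insists on exactly one branch.

```python
from dataclasses import dataclass
from typing import Optional, Union

# int64 bounds, matching the `int64 long_id` field in the .proto file.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


@dataclass(frozen=True)
class StringOrLongID:
    """Hypothetical stand-in for the generated StringOrLongID class."""

    long_id: Optional[int] = None
    string_id: Optional[str] = None

    def __post_init__(self) -> None:
        # Stricter than proto3: require exactly one branch of the oneof.
        if (self.long_id is None) == (self.string_id is None):
            raise ValueError("set exactly one of long_id or string_id")

    def which(self) -> str:
        """Name of the populated branch."""
        return "long_id" if self.long_id is not None else "string_id"


def make_id(raw: Union[int, str]) -> StringOrLongID:
    """Choose the oneof branch from the Python type of a vertex ID."""
    if isinstance(raw, bool) or not isinstance(raw, (int, str)):
        raise TypeError(f"unsupported vertex ID type: {type(raw).__name__}")
    if isinstance(raw, int):
        if not INT64_MIN <= raw <= INT64_MAX:
            raise ValueError("integer ID does not fit in int64")
        return StringOrLongID(long_id=raw)
    return StringOrLongID(string_id=raw)


# Two IDs of different kinds, shown only to exercise both branches.
landmarks = [make_id(1), make_id("a")]
```

The same either/or shape appears in `ColumnOrExpression`, where the branches are a serialized column (`bytes`) and an expression string.
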
24 changes: 24 additions & 0 deletions
graphframes-connect/src/main/scala/org/apache/spark/sql/graphframes/GraphFramesConnect.scala
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| package org.apache.spark.sql.graphframes | ||
| | ||
| import org.graphframes.connect.proto.GraphFramesAPI | ||
| | ||
| import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan | ||
| import org.apache.spark.sql.connect.planner.SparkConnectPlanner | ||
| import org.apache.spark.sql.connect.plugin.RelationPlugin | ||
| | ||
| import com.google.protobuf | ||
| | ||
| class GraphFramesConnect extends RelationPlugin { | ||
|   override def transform( | ||
|       relation: protobuf.Any, | ||
|       planner: SparkConnectPlanner): Option[LogicalPlan] = { | ||
|     if (relation.is(classOf[GraphFramesAPI])) { | ||
|       val protoCall = relation.unpack(classOf[GraphFramesAPI]) | ||
|       // The plugin API changes in Spark 4.0, so the plugin implementation is kept separate from the parsing logic. | ||
|       val result = GraphFramesConnectUtils.parseAPICall(protoCall, planner) | ||
|       Some(result.logicalPlan) | ||
|     } else { | ||
|       None | ||
|     } | ||
|   } | ||
| } | ||
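
The plugin above handles a relation only when the `protobuf.Any` envelope wraps a `GraphFramesAPI` message, and returns `None` otherwise so that other handlers can try it. The following Python sketch mimics that control flow with the standard library only. `AnyEnvelope`, `is_type`, and the string used in place of a logical plan are hypothetical stand-ins, and the `type.googleapis.com/` prefix in the usage lines is an assumed example of a type URL. The internals of `parseAPICall` are not shown in this section, so they are not modeled here.

```python
from dataclasses import dataclass
from typing import Optional

# Fully qualified message name, taken from the `package` line in graphframes.proto.
GRAPHFRAMES_API = "org.graphframes.connect.proto.GraphFramesAPI"


@dataclass(frozen=True)
class AnyEnvelope:
    """Hypothetical stand-in for google.protobuf.Any: a type URL plus payload bytes."""

    type_url: str
    value: bytes

    def is_type(self, full_name: str) -> bool:
        # The message name is the last path segment of the type URL.
        return self.type_url.rsplit("/", 1)[-1] == full_name


def transform(relation: AnyEnvelope) -> Optional[str]:
    """Return a placeholder plan for GraphFramesAPI payloads, else None."""
    if relation.is_type(GRAPHFRAMES_API):
        # The Scala plugin unpacks the message and delegates to parseAPICall;
        # a string stands in for the resulting logical plan in this sketch.
        return f"plan for {len(relation.value)} payload bytes"
    return None


handled = transform(AnyEnvelope("type.googleapis.com/" + GRAPHFRAMES_API, b"abc"))
skipped = transform(AnyEnvelope("type.googleapis.com/other.Message", b""))
```

Returning `None` rather than raising for a foreign message type keeps the check cheap and leaves the relation untouched for any other handler.
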