feat: Spark 4.0.x support #603
Conversation
new Column(expr.expr.transform { case UnresolvedAttribute(nameParts) =>
  UnresolvedAttribute(colName +: nameParts)
})
private def applyExprToCol(c: Column, colName: String, fieldNames: Seq[String]): Column = {
I spent some time trying to get the 4.0 support working as well; I think there will need to be major-version-specific shims for this (and a few other Connect-related helper functions). What I got working for Spark 4 is:
// Assumed imports (Spark 4.0): org.apache.spark.sql.{Column, SparkSession},
// org.apache.spark.sql.classic.{SparkSession => ClassicSparkSession},
// org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute,
// org.apache.spark.sql.internal.ExpressionUtils
def applyExprToCol(spark: SparkSession, expr: Column, colName: String): Column = {
  val converted = spark.asInstanceOf[ClassicSparkSession].converter(expr.node)
  ExpressionUtils.column(converted.transform { case UnresolvedAttribute(nameParts) =>
    UnresolvedAttribute(colName +: nameParts)
  })
}

The only downside is that it's not automatically compatible with Connect as well, but I'm not sure how else to go about it. It'd maybe be nice if ColumnNode had the same tree transform helpers; then it could be Connect-compatible.
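For context, a minimal usage sketch of that helper (the DataFrame df and its src struct column are hypothetical, not from this PR):

import org.apache.spark.sql.functions.{col, lit}

// The transform rewrites the unqualified reference "age" into "src.age",
// so a user-supplied predicate resolves against the struct column rather
// than the top-level schema.
val scoped = applyExprToCol(spark, col("age") > lit(30), "src")
val filtered = df.filter(scoped) // behaves like df.filter(col("src.age") > lit(30))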
Don't worry about Connect! It will work via the Plugin anyway, so we do not even need to care about this. And thanks for the snippet!!!
I just found org.apache.spark.sql.classic.RichColumn, which provides expr: Expression, so maybe there is no need to add a shim...
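A sketch of what that could look like (assuming RichColumn is exposed through the ClassicConversions implicits in Spark 4.0; the exact import location is an assumption):

import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute
import org.apache.spark.sql.classic.ClassicConversions._ // assumed: brings RichColumn into scope
import org.apache.spark.sql.internal.ExpressionUtils

// With the implicit expr: Expression back on Column, the Spark 3 body
// carries over almost unchanged, and no SparkSession parameter is needed.
def applyExprToCol(expr: Column, colName: String): Column =
  ExpressionUtils.column(expr.expr.transform { case UnresolvedAttribute(nameParts) =>
    UnresolvedAttribute(colName +: nameParts)
  })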
Well, the shim would also be there to maintain support for Spark 3.
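For illustration, a minimal sketch of that shim pattern (the module layout and the Shims object name are hypothetical): each Spark-major source tree compiles its own body behind a common signature, and the shared code calls only the shim.

// src/spark3/scala/org/graphframes/Shims.scala (hypothetical layout)
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute

object Shims {
  // Spark 3: Column still exposes its Catalyst expression directly.
  def applyExprToCol(spark: SparkSession, expr: Column, colName: String): Column =
    new Column(expr.expr.transform { case UnresolvedAttribute(nameParts) =>
      UnresolvedAttribute(colName +: nameParts)
    })
}
// The src/spark4 twin would carry the converter-based version from the snippet above.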
I'm really surprised that it works!
@SauronShepherd @james-willis @rjurney Hi! Could you take a look, please? Note: the target branch is not …
I am a little worried that we are not targeting a single branch that supports both Spark 3 and 4. We will probably want to support Spark 3 for some time, and this might increase the maintenance burden of bringing patches and features to both Spark 3 and 4. In Sedona we have directories that contain copies of the same code for different versions of Spark: https://github.com/apache/sedona/tree/master/spark. Not sure if that pattern would succeed here.
That is the big question. In GraphFrames there are already multiple subprojects. In the near future I'm going to add property graphs support as a subproject, and I hope to have an i/o subproject too (for the support of import/export to/from GraphAr, Neo4j, Nebula, etc.). All of these projects would need to be copied to support both 3.5.x and 4.0.x from the same branch. That is why I'm thinking about relying on branches instead of relying on multiple subprojects for different versions of Spark. I also do not think that shims and/or reflection magic will work here, just because from 3.5.x to 4.0.x there are breaking changes not only in access modifiers and methods...

Overall: I'm a terrible release manager, I do not like CI/CD or build-system topics, I'm not very experienced in best practices for supporting multiple versions, etc. So if someone experienced can draw a diagram or something like that about what would be the best way, that would be nice!
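For what it's worth, the Sedona-style directory-per-version layout can be wired up in sbt roughly like this (a sketch; the sparkVersion setting, the default version, and the directory names are assumptions):

// build.sbt (sketch)
val sparkVersion = settingKey[String]("Spark version to build against")
sparkVersion := sys.props.getOrElse("spark.version", "3.5.5") // default is an assumption

// Compile src/main/scala plus exactly one per-major-version tree,
// e.g. src/main/scala-spark-3 or src/main/scala-spark-4.
Compile / unmanagedSourceDirectories +=
  (Compile / sourceDirectory).value / s"scala-spark-${sparkVersion.value.split('.').head}"

Code shared between versions stays in src/main/scala; only the version-specific shims are duplicated across the per-version trees.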
I have a mostly working version using shims to support 3.x and 4.x. I can try to clean it up and make a PR next week.
Is the goal just to support Spark 4, or actually to take advantage of and leverage some of the new features of this new version? I have some ideas in mind that may improve some algorithms a lot (not sure about that, but I think it's worth trying). Sorry for being away for so long, but I'm back and quite interested in this ticket. I can start helping out next week.
Made the PR supporting multiple versions: #608
Closed in favor of #608
What changes were proposed in this pull request?
Everything related to Spark 4.0.x support.
Why are the changes needed?
Closes #576