chore: Cleanup assembly and shading #617
Merged
SemyonSinchenko merged 7 commits into graphframes:master on Jul 2, 2025
Kimahriman commented on Jul 2, 2025
Comment on lines -119 to -128
    // Assembly settings
    assembly / test := {}, // No tests in assembly
    assemblyPackageScala / assembleArtifact := false,
    assembly / assemblyMergeStrategy := {
      case PathList("META-INF", xs @ _*) => MergeStrategy.discard
      case x if x.endsWith("module-info.class") => MergeStrategy.discard
      case x =>
        val oldStrategy = (assembly / assemblyMergeStrategy).value
        oldStrategy(x)
    },
I removed this because I don't think there's any need to run assembly on the root project, unless you want to keep the ability to manually build a fat JAR.
Kimahriman commented
POM from
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-connect-spark4_2.13</artifactId>
<packaging>jar</packaging>
<description>graphframes-connect</description>
<url>https://graphframes.io/</url>
<version>0.9.0-SNAPSHOT</version>
<licenses>
<license>
<name>Apache-2.0</name>
<url>https://opensource.org/licenses/Apache-2.0</url>
<distribution>repo</distribution>
</license>
</licenses>
<name>graphframes-connect</name>
<organization>
<name>org.graphframes</name>
<url>https://graphframes.io/</url>
</organization>
<scm>
<url>https://github.com/graphframes/graphframes</url>
<connection>scm:[email protected]:graphframes/graphframes.git</connection>
</scm>
<developers>
<developer>
<id>rjurney</id>
<name>Russell Jurney</name>
<url>https://github.com/rjurney</url>
<email>[email protected]</email>
</developer>
<developer>
<id>SemyonSinchenko</id>
<name>Sem</name>
<url>https://github.com/SemyonSinchenko</url>
<email>[email protected]</email>
</developer>
</developers>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.13.12</version>
</dependency>
<dependency>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-spark4_2.13</artifactId>
<version>0.9.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.16</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.13</artifactId>
<version>3.0.8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.zafarkhaja</groupId>
<artifactId>java-semver</artifactId>
<version>0.10.2</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
Kimahriman commented
Oh yeah, I was able to get that working by just excluding all JARs. It's annoying that Spark shades this, as normally you could just directly use
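A minimal sketch of what "excluding all JARs" could look like with sbt-assembly. It assumes the plugin is enabled for the connect project; the actual setting used in this PR is not shown in the conversation and may differ.

```scala
// build.sbt fragment (sketch); requires sbt-assembly in project/plugins.sbt.
// Exclude every dependency JAR on the assembly classpath, so that only the
// project's own compiled classes are packaged into the assembled JAR.
assembly / assemblyExcludedJars := {
  val cp = (assembly / fullClasspath).value
  // Keep class directories; mark only JAR entries as excluded.
  cp.filter(_.data.getName.endsWith(".jar"))
}
```

With every dependency JAR excluded, any shade rule only rewrites references inside the project's own classes, and the dependencies stay external and are listed in the POM.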
Kimahriman commented
Actually, I figured out how to simplify this even more: the extra project isn't needed.
Kimahriman commented
<?xml version='1.0' encoding='UTF-8'?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://maven.apache.org/POM/4.0.0">
<modelVersion>4.0.0</modelVersion>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-connect-spark4_2.13</artifactId>
<packaging>jar</packaging>
<description>graphframes-connect</description>
<url>https://graphframes.io/</url>
<version>0.9.0-SNAPSHOT</version>
<licenses>
<license>
<name>Apache-2.0</name>
<url>https://opensource.org/licenses/Apache-2.0</url>
<distribution>repo</distribution>
</license>
</licenses>
<name>graphframes-connect</name>
<organization>
<name>org.graphframes</name>
<url>https://graphframes.io/</url>
</organization>
<scm>
<url>https://github.com/graphframes/graphframes</url>
<connection>scm:[email protected]:graphframes/graphframes.git</connection>
</scm>
<developers>
<developer>
<id>rjurney</id>
<name>Russell Jurney</name>
<url>https://github.com/rjurney</url>
<email>[email protected]</email>
</developer>
<developer>
<id>SemyonSinchenko</id>
<name>Sem</name>
<url>https://github.com/SemyonSinchenko</url>
<email>[email protected]</email>
</developer>
</developers>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>2.13.12</version>
</dependency>
<dependency>
<groupId>org.graphframes</groupId>
<artifactId>graphframes-spark4_2.13</artifactId>
<version>0.9.0-SNAPSHOT</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-graphx_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>2.0.16</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_2.13</artifactId>
<version>3.0.8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.zafarkhaja</groupId>
<artifactId>java-semver</artifactId>
<version>0.10.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-connect_2.13</artifactId>
<version>4.0.0</version>
<scope>provided</scope>
</dependency>
</dependencies>
</project>
SemyonSinchenko approved these changes on Jul 2, 2025
SemyonSinchenko left a comment
Fantastic! Thanks a lot @Kimahriman!!!
SemyonSinchenko added a commit to SemyonSinchenko/graphframes that referenced this pull request on Jul 2, 2025
SemyonSinchenko added a commit that referenced this pull request on Jul 2, 2025
* **Update Scala CI workflows and build configurations**
  - Refactor `scala-publish.yml` to clarify release and snapshot publishing conditions.
  - Adjust `docs.yml` trigger to specifically include the `main` branch.
  - Remove unused Sonatype import from `build.sbt`.
  - Enhance developer metadata and maintainers list in `build.sbt`.
  - Update dependencies and assembly configuration to address shading and exclude non-connect classes for the Uber JAR.
  - Introduce custom POM post-processing for correct dependency scope adjustments.
* Add missing developer email
* Specify the scope for protobuf-java
  Added a post-processing step that sets the protobuf scope to "provided" because it is part of Apache Spark itself.
* Take everything from #617
* main -> master
  I always forget that GF uses master as the default branch...
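The POM post-processing mentioned in the commit message could look roughly like the following. This is a hedged sketch built on sbt's `pomPostProcess` setting and scala-xml's `RuleTransformer`, not the code from the referenced commit; the rule name `markProtobufProvided` is hypothetical.

```scala
// build.sbt fragment (sketch). scala-xml is available to sbt build definitions.
import scala.xml.{Elem, Node => XmlNode}
import scala.xml.transform.{RewriteRule, RuleTransformer}

pomPostProcess := { (pom: XmlNode) =>
  // Hypothetical rule: force <scope>provided</scope> on protobuf-java.
  val markProtobufProvided = new RewriteRule {
    override def transform(n: XmlNode): Seq[XmlNode] = n match {
      case dep: Elem
          if dep.label == "dependency" &&
            (dep \ "artifactId").text == "protobuf-java" =>
        // Drop any existing <scope> element, then append the new one.
        val children = dep.child.filterNot(_.label == "scope")
        dep.copy(child = children :+ <scope>provided</scope>)
      case other => other
    }
  }
  new RuleTransformer(markProtobufProvided).transform(pom).head
}
```

The transformation only touches `<dependency>` elements whose `<artifactId>` is `protobuf-java`; every other node of the generated POM passes through unchanged.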

What changes were proposed in this pull request?
Resolves #614
Because the sbt-assembly plugin is meant for creating fat/uber JARs, it does not modify the POMs of published libraries to account for what is shaded. This PR therefore creates an intermediate project for the connect shading, and then a final project with the correct dependencies and the shaded JAR for actual publishing.
Why are the changes needed?
Fix connect artifact so only protobuf is shaded.
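The two-project layout described above can be sketched as follows. The project names (`connectShaded`, `connectPublish`), directories, and the relocated package name are assumptions for illustration, not values taken from this PR. A later comment in the conversation notes that the extra project turned out to be unnecessary.

```scala
// build.sbt fragment (sketch); requires sbt-assembly in project/plugins.sbt.
// Project names, directories, and the shaded package name are assumptions.

// Intermediate project: builds the shaded JAR and is never published itself.
lazy val connectShaded = (project in file("connect"))
  .settings(
    publish / skip := true,
    // Rename the com.google.protobuf package (and references to it) to the
    // relocated name; no other package is touched.
    assembly / assemblyShadeRules := Seq(
      ShadeRule
        .rename("com.google.protobuf.**" -> "org.sparkproject.connect.protobuf.@1")
        .inAll
    )
  )

// Final project: declares the dependencies that should appear in the POM
// and publishes the shaded JAR in place of its own package.
lazy val connectPublish = (project in file("connect-publish"))
  .settings(
    name := "graphframes-connect",
    Compile / packageBin := (connectShaded / assembly).value
  )
```

Because the published POM is generated from the final project's own `libraryDependencies`, it can list exactly the dependencies that remain external after shading.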