@@ -96,7 +96,7 @@ jobs:
- name: run integrationTest
uses: ./.github/actions/gradle-command-self-hosted-action
with:
gradle-command: :sdks:java:io:sparkreceiver:2:integrationTest
gradle-command: :sdks:java:io:sparkreceiver:3:integrationTest
arguments: |
--info \
--tests org.apache.beam.sdk.io.sparkreceiver.SparkReceiverIOIT \
5 changes: 3 additions & 2 deletions CHANGES.md
@@ -68,12 +68,13 @@
## New Features / Improvements

* Support custom coders in Reshuffle ([#29908](https://github.com/apache/beam/issues/29908), [#33356](https://github.com/apache/beam/issues/33356)).

* [Java] Upgrade SLF4J to 2.0.16. Update default Spark version to 3.5.0. ([#33574](https://github.com/apache/beam/pull/33574))
* X feature added (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).

## Breaking Changes

* [Python] Reshuffle now correctly respects user-specified type hints, fixing a previous bug where it might incorrectly use FastPrimitivesCoder. This change could break pipelines with incorrect type hints in Reshuffle. If you see issues after upgrading, temporarily set `update_compatibility_version` to a previous Beam version to restore the old behavior; the recommended fix is to correct the type hints in your code. ([#33932](https://github.com/apache/beam/pull/33932))
* [Java] SparkReceiver 2 has been moved to SparkReceiver 3, which supports Spark 3.x. ([#33574](https://github.com/apache/beam/pull/33574))
* X behavior was changed ([#X](https://github.com/apache/beam/issues/X)).

## Deprecations
2 changes: 1 addition & 1 deletion build.gradle.kts
@@ -304,7 +304,7 @@ tasks.register("javaPreCommit") {
dependsOn(":sdks:java:io:contextualtextio:build")
dependsOn(":sdks:java:io:expansion-service:build")
dependsOn(":sdks:java:io:file-based-io-tests:build")
dependsOn(":sdks:java:io:sparkreceiver:2:build")
dependsOn(":sdks:java:io:sparkreceiver:3:build")
dependsOn(":sdks:java:io:synthetic:build")
dependsOn(":sdks:java:io:xml:build")
dependsOn(":sdks:java:javadoc:allJavadoc")
@@ -636,12 +636,12 @@ class BeamModulePlugin implements Plugin<Project> {
def quickcheck_version = "1.0"
def sbe_tool_version = "1.25.1"
def singlestore_jdbc_version = "1.1.4"
def slf4j_version = "1.7.30"
def slf4j_version = "2.0.16"
def snakeyaml_engine_version = "2.6"
def snakeyaml_version = "2.2"
def solace_version = "10.21.0"
def spark2_version = "2.4.8"
def spark3_version = "3.2.2"
def spark3_version = "3.5.0"
def spotbugs_version = "4.0.6"
def testcontainers_version = "1.19.7"
// [bomupgrader] determined by: org.apache.arrow:arrow-memory-core, consistent with: google_cloud_platform_libraries_bom
4 changes: 2 additions & 2 deletions runners/google-cloud-dataflow-java/worker/build.gradle
@@ -87,7 +87,7 @@ applyJavaNature(
// TODO(https://github.com/apache/beam/issues/19114): Move DataflowRunnerHarness class under org.apache.beam.runners.dataflow.worker namespace
"com/google/cloud/dataflow/worker/DataflowRunnerHarness.class",
// Allow the slf4j binding in the worker for logging during pipeline execution
// (slf4j-jdk14 2.x moved its binding from org.slf4j.impl to org.slf4j.jul)
"org/slf4j/impl/**"
"org/slf4j/jul/**"
],
generatedClassPatterns: [
/^org\.apache\.beam\.runners\.dataflow\.worker\.windmill.*/,
@@ -240,7 +240,7 @@ project.task('validateShadedJarContainsSlf4jJdk14', dependsOn: 'shadowJar') {
doLast {
project.configurations.shadow.artifacts.files.each {
FileTree slf4jImpl = project.zipTree(it).matching {
include "org/slf4j/impl/JDK14LoggerAdapter.class"
include "org/slf4j/jul/JDK14LoggerAdapter.class"
}
outFile.text = slf4jImpl.files
if (slf4jImpl.files.isEmpty()) {
20 changes: 20 additions & 0 deletions runners/spark/3/build.gradle
@@ -56,6 +56,26 @@ sparkVersions.each { kv ->
}

dependencies {
// Spark versions prior to 3.4.0 are compiled against SLF4J 1.x. The
// `org.apache.spark.internal.Logging.isLog4j12()` function references an
// SLF4J 1.x binding class (org.slf4j.impl.StaticLoggerBinder) which is
// no longer available in SLF4J 2.x. This results in a
// `java.lang.NoClassDefFoundError`.
//
// The workaround is to provide an SLF4J 1.x binding module from outside the
// `org.slf4j` group. `org.apache.logging.log4j:log4j-slf4j-impl` is one such
// module: it provides a compatible SLF4J 1.x binding regardless of the SLF4J
// upgrade. Binding/provider modules under the `org.slf4j` group (e.g.,
// slf4j-simple, slf4j-reload4j) are upgraded along with SLF4J itself and
// therefore no longer contain the 1.x binding classes.
//
// Notice that Spark 3.1.x uses `ch.qos.logback:logback-classic` and is
// unaffected by the SLF4J upgrade. Spark 3.3.x already uses
// `log4j-slf4j-impl` so it is also unaffected.
if ("$kv.key" >= "320" && "$kv.key" <= "324") {
"sparkVersion$kv.key" library.java.log4j2_slf4j_impl
}
spark.components.each { component -> "sparkVersion$kv.key" "$component:$kv.value" }
}
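
For context, a minimal sketch (hypothetical, not part of this PR) of the failure mode the comment above describes; the only assumption is that SLF4J 2.x replaced `org.slf4j.impl.StaticLoggerBinder` with `ServiceLoader`-based providers:

```java
// Probe for the SLF4J 1.x binding class that Spark < 3.4.0 references
// via org.apache.spark.internal.Logging.isLog4j12().
public class Slf4jBindingProbe {
  public static void main(String[] args) {
    try {
      // SLF4J 1.x located its backend through this class; SLF4J 2.x removed
      // it, so the lookup fails unless a 1.x binding such as
      // log4j-slf4j-impl is also on the classpath.
      Class.forName("org.slf4j.impl.StaticLoggerBinder");
      System.out.println("SLF4J 1.x binding present; isLog4j12() is safe");
    } catch (ClassNotFoundException e) {
      // Spark references the class directly, so there it surfaces as a
      // java.lang.NoClassDefFoundError instead of this checked exception.
      System.out.println("No SLF4J 1.x binding: " + e);
    }
  }
}
```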

8 changes: 8 additions & 0 deletions runners/spark/spark_runner.gradle
@@ -176,6 +176,10 @@ dependencies {
spark.components.each { component ->
provided "$component:$spark_version"
}
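  // Spark 3.5.0 factored several utility and SQL interface classes out into
  // separate spark-common-utils and spark-sql-api modules (an assumption
  // based on the Spark 3.5.0 module layout), so they are added explicitly.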
if ("$spark_version" >= "3.5.0") {
implementation "org.apache.spark:spark-common-utils_$spark_scala_version:$spark_version"
implementation "org.apache.spark:spark-sql-api_$spark_scala_version:$spark_version"
}
permitUnusedDeclared "org.apache.spark:spark-network-common_$spark_scala_version:$spark_version"
implementation "io.dropwizard.metrics:metrics-core:4.1.1" // version used by Spark 3.1
compileOnly "org.scala-lang:scala-library:2.12.15"
@@ -202,6 +206,10 @@ dependencies {
testImplementation library.java.mockito_core
testImplementation "org.assertj:assertj-core:3.11.1"
testImplementation "org.apache.zookeeper:zookeeper:3.4.11"
if ("$spark_version" >= "3.5.0") {
testImplementation "org.apache.spark:spark-common-utils_$spark_scala_version:$spark_version"
testImplementation "org.apache.spark:spark-sql-api_$spark_scala_version:$spark_version"
}
validatesRunner project(path: ":sdks:java:core", configuration: "shadowTest")
validatesRunner project(path: ":runners:core-java", configuration: "testRuntimeMigration")
validatesRunner project(":sdks:java:io:hadoop-format")
15 changes: 3 additions & 12 deletions sdks/java/extensions/arrow/build.gradle
@@ -24,19 +24,10 @@ description = "Apache Beam :: SDKs :: Java :: Extensions :: Arrow"
dependencies {
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation(library.java.arrow_vector) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_memory_core) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_vector)
implementation(library.java.arrow_memory_core)
implementation library.java.joda_time
testImplementation(library.java.arrow_memory_netty) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
testImplementation(library.java.arrow_memory_netty)
testImplementation library.java.junit
testImplementation library.java.hamcrest
testRuntimeOnly library.java.slf4j_simple
16 changes: 13 additions & 3 deletions sdks/java/io/cdap/build.gradle
@@ -45,7 +45,11 @@ dependencies {
implementation library.java.cdap_etl_api
implementation library.java.cdap_etl_api_spark
implementation library.java.cdap_hydrator_common
implementation library.java.cdap_plugin_hubspot
implementation (library.java.cdap_plugin_hubspot) {
    // Exclude the Scala 2.11 module, because Spark 3.x uses Scala 2.12
    // instead.
exclude group: "com.fasterxml.jackson.module", module: "jackson-module-scala_2.11"
}
implementation library.java.cdap_plugin_salesforce
implementation library.java.cdap_plugin_service_now
implementation library.java.cdap_plugin_zendesk
@@ -56,11 +60,17 @@
implementation library.java.jackson_core
implementation library.java.jackson_databind
implementation library.java.slf4j_api
implementation library.java.spark_streaming
implementation (library.java.spark3_streaming) {
    // Exclude `org.slf4j:jul-to-slf4j`, which was introduced as a transitive
    // dependency in Spark 3.5.0 (particularly from spark-common-utils_2.12)
    // and, together with `org.slf4j:slf4j-jdk14`, forms a logging cycle
    // (JUL -> SLF4J -> JUL) that causes a stack overflow.
    exclude group: "org.slf4j", module: "jul-to-slf4j"
}
implementation library.java.tephra
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
implementation project(":sdks:java:io:sparkreceiver:2")
implementation project(":sdks:java:io:sparkreceiver:3")
implementation project(":sdks:java:io:hadoop-format")
testImplementation library.java.cdap_plugin_service_now
testImplementation library.java.cdap_etl_api
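
A minimal sketch of the stack overflow that the `jul-to-slf4j` exclusion above prevents (hypothetical demo, not from this PR; it assumes both `org.slf4j:jul-to-slf4j` and `org.slf4j:slf4j-jdk14` are on the classpath):

```java
import java.util.logging.Logger;

import org.slf4j.bridge.SLF4JBridgeHandler;

// jul-to-slf4j bridges JUL records to SLF4J, while slf4j-jdk14 routes SLF4J
// calls back to JUL, so a single log call recurses until the stack overflows.
public class JulSlf4jCycleDemo {
  public static void main(String[] args) {
    SLF4JBridgeHandler.removeHandlersForRootLogger();
    SLF4JBridgeHandler.install(); // JUL -> SLF4J
    // With slf4j-jdk14 bound: SLF4J -> JUL -> SLF4J -> ... StackOverflowError
    Logger.getLogger(JulSlf4jCycleDemo.class.getName()).info("looping call");
  }
}
```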
15 changes: 3 additions & 12 deletions sdks/java/io/google-cloud-platform/build.gradle
@@ -138,22 +138,13 @@ dependencies {
implementation library.java.slf4j_api
implementation library.java.vendored_grpc_1_69_0
implementation library.java.vendored_guava_32_1_2_jre
implementation(library.java.arrow_memory_core) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation(library.java.arrow_vector) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
implementation library.java.arrow_memory_core
implementation library.java.arrow_vector

implementation 'com.google.http-client:google-http-client-gson:1.41.2'
implementation "org.threeten:threetenbp:1.4.4"

testImplementation(library.java.arrow_memory_netty) {
// Arrow 15 has compile dependency of slf4j 2.x where Beam does not support
exclude group: 'org.slf4j', module: 'slf4j-api'
}
testImplementation library.java.arrow_memory_netty
testImplementation project(path: ":sdks:java:core", configuration: "shadowTest")
testImplementation project(path: ":sdks:java:extensions:avro", configuration: "testRuntimeMigration")
testImplementation project(path: ":sdks:java:extensions:google-cloud-platform-core", configuration: "testRuntimeMigration")
@@ -18,11 +18,11 @@
-->
# SparkReceiverIO

SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) `org.apache.spark.streaming.receiver.Receiver` as an unbounded source.
SparkReceiverIO provides I/O transforms to read messages from an [Apache Spark Receiver](https://spark.apache.org/docs/3.5.0/streaming-custom-receivers.html) `org.apache.spark.streaming.receiver.Receiver` as an unbounded source.

## Prerequisites

SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/2.4.0/streaming-custom-receivers.html) (Spark version 2.4).
SparkReceiverIO supports [Spark Receivers](https://spark.apache.org/docs/3.5.0/streaming-custom-receivers.html) (Spark version 3.x, tested on Spark version 3.5.0).
1. The corresponding Spark Receiver should implement the [HasOffset](https://github.com/apache/beam/blob/master/sdks/java/io/sparkreceiver/src/main/java/org/apache/beam/sdk/io/sparkreceiver/HasOffset.java) interface (see the sketch below).
2. Records should have a numeric field that represents the record offset. *Example:* the `RecordId` field for Salesforce and the `vid` field for Hubspot Receivers.
For more details, please see the [GetOffsetUtils](https://github.com/apache/beam/tree/master/examples/java/cdap/src/main/java/org/apache/beam/examples/complete/cdap/utils/GetOffsetUtils.java) class from the CDAP plugins examples.
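
For illustration, a minimal sketch of a receiver that satisfies both prerequisites (a hypothetical class, not from the Beam codebase; it assumes `HasOffset` exposes `setStartOffset(Long)` and `getEndOffset()` as in the linked interface):

```java
import org.apache.beam.sdk.io.sparkreceiver.HasOffset;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.receiver.Receiver;

// Hypothetical custom receiver readable by SparkReceiverIO.
public class CustomOffsetReceiver extends Receiver<String> implements HasOffset {

  private long startOffset = 0L;

  public CustomOffsetReceiver() {
    super(StorageLevel.MEMORY_AND_DISK_2());
  }

  @Override
  public void setStartOffset(Long startOffset) {
    // SparkReceiverIO tells the receiver where to resume reading.
    this.startOffset = startOffset == null ? 0L : startOffset;
  }

  @Override
  public Long getEndOffset() {
    // Upper bound of offsets this receiver can serve; Long.MAX_VALUE for an
    // unbounded stream.
    return Long.MAX_VALUE;
  }

  @Override
  public void onStart() {
    // Start a thread that fetches records beginning at startOffset and calls
    // store(record) for each one; every record carries a numeric offset
    // field, as required above.
  }

  @Override
  public void onStop() {
    // Stop the fetching thread and release any resources.
  }
}
```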
@@ -53,7 +53,7 @@ To learn more, please check out CDAP Streaming plugins [complete examples](https

## Dependencies

To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver`.
To use SparkReceiverIO, add a dependency on `beam-sdks-java-io-sparkreceiver-3`.

```maven
<dependency>
```
@@ -43,8 +43,8 @@ dependencies {
implementation library.java.commons_lang3
implementation library.java.joda_time
implementation library.java.slf4j_api
implementation library.java.spark_streaming
implementation library.java.spark_core
implementation library.java.spark3_streaming
implementation library.java.spark3_core
implementation library.java.vendored_guava_32_1_2_jre
implementation project(path: ":sdks:java:core", configuration: "shadow")
compileOnly "org.scala-lang:scala-library:2.11.12"
2 changes: 1 addition & 1 deletion settings.gradle.kts
@@ -252,7 +252,7 @@ include(":sdks:java:io:rabbitmq")
include(":sdks:java:io:redis")
include(":sdks:java:io:rrio")
include(":sdks:java:io:solr")
include(":sdks:java:io:sparkreceiver:2")
include(":sdks:java:io:sparkreceiver:3")
include(":sdks:java:io:snowflake")
include(":sdks:java:io:snowflake:expansion-service")
include(":sdks:java:io:splunk")