Affected version
0.5.0
Current and expected behavior
Following https://iceberg.apache.org/docs/latest/getting-started/
Current
Use the following deps section in the SparkApplication:
deps:
  packages:
    # - org.apache.hadoop:hadoop-aws:3.3.3
    - org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1
Driver logs:
:: loading settings :: url = jar:file:/stackable/spark-3.3.0-bin-hadoop3/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /stackable/.ivy2/cache
The jars for the packages stored in: /stackable/.ivy2/jars
org.apache.iceberg#iceberg-spark-runtime-3.3_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-ddd36894-c46d-4d2f-82b2-8a916f718eba;1.0
confs: [default]
found org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1 in central
downloading https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/0.14.1/iceberg-spark-runtime-3.3_2.12-0.14.1.jar ...
[SUCCESSFUL ] org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1!iceberg-spark-runtime-3.3_2.12.jar (11980ms)
:: resolution report :: resolve 496ms :: artifacts dl 11983ms
:: modules in use:
org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1 from central in [default]
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   1   |   1   |   1   |   0   ||   1   |   1   |
---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-ddd36894-c46d-4d2f-82b2-8a916f718eba
confs: [default]
1 artifacts copied, 0 already retrieved (29791kB/17ms)
The driver resolves and downloads the package (see the log above), but the executors do not pull the dependencies and fail with java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory, a class that should come with org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1.
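To narrow this down, it may help to print the dependency-related settings the driver actually ends up with. The following is only a diagnostic sketch, not a fix: the helper name dependency_settings and the choice of keys are assumptions made for illustration, and it reports what the driver sees, which does not by itself prove what the executors receive.

```python
# Settings that determine which extra jars a Spark application requests.
DEPENDENCY_KEYS = ("spark.jars", "spark.jars.packages", "spark.jars.ivy")


def dependency_settings(conf_pairs):
    """Return the dependency-related settings from (key, value) pairs.

    Keys that are not set map to None.
    """
    settings = dict(conf_pairs)
    return {key: settings.get(key) for key in DEPENDENCY_KEYS}


# Hypothetical usage inside write-iceberg-table.py, once the session exists:
#   conf_pairs = spark.sparkContext.getConf().getAll()
#   for key, value in dependency_settings(conf_pairs).items():
#       print(f"FOO {key} = {value}")
```

If spark.jars points at a path that only exists in the driver pod (for example under /stackable/.ivy2/jars), that would be consistent with the executors missing the Iceberg classes, but this is a guess to verify rather than a confirmed cause.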
Expected
Both the driver and the executors pull the dependencies.
Possible solution
No response
Additional context
No response
Environment
---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: write-iceberg-table
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.3.0-stackable0.2.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/write-iceberg-table.py
  deps:
    packages:
      # - org.apache.hadoop:hadoop-aws:3.3.3
      - org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1
  sparkConf:
    spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.spark_catalog: org.apache.iceberg.spark.SparkSessionCatalog
    spark.sql.catalog.spark_catalog.type: hive
    spark.sql.catalog.local: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.local.type: hadoop
    spark.sql.catalog.local.warehouse: /tmp/warehouse
  volumes:
    - name: script
      configMap:
        name: write-iceberg-table-script
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: script
        mountPath: /stackable/spark/jobs
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    volumeMounts:
      - name: script
        mountPath: /stackable/spark/jobs
      # - name: job-deps
      #   mountPath: /dependencies
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: write-iceberg-table-script
data:
  write-iceberg-table.py: |
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("write-iceberg-table").getOrCreate()
    #df = spark.read.parquet("s3a://public-backup-nyc-tlc/trip-data/yellow_tripdata_2020-04.parquet")
    #df.show(10)
    print("FOO creating table")
    spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
    spark.sql("INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c')")
    spark.sql("SELECT * FROM local.db.table")
    spark.sql("SELECT * FROM local.db.table.snapshots")
Would you like to work on fixing this bug?
maybe