Executors don't resolve dependencies #141

@sbernauer

Affected version

0.5.0

Current and expected behavior

Following https://iceberg.apache.org/docs/latest/getting-started/

Current

Use the following deps in the SparkApplication:

  deps:
    packages:
      # - org.apache.hadoop:hadoop-aws:3.3.3
      - org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1

Driver logs:

:: loading settings :: url = jar:file:/stackable/spark-3.3.0-bin-hadoop3/jars/ivy-2.5.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
Ivy Default Cache set to: /stackable/.ivy2/cache
The jars for the packages stored in: /stackable/.ivy2/jars
org.apache.iceberg#iceberg-spark-runtime-3.3_2.12 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent-ddd36894-c46d-4d2f-82b2-8a916f718eba;1.0
        confs: [default]
        found org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1 in central
downloading https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-3.3_2.12/0.14.1/iceberg-spark-runtime-3.3_2.12-0.14.1.jar ...
        [SUCCESSFUL ] org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1!iceberg-spark-runtime-3.3_2.12.jar (11980ms)
:: resolution report :: resolve 496ms :: artifacts dl 11983ms
        :: modules in use:
        org.apache.iceberg#iceberg-spark-runtime-3.3_2.12;0.14.1 from central in [default]
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   1   |   1   |   0   ||   1   |   1   |
        ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent-ddd36894-c46d-4d2f-82b2-8a916f718eba
        confs: [default]
        1 artifacts copied, 0 already retrieved (29791kB/17ms)

The executors do not pull the dependencies and fail with java.lang.ClassNotFoundException: org.apache.iceberg.spark.source.SparkWrite$WriterFactory. That class should be provided by org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1.
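
For anyone who wants to check which jar is missing on the executors: the download URL in the driver log follows the standard Maven repository layout, so it can be derived from the package coordinate. The snippet below is only a sketch; maven_jar_url is a hypothetical helper (not part of Spark or the operator), and the default repository URL is the one that appears in the driver log above.

```python
def maven_jar_url(coordinate: str, repo: str = "https://repo1.maven.org/maven2") -> str:
    """Map a 'group:artifact:version' coordinate to its jar URL.

    Hypothetical helper for illustration; assumes the standard Maven
    repository layout and a plain jar artifact without a classifier.
    """
    parts = coordinate.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'group:artifact:version', got {coordinate!r}")
    group, artifact, version = parts
    # Only the group id is turned into a directory path; dots in the
    # artifact id and the version are kept as-is.
    group_path = group.replace(".", "/")
    return f"{repo.rstrip('/')}/{group_path}/{artifact}/{version}/{artifact}-{version}.jar"


if __name__ == "__main__":
    print(maven_jar_url("org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1"))
```

For the coordinate used in this issue, the result matches the URL the driver downloaded from.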

Expected

Both the driver and the executors pull the dependencies.

Possible solution

No response

Additional context

No response

Environment

---
apiVersion: spark.stackable.tech/v1alpha1
kind: SparkApplication
metadata:
  name: write-iceberg-table
spec:
  version: "1.0"
  sparkImage: docker.stackable.tech/stackable/pyspark-k8s:3.3.0-stackable0.2.0
  mode: cluster
  mainApplicationFile: local:///stackable/spark/jobs/write-iceberg-table.py
  deps:
    packages:
      # - org.apache.hadoop:hadoop-aws:3.3.3
      - org.apache.iceberg:iceberg-spark-runtime-3.3_2.12:0.14.1
  sparkConf:
    spark.hadoop.fs.s3a.aws.credentials.provider: org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
    spark.sql.extensions: org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
    spark.sql.catalog.spark_catalog: org.apache.iceberg.spark.SparkSessionCatalog
    spark.sql.catalog.spark_catalog.type: hive
    spark.sql.catalog.local: org.apache.iceberg.spark.SparkCatalog
    spark.sql.catalog.local.type: hadoop
    spark.sql.catalog.local.warehouse: /tmp/warehouse
  volumes:
    - name: script
      configMap:
        name: write-iceberg-table-script
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    volumeMounts:
      - name: script
        mountPath: /stackable/spark/jobs
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    volumeMounts:
      - name: script
        mountPath: /stackable/spark/jobs
      # - name: job-deps
      #   mountPath: /dependencies
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: write-iceberg-table-script
data:
  write-iceberg-table.py: |
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.appName("write-iceberg-table").getOrCreate()

    #df = spark.read.parquet("s3a://public-backup-nyc-tlc/trip-data/yellow_tripdata_2020-04.parquet")
    #df.show(10)

    print("FOO creating table")
    spark.sql("CREATE TABLE local.db.table (id bigint, data string) USING iceberg")
    spark.sql("INSERT INTO local.db.table VALUES (1, 'a'), (2, 'b'), (3, 'c')")
    # Call show() so the queries actually execute and print their results
    spark.sql("SELECT * FROM local.db.table").show()
    spark.sql("SELECT * FROM local.db.table.snapshots").show()

Would you like to work on fixing this bug?

maybe

Status

Done