Serialization Issue with Sedona and Iceberg (Kryo serializer) #1724
Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.
I have not reproduced this using similar configurations. Can you try running the repro without Sedona but with Kryo serialization enabled?

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.master('spark://localhost:5581')
    .config(
        'spark.jars.packages',
        'org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,'
        'org.apache.iceberg:iceberg-aws-bundle:1.7.1,'
        'org.postgresql:postgresql:42.7.4',
    )
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.sql.extensions', 'org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions')
    .config('spark.sql.catalog.my_catalog', 'org.apache.iceberg.spark.SparkCatalog')
    .config('spark.sql.catalog.my_catalog.type', 'jdbc')
    .config('spark.sql.catalog.my_catalog.uri', 'jdbc:postgresql://localhost:5500/data_catalog_apache_iceberg')
    .config('spark.sql.catalog.my_catalog.jdbc.user', 'postgres')
    .config('spark.sql.catalog.my_catalog.jdbc.password', 'postgres')
    .config('spark.sql.catalog.my_catalog.io-impl', 'org.apache.iceberg.aws.s3.S3FileIO')
    .config('spark.sql.catalog.my_catalog.warehouse', 's3a://data-lakehouse')
    .config('spark.sql.catalog.my_catalog.s3.endpoint', 'http://localhost:5561')
    .config('spark.sql.catalog.my_catalog.s3.access-key-id', 'admin')
    .config('spark.sql.catalog.my_catalog.s3.secret-access-key', 'password')
    .getOrCreate()
)

spark.sql('CREATE TABLE my_catalog.table8 (name string) USING iceberg;')
spark.sql("INSERT INTO my_catalog.table8 VALUES ('Alex'), ('Dipankar'), ('Jason')")
```
Thanks for looking into it. You're right; in this case the error appears too ...

Here are some instructions to reproduce the error. I'm using the following pyspark and sedona versions:
Expected behavior
Data should be successfully inserted into the Iceberg table without serialization errors when using Sedona and Iceberg.
Actual behavior
The `INSERT INTO` operation fails with a Kryo serialization exception. The error trace indicates an `IndexOutOfBoundsException` in the Kryo serializer while handling Iceberg's `GenericDataFile` and `SparkWrite.TaskCommit` objects.

Error message:
Steps to reproduce the problem
Additional information
If I perform the same operations using Spark without Sedona, everything works seamlessly:
If I use the JavaSerializer (`.config('spark.serializer', 'org.apache.spark.serializer.JavaSerializer')`) in the Sedona example, it works.
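One avenue worth checking (an assumption on my part, not something confirmed in this thread): when keeping Kryo, Sedona's documentation pairs it with Sedona's own registrator via `spark.kryo.registrator`. A minimal sketch, assuming Sedona 1.7.1 is installed:

```python
from sedona.spark import SedonaContext

# Minimal sketch: keep Kryo, but also register Sedona's serializers via the
# registrator class that Sedona's docs recommend for its geometry types.
spark = (
    SedonaContext.builder()
    .master('local[*]')  # assumption: the master choice is not essential here
    .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    .config('spark.kryo.registrator', 'org.apache.sedona.core.serde.SedonaKryoRegistrator')
    .getOrCreate()
)
sedona = SedonaContext.create(spark)
```

Whether this changes the `GenericDataFile` failure is untested here; the sketch only isolates the serializer wiring.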
Settings
Sedona version = 1.7.1
Apache Spark version = 3.5
API type = Python
Scala version = 2.12
JRE version = 11.0.25
Python version = 3.12.0
Environment = Standalone