error on rawDf.show() #1723
Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.
@angelosnm the stacktrace above is incomplete. Can you find the place where it shows
Hello, I have a similar problem when trying to load "larger" raster files (around 65 MB), although my error message is different, so I am not opening a new issue. Setup:
Then I run the container with:
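The actual command was not preserved in this thread; a typical invocation for the Sedona Docker image might look like the following sketch (the image tag and port mapping are assumptions, not the reporter's exact command):

```shell
# Hypothetical example: run the Sedona image and expose JupyterLab on port 8888.
# Image name/tag and port are assumptions.
docker run -p 8888:8888 apache/sedona:latest
```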
In JupyterLab, I execute the code from the examples notebook, swapping in my orthophoto. Notebook code:
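The notebook cells themselves were not captured here; a minimal sketch of loading a GeoTIFF with Sedona (the file path and session setup are assumptions) might look like:

```python
# Minimal sketch, assuming a local orthophoto path; not the reporter's exact notebook code.
from sedona.spark import SedonaContext

# Build a Sedona-enabled Spark session.
config = SedonaContext.builder().getOrCreate()
sedona = SedonaContext.create(config)

# Read the GeoTIFF as raw bytes, then decode it into an in-db raster column.
rawDf = sedona.read.format("binaryFile").load("/path/to/orthophoto.tif")
rawDf.createOrReplaceTempView("rawdf")
rasterDf = sedona.sql("SELECT RS_FromGeoTiff(content) AS raster FROM rawdf")
rasterDf.show()
```

Note that `RS_FromGeoTiff` materializes the full image in memory, which is relevant to the OutOfMemoryError discussed below.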
Expected behavior:
Actual behavior:
Resulting error message:
With the provided Docker image and the orthophoto TIFF you should be able to reproduce the error. As I use the latest Docker image, I am not listing the versions of the individual packages. The orthophoto is too large (65 MB) to attach, but with the following download link one can access it directly from the site of the Federal Office of Topography of Switzerland: If there are any further questions, I am happy to answer.
@jiayuasu I was monitoring the Spark Web UI console and nothing specific was logged... My guess is that it was a network-related issue. I am using a Jupyter-based container image inside a k8s cluster and sending PySpark jobs to a Spark cluster that is on the same local network as the k8s cluster, but referencing it externally by exposing the master endpoint to the public internet. This setup requires a lot of network configuration on both clusters (such as routing the k8s pod CIDR from the Spark nodes, setting up networking policies on k8s, etc.). I just deployed Spark inside the k8s cluster (using the official
@Jaeggi99 Your error message is pretty clear: `java.lang.OutOfMemoryError: Java heap space`. This is a separate issue, unrelated to this ticket. In your case, the image is too large (65 MB), so it exhausts the driver memory of Spark. I suggest you use the out-db raster mode of WherobotsDB to load it: https://docs.wherobots.com/latest/tutorials/wherobotsdb/raster-data/raster-load/#create-an-out-db-raster-type-column . WherobotsDB has a free tier you can play with.
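For readers following along: Apache Sedona itself also offers an out-db style loader, `RS_FromPath`, which keeps pixel data on storage and reads it lazily instead of pulling the whole image into driver memory. A minimal sketch (file path and session setup are assumptions):

```python
# Sketch only: RS_FromPath creates an out-db raster that references the file
# rather than loading all pixels up front. Path and session are assumptions.
from sedona.spark import SedonaContext

sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())
outDbDf = sedona.sql("SELECT RS_FromPath('/path/to/orthophoto.tif') AS raster")
outDbDf.show()
```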
Thank you for clarifying. Although the OutOfMemory symptom was clear, I thought Sedona or Spark might still be the cause, because I was able to load and show a 2.3 GB GeoPackage with vector data; as a newbie I was confused that Sedona/Spark could not handle a 65 MB TIFF. Now I know, and I will try your suggestion.
I have set up a standalone Spark cluster to which PySpark jobs are submitted. These jobs use the configuration below, where S3/MinIO serves as the storage layer (via the S3A connector) for reading raster files:
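The exact configuration was lost in this copy of the thread; a typical S3A/MinIO setup for such a job might look like the sketch below (the endpoint, credentials, and master URL are placeholders, not the reporter's actual values):

```python
# Sketch of an S3A/MinIO-backed Spark session; all values are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("sedona-raster")
    .master("spark://spark-master:7077")  # standalone cluster master (placeholder)
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .getOrCreate()
)
```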
Then, the raster/tif files are being accessed as per below:
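The access code itself is missing from this copy; a minimal sketch of how such a read usually looks with Sedona (bucket, key, and session setup are assumptions) is:

```python
# Sketch: read GeoTIFF bytes from S3A as a binary-file DataFrame.
# Bucket and object key are hypothetical.
from sedona.spark import SedonaContext

sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())
rawDf = sedona.read.format("binaryFile").load("s3a://my-bucket/rasters/orthophoto.tif")
rawDf.show()  # per the report, this fails in cluster mode but works in local mode
```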
This code returns the error mentioned under "Actual behavior". If the job runs in local mode instead, it completes normally.
Expected behavior
Actual behavior
Steps to reproduce the problem
Settings
Sedona version = 1.6.1
Apache Spark version = 3.5.2
Apache Flink version = N/A
API type = Python
Scala version = 2.12
JRE version = 1.8.0_432
Python version = 3.11.10
Environment = Standalone