
Add all touched parameter to RS_ZonalStats #1715

Open
VtotheG opened this issue Dec 9, 2024 · 2 comments

Comments


VtotheG commented Dec 9, 2024

I want to use RS_ZonalStats (https://sedona.apache.org/latest/api/sql/Raster-operators/#rs_zonalstats) to combine building polygons with information from a TIF file.

In the past I used the rasterstats library to get this information in Python, but I have a lot of buildings and wanted to use the power of Sedona. rasterstats has an all_touched option that determines whether every pixel touched by the geometry is included in the calculation; see https://pythonhosted.org/rasterstats/manual.html#zonal-statistics.

Actual behavior

The current RS_ZonalStats function only uses the values of pixels that fall within my polygon. This causes many problems when the houses/buildings are quite small: it returns NaN because none of the pixels are fully enclosed by the polygon.
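The effect on small footprints can be illustrated with a minimal pure-Python sketch. It assumes a hypothetical 4x4 grid with 10-unit pixels and an axis-aligned rectangular building; `center_inside`, `touches` and `zonal_max` are illustrative helpers, not Sedona or rasterstats functions. The first rule mirrors the center-in-polygon strategy that rasterstats describes as its default, the second mirrors `all_touched=True`.

```python
import math

# Hypothetical 4x4 elevation grid with 10-unit pixels. Pixel (row, col) covers
# x in [col*10, col*10 + 10) and y in [row*10, row*10 + 10); y grows with the row
# index here to keep the arithmetic simple (real rasters are usually north-up).
PIXEL_SIZE = 10.0
GRID = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]


def center_inside(rect, row, col):
    """Center rule: the pixel counts only if its center lies inside the rectangle."""
    xmin, ymin, xmax, ymax = rect
    cx, cy = (col + 0.5) * PIXEL_SIZE, (row + 0.5) * PIXEL_SIZE
    return xmin <= cx <= xmax and ymin <= cy <= ymax


def touches(rect, row, col):
    """All-touched rule: the pixel counts if its cell overlaps the rectangle.

    Uses strict comparisons, so cells that only share an edge are ignored.
    """
    xmin, ymin, xmax, ymax = rect
    x0, y0 = col * PIXEL_SIZE, row * PIXEL_SIZE
    return xmin < x0 + PIXEL_SIZE and x0 < xmax and ymin < y0 + PIXEL_SIZE and y0 < ymax


def zonal_max(rect, rule):
    """Max over the pixels selected by `rule`; NaN when no pixel is selected."""
    values = [
        GRID[r][c]
        for r in range(len(GRID))
        for c in range(len(GRID[0]))
        if rule(rect, r, c)
    ]
    return max(values) if values else math.nan


# An 8 x 3 unit footprint that straddles two pixels but contains no pixel center.
building = (16.0, 16.0, 24.0, 19.0)
print(zonal_max(building, center_inside))  # nan
print(zonal_max(building, touches))        # 7.0 (pixels (1, 1) and (1, 2))
```

Under the center rule the small building selects no pixels at all, which is the NaN described above; the all-touched rule still returns a value from the two pixels the footprint overlaps.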

Steps to reproduce the problem

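Example usage:

```python
# Max elevation per building; RS_Intersects keeps only raster/building pairs that overlap.
results = sedona.sql('''
select
    a.id
  , a.geometry
    -- band 1, 'max' statistic; the booleans are allowNoData = false, lenient = true
  , RS_ZonalStats(r.raster, a.geometry, 1, 'max', false, true) as max_elevation
from buildings a, tifmap_with_elevation r
where
    RS_Intersects(a.geometry, r.raster)
''')
```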

It would be great to have an additional parameter that indicates whether RS_ZonalStats should summarize only the pixels within the geometry or all pixels touched by the geometry.

I thought that buffering the geometries by the size of my pixels would solve my issue, but I think it is then still possible to pick up pixels that should not be part of the calculation.
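The concern about buffering can be checked with a small toy setup. This sketch assumes a hypothetical 4x4 grid of 10-unit pixels and an axis-aligned rectangular building, and uses a square (mitred) expansion as a stand-in for a real geometry buffer; `centers_inside`, `touched` and `square_buffer` are illustrative helpers, not Sedona functions.

```python
# Hypothetical 4x4 grid of 10-unit pixels; pixel (row, col) covers
# x in [col*10, col*10 + 10) and y in [row*10, row*10 + 10).
PIXEL_SIZE = 10.0
N = 4


def centers_inside(rect):
    """Pixels whose center lies inside the rectangle (center rule)."""
    xmin, ymin, xmax, ymax = rect
    return {
        (r, c)
        for r in range(N)
        for c in range(N)
        if xmin <= (c + 0.5) * PIXEL_SIZE <= xmax
        and ymin <= (r + 0.5) * PIXEL_SIZE <= ymax
    }


def touched(rect):
    """Pixels whose cell overlaps the rectangle (all-touched rule)."""
    xmin, ymin, xmax, ymax = rect
    return {
        (r, c)
        for r in range(N)
        for c in range(N)
        if xmin < (c + 1) * PIXEL_SIZE and c * PIXEL_SIZE < xmax
        and ymin < (r + 1) * PIXEL_SIZE and r * PIXEL_SIZE < ymax
    }


def square_buffer(rect, d):
    """Grow an axis-aligned rectangle by d on every side (mitred buffer)."""
    xmin, ymin, xmax, ymax = rect
    return (xmin - d, ymin - d, xmax + d, ymax + d)


building = (16.0, 16.0, 24.0, 19.0)
wanted = touched(building)
buffered = centers_inside(square_buffer(building, PIXEL_SIZE))
print(sorted(wanted))             # [(1, 1), (1, 2)]
print(sorted(buffered - wanted))  # [(2, 1), (2, 2)]: selected only because of the buffer
```

In this example the one-pixel buffer recovers both touched pixels but also pulls in two pixels the building never overlaps, so a statistic such as the max would be computed over the wrong set.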

Settings

Sedona version = 1.7.0

Apache Spark version = 3.5.0

Apache Flink version = ?

API type = Python / SQL

Scala version = 2.12

JRE version = ?

Python version = 3.11

Environment = Databricks


github-actions bot commented Dec 9, 2024

Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.

@jiayuasu
Member

We are aware of this and are working on it now. CC @prantogg
