BUG: interpolating Dask Array with NumPy Arrays completely blows up the chunk size for multiple dimensions #9907

Open
5 tasks
phofl opened this issue Dec 18, 2024 · 5 comments
Labels
bug, topic-chunked-arrays, topic-interpolation

Comments

@phofl
Contributor

phofl commented Dec 18, 2024

What happened?

Interpolating rechunks to -1 along the interpolation axis. Doing this for many dimensions at once blows up the chunk sizes :(
[Screenshot 2024-12-18 at 14 54 31]

This seems to happen when you put stuff into blockwise. I think we might want to rechunk the coordinates to the proper chunk size, but I'm not sure.

What did you expect to happen?

Keep chunk sizes consistent by rechunking the other dimensions appropriately, I guess.
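
One way to read this expectation, as a rough sketch only: when the interpolation axis is forced into a single chunk, shrink the remaining axes so that the number of elements per chunk stays about the same. The helper `rebalance_chunksizes` is hypothetical (it is not part of xarray or dask), and the 365-row starting chunk is an assumed stand-in for whatever "auto" picks.

```python
import math


def rebalance_chunksizes(shape, chunksizes, core_axes):
    """Hypothetical helper: put each core axis in one chunk and shrink the
    other axes so a chunk holds no more elements than before (where possible)."""
    budget = math.prod(chunksizes)  # elements per chunk before rechunking
    new = list(chunksizes)
    for ax in core_axes:
        new[ax] = shape[ax]  # core axes become a single chunk
    # Shrink the non-core axes, largest first, until we are back within budget.
    free = sorted(
        (ax for ax in range(len(shape)) if ax not in core_axes),
        key=lambda ax: new[ax],
        reverse=True,
    )
    for ax in free:
        current = math.prod(new)
        if current <= budget:
            break
        new[ax] = max(1, new[ax] * budget // current)
    return tuple(new)


shape = (1, 75902, 45910)   # (band, y, x) from the example in this issue
start = (1, 365, 45910)     # assumed starting chunk size

# Interpolating along y only: x can be shrunk to compensate.
print(rebalance_chunksizes(shape, start, core_axes=(1,)))    # (1, 75902, 220)

# y and x both treated as core at once: nothing is left to shrink.
print(rebalance_chunksizes(shape, start, core_axes=(1, 2)))  # (1, 75902, 45910)
```

The second call shows why handling several interpolation dimensions at once cannot keep chunks small: only `band` (length 1) is left to shrink.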

@dcherian would your current work in this area impact this?

Minimal Complete Verifiable Example

import dask.array as da
import numpy as np
import xarray as xr

arr = xr.DataArray(
    da.random.random((1, 75902, 45910), chunks=(1, "auto", -1)),
    dims=["band", "y", "x"],
    coords={"x": np.linspace(-73.58, -62.11, 45910), "y": np.linspace(-36.08, -55.05, 75902)},
    name="bla",
)

arr2 = xr.DataArray(
    da.random.random((1, 75902, 45910), chunks=(1, "auto", -1)),
    dims=["band", "y", "x"],
    coords={"x": np.linspace(-73.58, -62.11, 45910), "y": np.linspace(-36.08, -55.05, 75902)},
    name="bla",
)

x = arr2.interp(
    x=arr.coords["x"],
    y=arr.coords["y"],
    method="linear",
)

# Inspect the resulting chunk layout; this is lazy and computes nothing.
print(x.chunks)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.
  • Recent environment — the issue occurs with the latest version of xarray and its dependencies.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:26:25) [Clang 17.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 23.4.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: None
LOCALE: (None, 'UTF-8')
libhdf5: 1.14.3
libnetcdf: None

xarray: 2024.10.1.dev51+g864b35a1
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.14.1
netCDF4: None
pydap: None
h5netcdf: 1.4.1
h5py: 3.12.1
zarr: 3.0.0b3.dev6+g7c2ebe2
cftime: None
nc_time_axis: None
iris: None
bottleneck: 1.4.2
dask: 2024.12.1+0.g2c0ac83fc.dirty
distributed: 2024.12.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2024.10.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.3.0
pip: 24.3.1
conda: None
pytest: 8.3.3
mypy: None
IPython: 8.29.0
sphinx: None

phofl added the bug and needs triage labels Dec 18, 2024
@dcherian
Contributor

Working on avoiding the blockwise here ... :) It is smart enough to do x first, then y, though. Is it better with #9881?

dcherian added the topic-chunked-arrays and topic-interpolation labels and removed the needs triage label Dec 18, 2024
@phofl
Contributor Author

phofl commented Dec 18, 2024

No, not at the moment. I think the core dimension is still rechunked to -1 without touching the other axis:

((1,), (75902,), (45910,))
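
For scale, the chunk tuple above describes a single chunk holding the whole array. A back-of-the-envelope calculation, assuming float64 data and that "auto" aims for roughly dask's default 128 MiB chunk size (the exact row count dask picks may differ):

```python
import math

shape = (1, 75902, 45910)  # (band, y, x) from the example in this issue
itemsize = 8               # bytes per float64 element


def chunk_nbytes(chunk_shape):
    """Size in bytes of one chunk with the given shape."""
    return math.prod(chunk_shape) * itemsize


# Assumption: "auto" targets about 128 MiB per chunk with x in a single chunk.
target = 128 * 2**20
rows = target // (shape[2] * itemsize)

before = chunk_nbytes((1, rows, shape[2]))
after = chunk_nbytes(shape)  # chunks ((1,), (75902,), (45910,))

print(f"rows per chunk before: {rows}")
print(f"chunk size before: {before / 2**20:.1f} MiB")
print(f"chunk size after:  {after / 2**30:.1f} GiB")
print(f"growth factor:     {after / before:.0f}x")
```

Under these assumptions a chunk grows from about 128 MiB to about 26 GiB, a factor of roughly 200.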

@dcherian
Contributor

Yes. Does dask.array.apply_gufunc not auto-rechunk the other axes yet when allow_rechunk=True? We can explicitly request this on the Xarray end if needed. This codepath will still be active for any spline interpolations.
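
A small way to check what allow_rechunk=True does on a given dask version. This is only a sketch: the array is a tiny stand-in, and `needs_full_x` is a made-up kernel standing in for a 1-D interpolation that needs the whole core axis. The chunking of the non-core axis in the output depends on the installed dask version, so no expected output is shown.

```python
import dask.array as da
import numpy as np

# Small stand-in array with dims (y, x); x is split into chunks of 50.
arr = da.ones((400, 300), chunks=(-1, 50))


def needs_full_x(block):
    # Stand-in for a 1-D kernel that needs the whole x axis at once.
    return np.cumsum(block, axis=-1)


out = da.apply_gufunc(
    needs_full_x,
    "(x)->(x)",          # x is the core dimension
    arr,
    allow_rechunk=True,  # let dask merge the x chunks into one
    output_dtypes=float,
)

print("input chunks: ", arr.chunks)
print("output chunks:", out.chunks)
```

If the y axis still has a single chunk of 400 in the output, the non-core axis was not adjusted and each chunk now covers the whole array.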

@phofl
Contributor Author

phofl commented Dec 18, 2024

Yikes, no :(

This function is complicated... I'll look into this on our end, that should work differently imo

Yeah, requesting this on our end for the current code path would be helpful too, I think (I don't think that we would want to change how blockwise alignment works).
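
Requesting the rechunk explicitly could look roughly like the following. This is a sketch with a small stand-in array, not the actual xarray code path, and `needs_full_x` is again a made-up kernel: the core axis is put into one chunk and the other axis is shrunk in the same `rechunk` call, so apply_gufunc has nothing left to rechunk.

```python
import dask.array as da
import numpy as np

# Small stand-in array with dims (y, x); x is split into chunks of 50.
arr = da.ones((400, 300), chunks=(-1, 50))
elems_per_chunk = 400 * 50  # elements in one input chunk

# Make x (the core axis) a single chunk and shrink y so that a chunk
# holds at most as many elements as before.
y_chunk = max(1, elems_per_chunk // arr.shape[1])
rechunked = arr.rechunk({0: y_chunk, 1: -1})


def needs_full_x(block):
    # Stand-in for a 1-D kernel that needs the whole x axis at once.
    return np.cumsum(block, axis=-1)


# x is already a single chunk, so allow_rechunk is not needed here.
out = da.apply_gufunc(needs_full_x, "(x)->(x)", rechunked, output_dtypes=float)

print("largest chunk before:", arr.chunksize)
print("largest chunk after: ", out.chunksize)
```

The largest chunk goes from (400, 50) to (66, 300), so the per-chunk element count stays roughly constant instead of growing to the full array.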

@dcherian
Contributor

dcherian commented Dec 18, 2024

Right, allow_rechunk should rechunk to -1 along core dimensions and adjust any others as necessary.
