Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

Closed
lusewell opened this issue Jun 1, 2021 · 4 comments · Fixed by #5560
Closed

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

lusewell opened this issue Jun 1, 2021 · 4 comments · Fixed by #5560

Comments

@lusewell
Copy link
Contributor

lusewell commented Jun 1, 2021

import numpy as np
import xarray as xr
data = np.zeros((1,500, 2)).transpose(1,2,0)
xarr = xr.DataArray(data, coords=[('A', range(500)), ('B', [0,1]), ('C', [0])])
xarr.max()                                  

The above either returns a very large non-zero number or segfaults. Due to pydata/bottleneck#381.

Dual posting here in case this isn't able to get quickly fixed in bottleneck, as this is a pretty severe bug - especially on the occaions it returns the wrong number rather than segfaulting.

Environment:

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, Mar 26 2020, 15:48:22) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.14.1
pandas: 0.25.0
numpy: 1.16.6
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: installed
setuptools: 46.1.3.post20200330
pip: 20.0.2
conda: None
pytest: 5.2.4
IPython: 7.20.0
sphinx: 2.4.4
@max-sixty
Copy link
Collaborator

max-sixty commented Jun 1, 2021

Thanks @lusewell . Unfortunately — as you suggest — I don't think there's much we can do — but this does seem like a bad bug.

It might be worth checking out numbagg — https://github.com/numbagg/numbagg — which we use for fast operations that bottleneck doesn't include. Disclaimer that it comes from @shoyer , and I've recently given it a spring cleaning. To the extent this isn't fixed in bottleneck, we could offer an option to use numbagg, though it would probably require a contribution.

If you need this working for now, you could probably write a workaround for yourself using numbagg fairly quickly; e.g.

In [6]: numbagg.nanmax(xarr.values)
Out[6]: 0.0

# or, more generally:

In [12]: xr.apply_ufunc(numbagg.nanmax, xarr, input_core_dims=(('A','B','C'),))
Out[12]:
<xarray.DataArray ()>
array(0.)

@lusewell
Copy link
Contributor Author

lusewell commented Jun 1, 2021

Annoyingly the bug affects pretty much every bottleneck function, not just max, and I'm dealing with a large codebase where lots of the code just uses the methods attached to the xr.DataArrays.

Is there a way of disabling use of bottleneck inside xarray without uninstalling bottleneck? And if so do you know if this is expected to give the same results? Pandas (probably a few versions ago now) had a situation where if you uninstalled bottleneck it would use some other routine, but the nan-handling was then different - I think it caused the all-nan sum to flick between nan and zero if I recall.

Quick response appreciated though, and I might have a delve into fixing bottleneck myself if I get the free time.

@max-sixty
Copy link
Collaborator

I don't think there's a config for disabling bottleneck — assuming that's correct, we'd take a PR for one.

FYI one does seem to work is setting the type to float:

   ...: xarr.astype(float).max()
Out[1]:
<xarray.DataArray ()>
array(0.)

@dcherian
Copy link
Contributor

there's a config for disabling bottleneck — assuming that's correct, we'd take a PR for one.

Yeah I think it'd be nice to opt-in/out to bottleneck and maybe even support numbagg somehow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants