Bottleneck bug with unusual strides - causes segfault or wrong number #5424

lusewell · 2021-06-01T16:13:12Z

import numpy as np
import xarray as xr
data = np.zeros((1,500, 2)).transpose(1,2,0)
xarr = xr.DataArray(data, coords=[('A', range(500)), ('B', [0,1]), ('C', [0])])
xarr.max()

The above either returns a very large non-zero number or segfaults. Due to pydata/bottleneck#381.

Dual posting here in case this isn't able to get quickly fixed in bottleneck, as this is a pretty severe bug - especially on the occaions it returns the wrong number rather than segfaulting.

Environment:

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.7 (default, Mar 26 2020, 15:48:22) 
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-1160.21.1.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: None

xarray: 0.14.1
pandas: 0.25.0
numpy: 1.16.6
scipy: 1.4.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.10.1
distributed: 2.10.0
matplotlib: 3.1.3
cartopy: None
seaborn: 0.10.0
numbagg: installed
setuptools: 46.1.3.post20200330
pip: 20.0.2
conda: None
pytest: 5.2.4
IPython: 7.20.0
sphinx: 2.4.4

The text was updated successfully, but these errors were encountered:

max-sixty · 2021-06-01T16:33:09Z

Thanks @lusewell . Unfortunately — as you suggest — I don't think there's much we can do — but this does seem like a bad bug.

It might be worth checking out numbagg — https://github.com/numbagg/numbagg — which we use for fast operations that bottleneck doesn't include. Disclaimer that it comes from @shoyer , and I've recently given it a spring cleaning. To the extent this isn't fixed in bottleneck, we could offer an option to use numbagg, though it would probably require a contribution.

If you need this working for now, you could probably write a workaround for yourself using numbagg fairly quickly; e.g.

In [6]: numbagg.nanmax(xarr.values)
Out[6]: 0.0

# or, more generally:

In [12]: xr.apply_ufunc(numbagg.nanmax, xarr, input_core_dims=(('A','B','C'),))
Out[12]:
<xarray.DataArray ()>
array(0.)

lusewell · 2021-06-01T18:01:05Z

Annoyingly the bug affects pretty much every bottleneck function, not just max, and I'm dealing with a large codebase where lots of the code just uses the methods attached to the xr.DataArrays.

Is there a way of disabling use of bottleneck inside xarray without uninstalling bottleneck? And if so do you know if this is expected to give the same results? Pandas (probably a few versions ago now) had a situation where if you uninstalled bottleneck it would use some other routine, but the nan-handling was then different - I think it caused the all-nan sum to flick between nan and zero if I recall.

Quick response appreciated though, and I might have a delve into fixing bottleneck myself if I get the free time.

max-sixty · 2021-06-01T18:36:04Z

I don't think there's a config for disabling bottleneck — assuming that's correct, we'd take a PR for one.

FYI one does seem to work is setting the type to float:

   ...: xarr.astype(float).max()
Out[1]:
<xarray.DataArray ()>
array(0.)

dcherian · 2021-06-16T17:42:34Z

there's a config for disabling bottleneck — assuming that's correct, we'd take a PR for one.

Yeah I think it'd be nice to opt-in/out to bottleneck and maybe even support numbagg somehow.

max-sixty added the upstream issue label Jun 3, 2021

keewis mentioned this issue Jul 1, 2021

conditionally disable bottleneck #5560

Merged

4 tasks

dcherian closed this as completed in #5560 Aug 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

lusewell commented Jun 1, 2021

max-sixty commented Jun 1, 2021 •

edited

Loading

lusewell commented Jun 1, 2021

max-sixty commented Jun 1, 2021

dcherian commented Jun 16, 2021

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

Bottleneck bug with unusual strides - causes segfault or wrong number #5424

Comments

lusewell commented Jun 1, 2021

max-sixty commented Jun 1, 2021 • edited Loading

lusewell commented Jun 1, 2021

max-sixty commented Jun 1, 2021

dcherian commented Jun 16, 2021

max-sixty commented Jun 1, 2021 •

edited

Loading