Bottleneck bug with unusual strides - causes segfault or wrong number #5424
Thanks @lusewell. Unfortunately, as you suggest, I don't think there's much we can do here, but this does seem like a bad bug.

It might be worth checking out numbagg (https://github.com/numbagg/numbagg), which we use for fast operations that bottleneck doesn't include. Disclaimer: it comes from @shoyer, and I've recently given it a spring cleaning. To the extent this isn't fixed in bottleneck, we could offer an option to use numbagg, though that would probably require a contribution.

If you need this working now, you could probably write yourself a workaround using numbagg fairly quickly, e.g.:

```python
In [6]: numbagg.nanmax(xarr.values)
Out[6]: 0.0

# or, more generally:
In [12]: xr.apply_ufunc(numbagg.nanmax, xarr, input_core_dims=(('A', 'B', 'C'),))
Out[12]:
<xarray.DataArray ()>
array(0.)
```
Annoyingly the bug affects pretty much every bottleneck function, not just max, and I'm dealing with a large codebase where lots of the code just uses the methods attached to the xarray objects. Is there a way of disabling use of bottleneck inside xarray without uninstalling bottleneck? And if so, do you know whether that's expected to give the same results? Pandas (probably a few versions ago now) had a situation where, if you uninstalled bottleneck, it would fall back to a different routine whose nan-handling differed; I think it changed the all-NaN behaviour. Quick response appreciated though, and I might have a delve into fixing bottleneck myself if I get the free time.
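To make the all-NaN concern concrete, here is a minimal numpy-only sketch (an illustration I'm adding, not taken from the thread; bottleneck's own behaviour may differ) of the edge case in question, where every element being reduced is NaN:

```python
import numpy as np
import warnings

a = np.array([np.nan, np.nan])

# numpy's nan-skipping reductions return NaN (and emit a RuntimeWarning)
# when every element is NaN. A fallback implementation that treats this
# case differently would silently change results across environments.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    result = np.nanmax(a)

print(np.isnan(result))  # True
```

This is exactly the kind of corner where two "equivalent" reduction backends can disagree.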
I don't think there's a config option for disabling bottleneck; assuming that's correct, we'd take a PR for one.

FYI, one thing that does seem to work is casting the dtype to float:

```python
In [1]: xarr.astype(float).max()
Out[1]:
<xarray.DataArray ()>
array(0.)
```
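A plausible reason the cast helps (my assumption, not confirmed in the thread): `astype` copies by default, so the result is a fresh C-contiguous array with ordinary strides, sidestepping bottleneck's strided code path entirely:

```python
import numpy as np

# An array with degenerate (zero) strides, similar in spirit to the
# "unusual strides" that trigger the bug (illustrative only, not the
# original reproducer).
a = np.broadcast_to(np.float64(0.0), (2, 3, 4))
print(a.strides)                # (0, 0, 0)

# astype copies by default (copy=True), materialising a contiguous array.
b = a.astype(float)
print(b.strides)                # (96, 32, 8)
print(b.flags["C_CONTIGUOUS"])  # True
```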
Yeah, I think it'd be nice to be able to opt in/out of bottleneck, and maybe even support numbagg somehow.
The above either returns a very large non-zero number or segfaults, due to pydata/bottleneck#381.

Dual-posting here in case this can't be quickly fixed in bottleneck, as this is a pretty severe bug, especially on the occasions it returns the wrong number rather than segfaulting.
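Since the snippet "the above" refers to did not survive here (the original reproducer lives in pydata/bottleneck#381), the following is a hypothetical numpy-only illustration of the class of arrays involved. Plain numpy reduces such views correctly, which is why the fault is attributed to bottleneck's handling of non-standard strides:

```python
import numpy as np

# A sliced, transposed view: non-contiguous, with "unusual" strides.
# (Hypothetical example, not the array from the original report.)
base = np.zeros((10, 10, 10))
view = base[::3, :, ::2].transpose(2, 0, 1)
print(view.flags["C_CONTIGUOUS"])  # False
print(view.strides)

# Plain numpy handles arbitrary strides correctly:
print(np.nanmax(view))  # 0.0
```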
Environment: