A Dataless Cube #4447
Comments
@bjlittle supermegahypercubes! That is, a cube that describes how huge numbers of incoming datasets would tile together to make an n-dimensional hyperstructure - think, for example, of representing an entire model run in a single object. This would ideally be represented as a metadata-only cube, with individual data payloads very much fetched on demand only, given the vast quantities of data such an object would represent. We've considered this idea from a variety of different perspectives in the Informatics Lab, and we think it has legs. We've also given the idea a bunch of different names, but supermegahypercubes is the best, most whimsical and original name we came up with for the concept 🙂
@bjlittle are you including here the idea that possibly only some of the data might be "filled", with some of it left unidentified? P.S. as a name, for that idea at least, I think "hypothicube" is neater (though for language purists that should probably be "hypothecube" 😉)
@bjlittle - re your concrete use-cases:
@pp-mo - dunno re issue, but wonder if you're recalling the part-filled example in Jacob's hypotheticube article? Or poss another Informatics Lab article? (Here's @DPeterK's one on supermegahypercubes.)
I feed streams of cubes through Machine Learning software (TensorFlow - TF). This requires throwing away the metadata and operating only on the data arrays, and then laboriously reconstructing metadata around the output data. It would be great to be able to cut a cube into data and metadata components, process them separately and recombine them later.
In Dragon Taming ™️ discussion today, I suggested that we should, as far as possible, "contain" code changes within the DataManager class, i.e. no or minimal change should be required in Cube code. Just as a hint for implementation, it is also very simple to make a lazy array which has no data, so it can participate normally in any lazy operations, but can't be fetched. Here's a simple working example.
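A minimal sketch of that idea (not necessarily the example originally posted), assuming dask is the lazy-array backend. The helper names `dataless_array` and `_no_data` are made up for illustration. The array carries a shape and dtype, so lazy operations can build on it, but any attempt to compute it raises an error.

```python
import dask.array as da
import numpy as np
from dask import delayed


def _no_data():
    # Hypothetical placeholder task: only runs if something tries to fetch data.
    raise RuntimeError("This array is dataless and cannot be computed.")


def dataless_array(shape, dtype=np.float64):
    """Return a lazy dask array with a known shape and dtype but no payload."""
    # from_delayed needs only shape and dtype to build the array, so the
    # placeholder task is never called while the graph is being constructed.
    return da.from_delayed(delayed(_no_data)(), shape=shape, dtype=dtype)


arr = dataless_array((2, 3))

# Lazy operations work as usual, because they only need shape and dtype.
lazy_result = (arr + 1).mean(axis=0)
print(lazy_result.shape)

# Fetching the data is the only thing that fails.
try:
    lazy_result.compute()
except RuntimeError as err:
    print("compute failed:", err)
```

One design note: because the failure only happens at compute time, code that merely inspects shape, dtype or dimensionality never notices that the payload is missing.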
To clarify my (mis)understanding of what you mean @pp-mo - the
Ah no, not that actually. So I was just hoping that, since we already have this class encapsulating the possible array types, it would be neat if we could support "dataless" purely by extending what a DataManager can do, rather than by making a bunch of changes elsewhere, e.g. in the Cube class.
P.S. further clarification (hopefully)
We have looked into dataless cubes. We've decided that the first step is to create a cube with coords, but no data. This is checked via ndim, which has no setter and is calculated in the DataManager from shape. We believe that shape should be settable, but only (and non-optionally) if data hasn't been set. This will require changing the DataManager. DataManager(data, shape:optional):
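The shape rule described above can be sketched roughly as follows. This is a hypothetical stand-in class written for illustration, not Iris's actual DataManager, and the name `SketchDataManager` and its error messages are assumptions. It shows shape being required when there is no data, refused when there is data, and ndim always derived from shape.

```python
import numpy as np


class SketchDataManager:
    """Hypothetical stand-in for a data manager that may hold no data."""

    def __init__(self, data=None, shape=None):
        if data is None and shape is None:
            raise ValueError("shape is required when no data is given")
        if data is not None and shape is not None:
            raise ValueError("shape may only be given when there is no data")
        self._data = None if data is None else np.asanyarray(data)
        self._shape = None if shape is None else tuple(shape)

    @property
    def shape(self):
        # With data present, the data defines the shape.
        return self._shape if self._data is None else self._data.shape

    @shape.setter
    def shape(self, value):
        # Shape is settable only while no data has been set.
        if self._data is not None:
            raise AttributeError("cannot set shape once data has been set")
        self._shape = tuple(value)

    @property
    def ndim(self):
        # No setter: dimensionality always follows from shape.
        return len(self.shape)

    def is_dataless(self):
        return self._data is None


dataless = SketchDataManager(shape=(3, 4))
print(dataless.ndim, dataless.is_dataless())

with_data = SketchDataManager(data=np.zeros((2, 5)))
print(with_data.shape, with_data.is_dataless())
```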
✨ Feature Request
I think it's healthy to challenge established norms...
I want the ability to create a dataless cube. By this I mean the ability to create a hyper-space defined only by metadata, i.e., with no data payload.
Once data is added to the cube, then the dimensionality is established and locked down, as we traditionally know and accept.
Motivation
Such hyper-spaces could be used in various ways e.g.,
I'm sure there are more concrete use cases... Please do share them on this issue if you know of any 🙏
Traditionally, there are many situations where a cube enforcing that it must have data is simply an inconvenience. Given the natural progression of model resolutions it seems "just wrong" to abuse dask to create lazy data that will never be used. It reeks of something not being quite right to me.
Let's do something about that 😉
Please up vote this issue if you'd like to see this happen 👍
Steps