Equalise cubes #6257

Open
wants to merge 7 commits into
base: main

Member

@pp-mo pp-mo commented Dec 16, 2024

Closes #6248

codecov bot commented Dec 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.83%. Comparing base (0fdedb4) to head (1265c26).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6257      +/-   ##
==========================================
- Coverage   89.83%   89.83%   -0.01%     
==========================================
  Files          88       88              
  Lines       23315    23380      +65     
  Branches     4338     4356      +18     
==========================================
+ Hits        20945    21003      +58     
- Misses       1644     1649       +5     
- Partials      726      728       +2     


@pp-mo pp-mo marked this pull request as ready for review December 17, 2024 22:41
@pp-mo pp-mo requested a review from stephenworsley December 17, 2024 22:42
Member Author

pp-mo commented Dec 17, 2024

Status update

I think this is now good enough to consider as it is, though I anticipate that other useful features could still be added.
It's worth considering what is included here and why, and also what is anticipated to be added in future.

"Grouping" implementation

As noted in the original issue (section "Grouping of input cubes ?"), it was realised that, in order to usefully apply "equalisation" operations like equalise_attributes to all cubes in a file load (as we eventually hope to; see the section "Embedding in extended "combine_cubes" operation"), they must be applied over "groups" of input cubes, not over all cubes from the whole file.

The notes there describe the problem of trying to rationalise the different ways in which merge and concatenate do this "grouping".
However, the current implementation drastically simplifies this by grouping on cube.metadata only.
I think this will make an "adequate" distinction between the input cube groups over which "equalisation" operations are applied, and, obviously:

  • it works the same for both merge + concatenate
  • it is simple enough to clearly + fully document
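
The grouping described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's code: `StubCube` and `group_by_metadata` are hypothetical names, and a plain dict stands in for `cube.metadata`. Groups are found by equality comparison against a list of keys, on the assumption that the metadata may not be hashable.

```python
from dataclasses import dataclass, field


@dataclass
class StubCube:
    """Hypothetical stand-in for an Iris cube: a label plus a metadata dict."""

    label: str
    metadata: dict = field(default_factory=dict)


def group_by_metadata(cubes):
    """Split cubes into groups whose metadata compare equal.

    Dicts are unhashable, so each cube is matched by a linear search over
    the keys seen so far, rather than by a dictionary lookup.
    """
    group_keys = []   # one metadata dict per group
    group_cubes = []  # parallel list: the cubes belonging to each group
    for cube in cubes:
        if cube.metadata in group_keys:
            group_cubes[group_keys.index(cube.metadata)].append(cube)
        else:
            group_keys.append(cube.metadata)
            group_cubes.append([cube])
    return group_cubes


cubes = [
    StubCube("a", {"standard_name": "air_temperature", "units": "K"}),
    StubCube("b", {"standard_name": "air_pressure", "units": "Pa"}),
    StubCube("c", {"standard_name": "air_temperature", "units": "K"}),
]
groups = group_by_metadata(cubes)
group_labels = [[cube.label for cube in group] for group in groups]
```

Here cubes "a" and "c" share a group, so any equalisation operation would see them together, while "b" is handled separately.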

Currently included options :

  • unify_time_units because
    • it already exists + has proved useful
  • equalise_attributes because ...
    • it already exists + has proved useful
    • it has a particular relation to cube metadata, and interaction with input 'grouping'
      --both to affect it + to be affected by it-- so this is an opportunity to sort out how that needs to function
  • unify_names because ...
    • this operation may be needed to enable netcdf data to be concatenated,
      which is more likely to be wanted now we've made that available on loading
    • it has a particular relation to cube metadata, and interaction with input 'grouping',
      so it's worth resolving + documenting how that works in the initial version of the function
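
To illustrate the interaction between equalise_attributes and grouping: if attributes formed part of the grouping key, cubes with differing attributes would never share a group, so there would be nothing to equalise. The sketch below assumes, purely for illustration, that attributes are dropped from the key when that option is enabled. Plain dicts stand in for cubes, and `grouping_key` and `equalise_group_attributes` are hypothetical helpers; the latter mimics the documented behaviour of `iris.util.equalise_attributes` (removing attributes that are not identical across all cubes).

```python
def grouping_key(cube, equalise_attributes):
    """Return the part of a (stand-in) cube's metadata used to pick its group."""
    key = dict(cube)
    if equalise_attributes:
        # Assumption: attributes must not split groups if we intend to equalise them.
        key.pop("attributes", None)
    return key


def equalise_group_attributes(group):
    """Keep only attributes with identical values on every cube (in place)."""
    first = group[0]["attributes"]
    common = {
        name: value
        for name, value in first.items()
        if all(
            name in cube["attributes"] and cube["attributes"][name] == value
            for cube in group
        )
    }
    for cube in group:
        cube["attributes"] = dict(common)


cube_1 = {"name": "air_temperature", "attributes": {"source": "model", "run": 1}}
cube_2 = {"name": "air_temperature", "attributes": {"source": "model", "run": 2}}

same_group_with_option = grouping_key(cube_1, True) == grouping_key(cube_2, True)
same_group_without_option = grouping_key(cube_1, False) == grouping_key(cube_2, False)

equalise_group_attributes([cube_1, cube_2])
```

With the option on, the two stand-in cubes fall into one group and lose only the differing "run" attribute; with it off, they would be placed in separate groups.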

Other possible, future options

See also list in original issue

  • unify selected (compatible) units, e.g.
    • unify_compatible_units=['m', 'Pa']
  • make approximately-equal coordinates equal
  • remove aux-coords, cell-methods, cell-measures etc, e.g.
    • remove_ancils=True
    • remove_cell_measures="a_cell"
  • apply 'new-axis' to promote certain scalar coords to dims, with selected additional components (see Code solutions for time-dependent hybrid height #6165 for conceptual background)
    • make_axis="time"
    • make_axis={"time": "surface_altitude"}
  • remove_coord_bounds

We can also, in future, usefully include this in the "combine_cubes" operation and "COMBINE_POLICY" / "LOAD_POLICY" settings.
I anticipate that we can add a single "equalisation_kwargs" keyword, set to a dictionary of options. This would allow a single 'equalisation phase' to occur just once, before the merge/concat operation.

Contributor

@stephenworsley stephenworsley left a comment

Looking good, just a couple things, mostly about documentation. This will also want a whatsnew describing the new function.

# Apply operations to the groups : in-place modifications on the cubes
for group_cubes in cube_group_cubes:
    for op in equalisation_ops:
        op(group_cubes)
Contributor

It looks like the default behaviour, when all kwargs are False, does nothing to the cubes. Would it be helpful to raise a warning in the case where all of these are False? I can imagine a user having an expectation that this might do something without having to alter any of the kwargs.
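
One way the suggested warning could look is sketched below. The function name `equalise_sketch` and its signature are hypothetical placeholders, not the PR's actual API; only keyword names mentioned in this thread are reused.

```python
import warnings


def equalise_sketch(cubes, apply_all=False, **options):
    """Hypothetical stand-in: warn if no equalisation option is enabled."""
    if not apply_all and not any(options.values()):
        warnings.warn(
            "No equalisation operations were selected, so the cubes are unchanged.",
            UserWarning,
            stacklevel=2,
        )
    return cubes


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # All options off: this call should warn.
    equalise_sketch([], unify_time_units=False, equalise_attributes=False)
    # At least one option on: this call should stay silent.
    equalise_sketch([], unify_time_units=True)

n_warnings = len(caught)
```

A warning, rather than an error, keeps a no-op call harmless while still flagging the likely mistake.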

# Snapshot the cube metadata elements which we use to identify input groups
# TODO: we might want to sanitise practically comparable types here ?
# (e.g. large object arrays ??)
cube_grouping_keys = [
Contributor

This should probably be renamed cube_grouping_dicts. This is a list of dictionaries rather than a list of keys.

# TODO: might something nasty happen here if attributes contain weird stuff ??
cube_group_keys = []
cube_group_cubes = []
for cube, cube_group_key in zip(cubes, cube_grouping_keys):
Contributor

As noted above, the name cube_group_key is misleading as I believe this object should be a dictionary rather than a key.

:class:`~iris.cube.CubeList`
    A CubeList containing the original input cubes, ready for merge or concatenate
    operations. The cubes are possibly modified (in-place), and possibly in a
    different order.
Contributor

I think this can be edited a little so that it doesn't suggest that the cubes might be modified in place in a different order. Something like "..., and are possibly returned in a different order".

from iris.common.metadata import CubeMetadata
from iris.cube import CubeList

if unify_names or apply_all:
Contributor

Is unify_names the correct name for this operation? Compared to unify_time_units, this operation works quite differently: it only removes information rather than editing it. The name unify_names might suggest, for example, that equivalent standard_names are made the same. It may also cause us problems if we ever want to add that sort of functionality in the future. A better name might be standardise_names or trim_names.

scramble_inds = rng.permutation(n_inputs)
inputs_array = inputs_array[scramble_inds]
# Modify input list **BUT N.B. IN PLACE**
inputs[0:] = inputs_array
Contributor

Could [0:] not be replaced by [:]?
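
For reference, both spellings assign into the existing list object, so the suggested change is purely cosmetic. A small stdlib-only check, using `random` in place of the NumPy permutation in the test code:

```python
import random

inputs = ["a", "b", "c", "d"]
alias = inputs  # a second reference to the same list object

shuffled = random.Random(0).sample(inputs, k=len(inputs))

# Slice assignment starting from index 0 replaces the contents in place ...
inputs[0:] = shuffled
same_after_first = alias is inputs and alias == shuffled

# ... and a full-slice assignment does exactly the same thing.
inputs[:] = sorted(shuffled)
same_after_second = alias is inputs and alias == ["a", "b", "c", "d"]
```

In both cases `alias` still refers to the same list and sees the new contents, which is the in-place behaviour the comment in the test code calls out.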

Successfully merging this pull request may close these issues.

inital draft "equalise_cubes" operation to assist merge/concatenate
2 participants