Equalise cubes #6257

pp-mo · 2024-12-16T14:40:17Z

codecov · 2024-12-16T14:55:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.83%. Comparing base (0fdedb4) to head (1265c26).
Report is 3 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #6257      +/-   ##
==========================================
- Coverage   89.83%   89.83%   -0.01%     
==========================================
  Files          88       88              
  Lines       23315    23380      +65     
  Branches     4338     4356      +18     
==========================================
+ Hits        20945    21003      +58     
- Misses       1644     1649       +5     
- Partials      726      728       +2

pp-mo · 2024-12-17T23:04:28Z

Status update

I think this is now good enough to consider as it stands, though I still anticipate that other useful features could be added.
It's worth considering what is included here and why, and also what we expect to add in future.

"Grouping" implementation

As noted here in the original issue (section "Grouping of input cubes ?"),
it was realised that, in order to usefully apply "equalisation" operations like equalise_attributes to all cubes in a file load
(as we eventually hope to: see here, in section "Embedding in extended "combine_cubes" operation"),
they must be applied over "groups" of input cubes, not over all the cubes from the whole file.

The notes there describe the problem of trying to rationalise the different ways in which merge and concatenate do this "grouping".
However, the current implementation drastically simplifies this, by grouping based on cube.metadata only.
I think this will make an "adequate" distinction between the input cube groups over which "equalisation" operations are applied,
and obviously:

- it works the same for both merge + concatenate
- it is simple enough to clearly + fully document
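As a rough illustration of grouping on metadata alone, here is a minimal sketch. It is not the code in this PR: `FakeCube` and `group_by_metadata` are hypothetical stand-ins, and a plain dict is used in place of real cube metadata. Groups are found by equality comparison against a list, since metadata content is not necessarily hashable.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCube:
    """Hypothetical stand-in for a cube: a label plus a metadata dict."""

    label: str
    metadata: dict = field(default_factory=dict)


def group_by_metadata(cubes):
    """Return lists of cubes whose metadata compare equal, in first-seen order."""
    group_keys = []  # one metadata dict per group
    group_cubes = []  # the cubes belonging to each group
    for cube in cubes:
        # Metadata may hold unhashable values, so search a list with "=="
        # rather than using the metadata as a dictionary key.
        for key, members in zip(group_keys, group_cubes):
            if key == cube.metadata:
                members.append(cube)
                break
        else:
            # No existing group matched: start a new one.
            group_keys.append(cube.metadata)
            group_cubes.append([cube])
    return group_cubes


cubes = [
    FakeCube("a", {"standard_name": "air_temperature", "units": "K"}),
    FakeCube("b", {"standard_name": "air_pressure", "units": "Pa"}),
    FakeCube("c", {"standard_name": "air_temperature", "units": "K"}),
]
groups = group_by_metadata(cubes)
print([[cube.label for cube in group] for group in groups])
```

Each "equalisation" operation would then be applied to one group at a time, so cubes "a" and "c" are treated together and "b" is left alone.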

Currently included options :

unify_time_units because ...
- it already exists + has proved useful
equalise_attributes because ...
- it already exists + has proved useful
- it has a particular relation to cube metadata, and an interaction with input 'grouping'
  (both affecting it and being affected by it), so this is an opportunity to sort out how that needs to function
unify_names because ...
- this operation may be needed to enable netcdf data to be concatenated,
  which is more likely to be wanted now that we've made that available on loading
- it has a particular relation to cube metadata, and an interaction with input 'grouping',
  so it's worth resolving + documenting how that works in the initial version of the function
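To make the two-way interaction between attributes and grouping concrete, here is a small sketch using plain dicts. It assumes (this is not confirmed by the text above) that attributes are left out of the grouping comparison when attribute equalisation is requested. The helper names `grouping_key` and `equalise_attribute_dicts` are hypothetical.

```python
def grouping_key(metadata, ignore_attributes):
    """Copy the metadata used for grouping, optionally leaving out attributes."""
    return {
        name: value
        for name, value in metadata.items()
        if not (ignore_attributes and name == "attributes")
    }


def equalise_attribute_dicts(attribute_dicts):
    """Remove, in place, every attribute that is not identical in all dicts."""
    if not attribute_dicts:
        return
    first, *rest = attribute_dicts
    common = {
        key
        for key, value in first.items()
        if all(key in other and other[key] == value for other in rest)
    }
    for attrs in attribute_dicts:
        for key in list(attrs):
            if key not in common:
                del attrs[key]


metas = [
    {"standard_name": "air_temperature", "attributes": {"source": "model", "run": 1}},
    {"standard_name": "air_temperature", "attributes": {"source": "model", "run": 2}},
]

# If attributes take part in grouping, the two inputs never share a group,
# so an attribute-equalising operation would have nothing to work on.
same_group_with = grouping_key(metas[0], False) == grouping_key(metas[1], False)
# Leaving attributes out of the comparison puts them in the same group.
same_group_without = grouping_key(metas[0], True) == grouping_key(metas[1], True)

equalise_attribute_dicts([meta["attributes"] for meta in metas])
print(same_group_with, same_group_without, metas[0]["attributes"])
```

After the call, both inputs keep only the "source" attribute, so their metadata match in full.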

Other possible, future options

See also list in original issue

unify selected (compatible) units, e.g.
- unify_compatible_units=['m', 'Pa']
make approximately-equal coordinates equal
remove aux-coords, cell-methods, cell-measures etc, e.g.
- remove_ancils=True
- remove_cell_measures="a_cell"
apply 'new-axis' to promote certain scalar coords to dims, with selected additional components (see "Code solutions for time-dependent hybrid height" #6165 for conceptual background)
- make_axis="time"
- make_axis={"time": "surface_altitude"}
remove_coord_bounds

In future, we can also usefully include this in the "combine_cubes" operation and the "COMBINE_POLICY" / "LOAD_POLICY" settings.
I anticipate that we can add a single "equalisation_kwargs" keyword, set to a dictionary of arguments. This would allow a single 'equalisation phase' to occur just once, before the merge/concat operation.

stephenworsley

Looking good, just a couple of things, mostly about documentation. This will also want a whatsnew entry describing the new function.

stephenworsley · 2024-12-18T15:49:47Z

lib/iris/util.py

+        # Apply operations to the groups : in-place modifications on the cubes
+        for group_cubes in cube_group_cubes:
+            for op in equalisation_ops:
+                op(group_cubes)


It looks like the default behaviour, when all kwargs are False, does nothing to the cubes. Would it be helpful to raise a warning in the case where all of these are False? I can imagine a user having an expectation that this might do something without having to alter any of the kwargs.
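One way the suggested warning could look is sketched below. This is only an illustration of the idea: `equalise_sketch` is a hypothetical skeleton, not the function in this PR, and the warning category and message wording are assumptions.

```python
import warnings


def equalise_sketch(cubes, *, apply_all=False, **options):
    """Hypothetical skeleton: warn when no equalisation option is enabled."""
    enabled = [name for name, value in options.items() if value]
    if not (apply_all or enabled):
        warnings.warn(
            "No equalisation operations were requested, so the cubes are unchanged.",
            UserWarning,
            stacklevel=2,
        )
    # The real operations would be applied here; this sketch returns the input.
    return cubes


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    equalise_sketch(["cube"])  # every option off: warns
    equalise_sketch(["cube"], unify_time_units=True)  # one option on: silent
    equalise_sketch(["cube"], apply_all=True)  # apply_all on: silent

print(len(caught))
```

Only the first call emits a warning, so a caller relying on the defaults is told that nothing was done.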

stephenworsley · 2024-12-18T15:57:04Z

lib/iris/util.py

+    # Snapshot the cube metadata elements which we use to identify input groups
+    # TODO: we might want to sanitise practically comparable types here ?
+    #  (e.g. large object arrays ??)
+    cube_grouping_keys = [


This should probably be renamed cube_grouping_dicts. This is a list of dictionaries rather than a list of keys.

stephenworsley · 2024-12-18T15:58:23Z

lib/iris/util.py

+        # TODO: might something nasty happen here if attributes contain weird stuff ??
+        cube_group_keys = []
+        cube_group_cubes = []
+        for cube, cube_group_key in zip(cubes, cube_grouping_keys):


As noted above, the name cube_group_key is misleading as I believe this object should be a dictionary rather than a key.

stephenworsley · 2024-12-18T16:02:02Z

lib/iris/util.py

+    :class:`~iris.cube.CubeList`
+        A CubeList containing the original input cubes, ready for merge or concatenate
+        operations.  The cubes are possibly modified (in-place), and possibly in a
+        different order.


I think this can be edited a little so that it doesn't suggest that the cubes might be modified in place in a different order. Something like "..., and are possibly returned in a different order".

stephenworsley · 2024-12-18T16:20:56Z

lib/iris/util.py

+    from iris.common.metadata import CubeMetadata
+    from iris.cube import CubeList
+
+    if unify_names or apply_all:


Is unify_names the correct name for this operation? Compared to unify_time_units, this operation works quite differently, only removing information rather than editing it. unify_names might suggest, for example, that equivalent standard_names are made the same. This may also cause us problems if we ever want to add that sort of functionality in the future. A better name might be standardise_names or trim_names.

stephenworsley · 2024-12-18T16:46:53Z

lib/iris/tests/unit/util/test_equalise_cubes.py

+    scramble_inds = rng.permutation(n_inputs)
+    inputs_array = inputs_array[scramble_inds]
+    # Modify input list **BUT N.B. IN PLACE**
+    inputs[0:] = inputs_array


Could [0:] not be replaced by [:]?
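For reference, a quick check with plain lists (not the test's own data) shows that the two slice forms behave the same for in-place assignment. Both replace the whole content while keeping the same list object, so any other reference to the list sees the change.

```python
inputs_a = ["x", "y", "z"]
inputs_b = ["x", "y", "z"]
# Keep second references, to confirm the lists are modified in place.
alias_a, alias_b = inputs_a, inputs_b

scrambled = ["z", "x", "y"]
inputs_a[0:] = scrambled  # the form used in the test
inputs_b[:] = scrambled  # the suggested, more usual form

print(inputs_a == inputs_b, alias_a is inputs_a, alias_b is inputs_b)
```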

pp-mo added 4 commits December 11, 2024 18:28

Initial equalise_cubes util.

fdc78be

Initial something working.

0b71f59

Tweaks, improvements, notes.

f483d4b

Initial partial testing

cbf7514

pp-mo added 2 commits December 17, 2024 16:07

Small tweaks.

94a70a2

Tidy a bit. Test 'unify_time_units'. NB time-units are on coords not …

dcf40d5

…cubes.

pp-mo marked this pull request as ready for review December 17, 2024 22:41

pp-mo requested a review from stephenworsley December 17, 2024 22:42

Fix grouping efficiency.

1265c26

pp-mo mentioned this pull request Dec 18, 2024

inital draft "equalise_cubes" operation to assist merge/concatenate #6248

Open

stephenworsley requested changes Dec 18, 2024

stephenworsley reviewed Dec 18, 2024
