ADR 0006: Add read-only flags

  • Status: accepted

  • Deciders: Jan-Lukas, Neil, Owen, Simon

  • Date: 2021-05-05

Context

There are a number of contexts where values of variables are conceptually “broadcast” (not necessarily using an actual broadcast operation) and are thus shared (not to be confused with sharing from the shallow-copy mechanism):

  • Explicit broadcast operations.

  • Masks with N-M dimensions when the data has N dimensions. Each mask value is conceptually shared by all data values along the M dimensions that are missing from the mask.

  • Coords along dimension dim0 of slices of a data array taken along dimension dim1. The coord values are conceptually shared by all slices.

  • Items of a dataset that do not depend on dim, when the dataset is sliced along dim. These items are conceptually shared by all slices.

  • Coords of items in a dataset. The coords are conceptually shared by all items.

In all of the above cases, a subsequent in-place modification would silently affect other, seemingly unrelated (sub)objects, such as other slices or items of the same “parent” object.

This can be solved by marking the variables affected in these cases as “read-only”.
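The idea can be sketched with a toy model in plain Python. `ToyVariable`, its `readonly` attribute, `broadcast` and `ReadOnlyError` are hypothetical illustrations, not scipp's API: a broadcast result aliases its parent's storage and is flagged read-only, so an in-place update raises instead of silently changing every position that shares a value.

```python
class ReadOnlyError(Exception):
    """Raised when an in-place operation targets a read-only variable (hypothetical)."""


class ToyVariable:
    """Hypothetical 1-D variable: an index map into a possibly shared buffer."""

    def __init__(self, buffer, index=None, readonly=False):
        self._buffer = buffer  # storage; may be shared with other variables
        self._index = list(range(len(buffer))) if index is None else index
        self.readonly = readonly

    @property
    def values(self):
        # Element k of this variable is buffer[index[k]].
        return [self._buffer[i] for i in self._index]

    def __iadd__(self, value):
        # Fail loudly instead of silently changing values that others share.
        if self.readonly:
            raise ReadOnlyError("cannot modify a read-only variable in place")
        for i in set(self._index):  # touch each buffer slot exactly once
            self._buffer[i] += value
        return self


def broadcast(var, n):
    """Repeat every element of `var` n times without copying the buffer.

    The result aliases the storage of `var`, so it is flagged read-only.
    """
    index = [i for i in var._index for _ in range(n)]
    return ToyVariable(var._buffer, index, readonly=True)
```

With `x = ToyVariable([1.0, 2.0])` and `b = broadcast(x, 3)`, `b.values` is `[1.0, 1.0, 1.0, 2.0, 2.0, 2.0]`. `b += 1` raises `ReadOnlyError`, whereas `x += 1` is allowed and, since `b` is a view, is visible through it.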

A further problem arises in in-place binary operations such as array['dim0', 0] += other. If the right-hand side of such an operation contains masks that are not present in the left-hand side, they are inserted into the masks dict of the left-hand side. In this example, if other contained a mask 'extra_mask' that is not present in array, it would be inserted into array['dim0', 0].masks. Since slicing operations create new metadata dicts, 'extra_mask' would be inserted into a temporary dict and silently disappear after the operation. This effectively “unmasks” elements.

Note that a hypothetical mechanism that inserts the masks into the masks dict of the slice's parent, array.masks, would need to provide a way to broadcast and initialize this new mask for all other slices. The complexity of such a mechanism does not appear justified given the minor advantages.

The problem of metadata insertion into slices can be solved by marking the metadata dicts of slices as “read-only”, which prevents item insertion.
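A minimal sketch of a read-only metadata dict, assuming hypothetical names `MetaDict`, `slice_masks`, `merge_masks` and `ReadOnlyError` (none of these are scipp's API). Slicing produces a new, temporary masks dict flagged read-only, so inserting 'extra_mask' during an in-place operation raises instead of being silently dropped.

```python
class ReadOnlyError(Exception):
    """Raised when assigning into a read-only metadata dict (hypothetical)."""


class MetaDict(dict):
    """Hypothetical metadata dict (e.g. masks) with a read-only flag.

    Only __setitem__ is guarded in this sketch; a complete implementation
    would also need to guard update(), setdefault(), pop(), etc.
    """

    def __init__(self, *args, readonly=False, **kwargs):
        self.readonly = False  # allow the initial fill
        super().__init__(*args, **kwargs)
        self.readonly = readonly

    def __setitem__(self, key, value):
        if self.readonly:
            raise ReadOnlyError(f"cannot insert {key!r}: dict is read-only")
        super().__setitem__(key, value)


def slice_masks(masks, start, stop):
    """Return the masks of a slice as a *new* dict, flagged read-only.

    The dict is temporary: anything inserted into it would be lost once the
    slice is gone, so insertion is rejected rather than silently dropped.
    (Mask values are copied here for simplicity.)
    """
    return MetaDict({name: m[start:stop] for name, m in masks.items()}, readonly=True)


def merge_masks(lhs_masks, rhs_masks):
    """Insert masks present only on the right-hand side, as an in-place op would.

    Combining masks that exist on both sides is omitted from this sketch.
    """
    for name, mask in rhs_masks.items():
        if name not in lhs_masks:
            lhs_masks[name] = list(mask)
```

For example, with `parent = MetaDict({'m': [False, True, False]})` and `sl = slice_masks(parent, 0, 1)`, calling `merge_masks(sl, {'extra_mask': [True]})` raises `ReadOnlyError`, while the same merge into `parent` succeeds.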

Decision

Add a read-only flag to:

  • Variable

  • Metadata dicts for coords, masks, and attrs.

Operations fail rather than silently ignoring read-only flags of variables or metadata dicts.

Consequences

Positive:

  • Can prevent unintended in-place modification of variables that are broadcast. This allows broadcasting to be used safely in more cases.

  • Can prevent modification of dataset coords via items (data arrays), which would unintentionally affect other data arrays in the dataset.

  • Can prevent incorrect mask updates in in-place binary operations without requiring mask dims to match data dims.

  • Can prevent silently dropping metadata in in-place binary operations on slices.

Negative:

No major downsides.

In rare cases users may want to get a data array from a dataset, item = ds['a'], and modify a coordinate without copying data. This now requires copying these coords by hand, e.g., item.coords['x'] = item.coords['x'].copy(). In practice this should be rare, and users can simply copy the entire item, item = ds['a'].copy().
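The workaround can be sketched with a toy model. `ViewVariable`, `ToyDataArray`, `ToyDataset` and `ReadOnlyError` are hypothetical stand-ins, not scipp classes. In this sketch an item returned by `ds['a']` gets its own coords dict whose entries are read-only views of the dataset's coords: writing through such a view raises, but replacing the dict entry with a copy is allowed and leaves the dataset untouched.

```python
class ReadOnlyError(Exception):
    """Raised when writing to a read-only variable (hypothetical)."""


class ViewVariable:
    """Hypothetical variable: a view onto a possibly shared list, with a read-only flag."""

    def __init__(self, buffer, readonly=False):
        self._buffer = buffer  # may be shared with other views
        self.readonly = readonly

    @property
    def values(self):
        return list(self._buffer)

    def __setitem__(self, i, value):
        if self.readonly:
            raise ReadOnlyError("cannot write to a read-only variable")
        self._buffer[i] = value

    def copy(self):
        # Copies the data; the copy owns its buffer and is writable.
        return ViewVariable(list(self._buffer))


class ToyDataArray:
    def __init__(self, data, coords):
        self.data = data      # ViewVariable
        self.coords = coords  # dict: name -> ViewVariable

    def copy(self):
        # Copy data and all coords, giving a fully independent item.
        return ToyDataArray(self.data.copy(), {k: v.copy() for k, v in self.coords.items()})


class ToyDataset:
    def __init__(self, data, coords):
        self.data = data      # dict: item name -> ViewVariable
        self.coords = coords  # dict: name -> ViewVariable, shared by all items

    def __getitem__(self, name):
        # New coords dict per item, but its entries are read-only views of the
        # dataset's coords, since every item shares them. Data is not copied.
        coords = {k: ViewVariable(v._buffer, readonly=True) for k, v in self.coords.items()}
        return ToyDataArray(self.data[name], coords)


ds = ToyDataset({"a": ViewVariable([1.0, 2.0]), "b": ViewVariable([3.0, 4.0])},
                {"x": ViewVariable([0.0, 1.0])})
item = ds["a"]
# item.coords["x"][0] = 10.0 would raise ReadOnlyError here.
item.coords["x"] = item.coords["x"].copy()  # copy just this coord
item.coords["x"][0] = 10.0                  # ds.coords["x"] stays [0.0, 1.0]
```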