Representations and Tables

Scipp provides a number of options for visualizing the structure and contents of variables, data arrays, and datasets:

  • scipp.to_html produces an HTML representation. This is also bound to _repr_html_, i.e., Jupyter will display this when the name of a Scipp object is typed at the end of a cell.

  • scipp.show draws an SVG representation of the contained items and their shapes.

  • scipp.table outputs a table representation of 1-D data.

  • str and repr produce a summary as a string.

String formatting is always possible, but the outputs of to_html, show, and table are designed for Jupyter notebooks.

While the outputs are mostly self-explanatory, we discuss some details below.

HTML representation

scipp.to_html is used to define _repr_html_. When this special method is defined, Jupyter calls it in place of __repr__ to render an object.

[1]:
import numpy as np
import scipp as sc
[2]:
x = sc.arange('x', 2.0)
y = sc.arange('y', 4.0, unit='m')
labels = sc.arange('y', start=7.0, stop=10.0)
ds = sc.Dataset(
    data={
        'a': sc.array(
            dims=['y', 'x'],
            values=np.random.random((3, 2)),
            variances=0.1 * np.random.random((3, 2)),
            unit='angstrom',
        )
    },
    coords={'x': x, 'y': y, 'y_label': labels},
)
ds['b'] = ds['a']

Simply typing the name of a variable, data array, or dataset will show the HTML representation:

[3]:
ds
[3]:
scipp.Dataset (2.71 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      𝟙
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      𝟙
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • a
      (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])
    • b
      (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])

The columns are:

  1. Name of the data item, coordinate, etc. For coordinates, a bold font indicates that the coordinate is aligned.

  2. Dimensions.

  3. Data type (dtype).

  4. Unit.

  5. Values and variances.

The reported size is only an estimate. It includes the actual arrays of values as well as (some of) the internal memory used by variables, etc. See, e.g., scipp.Variable.underlying_size.

WARNING:

IPython (and thus Jupyter) has an output caching system. By default, it keeps the last 1000 cell outputs. In the above case, the cached output is ds: not the displayed HTML, but the object itself. If such cell outputs are large, this output cache can consume enormous amounts of memory.

Note that del ds will not release the memory, since the IPython output cache still holds a reference to the same object. See this FAQ entry for clearing or disabling this caching.
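Within a running IPython or Jupyter session, the cached outputs can be dropped with IPython's %reset -f out magic, which clears the output history. The helper below calls this magic programmatically. Its name is hypothetical and it is not part of Scipp or IPython. Outside IPython it does nothing.

```python
def clear_ipython_output_cache() -> bool:
    """Hypothetical helper: clear IPython's output cache if running under IPython."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed
    shell = get_ipython()
    if shell is None:
        return False  # installed, but not running inside an IPython/Jupyter session
    # Equivalent to typing `%reset -f out` in a cell: clears the output history.
    shell.run_line_magic('reset', '-f out')
    return True
```

After clearing the cache, del ds can actually free the object, provided no other references to it remain.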

Note that, as usual, Jupyter only displays the value of the last expression in a cell:

[4]:
a = 1
ds
a
[4]:
1

In such cases, to_html can be used to display the HTML view explicitly, e.g., to show multiple objects in a single cell:

[5]:
sc.to_html(ds['a'])
sc.to_html(ds['b'])
scipp.DataArray (1.67 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      𝟙
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      𝟙
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])
scipp.DataArray (1.67 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      𝟙
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      𝟙
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])

Typing the Scipp module name at the end of a cell yields an HTML view of all Scipp objects (variables, data arrays, datasets, and data groups) in the current namespace:

[6]:
sc
Variables:(3)
labels
scipp.Variable (280 Bytes)
    • (y: 3)
      float64
      𝟙
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
x
scipp.Variable (272 Bytes)
    • (x: 2)
      float64
      𝟙
      0.0, 1.0
      Values:
      array([0., 1.])
y
scipp.Variable (288 Bytes)
    • (y: 4)
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
DataArrays:(0)
Datasets:(1)
ds
scipp.Dataset (2.71 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      𝟙
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      𝟙
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • a
      (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])
    • b
      (y, x)
      float64
      Å
      0.245, 0.370, ..., 0.236, 0.128
      σ = 0.269, 0.258, ..., 0.274, 0.213
      Values:
      array([[0.24537947, 0.37016922], [0.6246977 , 0.76919011], [0.23627559, 0.12848192]])

      Variances (σ²):
      array([[0.07220296, 0.06651781], [0.03961696, 0.01089015], [0.07486754, 0.0455633 ]])
DataGroups:(0)
[6]:
<module 'scipp' from '/home/runner/work/scipp/scipp/.tox/docs/lib/python3.10/site-packages/scipp/__init__.py'>

SVG representation

scipp.show renders Scipp objects to an image that shows the relationships between coordinates and data. If a dimension's extent is large, show truncates it to avoid generating huge, unreadable SVGs. Objects with more than three dimensions are not supported and result in an error.

Compare the image below with the HTML representation to see what the individual components represent. Names of dataset items and coordinates are shown in large letters, while dimension names are shown in smaller letters (rotated for y).

[7]:
sc.show(ds)
[SVG image: data items a and b (dims (y, x), shape (3, 2), unit Å, with values and variances); coordinate y_label (dims (y,), shape (3,), dimensionless, values only); coordinate y (dims (y,), shape (4,), unit m, values only); coordinate x (dims (x,), shape (2,), dimensionless, values only).]

Note that y has four blocks in the y-dimension, whereas y_label and the data have three. This indicates that y is a bin-edge coordinate.

scipp.show also works with binned data. Here, the smaller blocks to the right represent the events, i.e., the bin contents. Their length has no meaning, since the number of events per bin can vary.

[8]:
sc.show(sc.data.binned_xy(100, 3, 2))
[SVG image: binned data array (dims (x, y), shape (3, 2)) whose bins hold a table with dimension row of length 100. The table contains data with unit K and event coordinates x, y, and z with unit m. The binned array has bin-edge coordinates x (dims (x,), shape (4,), unit m) and y (dims (y,), shape (3,), unit m).]

Table representation

scipp.table arranges Scipp objects in a table. It only works with one-dimensional objects, so we have to use slicing to display our higher-dimensional example:

[9]:
sc.table(ds['y', 0])
[9]:
              a              b
Coordinates   Data           Data
x [𝟙]         [Å]            [Å]
0.000         0.245±0.269    0.245±0.269
1.000         0.370±0.258    0.370±0.258

In the following, the y column has one more entry than the other columns because y is a bin-edge coordinate.

[10]:
sc.table(ds['x', 0])
[10]:
                       a              b
Coordinates            Data           Data
y [m]    y_label [𝟙]   [Å]            [Å]
0.000    7.000         0.245±0.269    0.245±0.269
1.000    8.000         0.625±0.199    0.625±0.199
2.000    9.000         0.236±0.274    0.236±0.274
3.000

String representation

All Scipp objects can be converted to strings:

[11]:
print(ds)
<scipp.Dataset>
Dimensions: Sizes[y:3, x:2, ]
Coordinates:
* x                         float64  [dimensionless]  (x)  [0, 1]
* y                         float64              [m]  (y [bin-edge])  [0, 1, 2, 3]
* y_label                   float64  [dimensionless]  (y)  [7, 8, 9]
Data:
  a                         float64             [Å]  (y, x)  [0.245379, 0.370169, ..., 0.236276, 0.128482]  [0.072203, 0.0665178, ..., 0.0748675, 0.0455633]
  b                         float64             [Å]  (y, x)  [0.245379, 0.370169, ..., 0.236276, 0.128482]  [0.072203, 0.0665178, ..., 0.0748675, 0.0455633]


The format of variables can be controlled using f-strings or the built-in format function. For example, the default format shows the first 2 and last 2 elements:

[12]:
var = sc.linspace('x', 0.0, 1.0, 11, unit='m')
f'{var}'
[12]:
'<scipp.Variable> (x: 11)    float64              [m]  [0, 0.1, ..., 0.9, 1]'

Use < to show the first 4 elements:

[13]:
f'{var:<}'
[13]:
'<scipp.Variable> (x: 11)    float64              [m]  [0, 0.1, 0.2, 0.3, ...]'

Use #n to show n elements instead of 4:

[14]:
f'{var:#5}'
[14]:
'<scipp.Variable> (x: 11)    float64              [m]  [0, 0.1, ..., 0.8, 0.9, 1]'

Elements can be formatted as well. Note the double colon! The options after the first colon control how the variable itself is formatted. The options after the second colon are forwarded to the elements and can be anything that the element type (in this case float) supports.

[15]:
f'{var::.1e}'
[15]:
'<scipp.Variable> (x: 11)    float64              [m]  [0.0e+00, 1.0e-01, ..., 9.0e-01, 1.0e+00]'

Or combine all of the above:

[16]:
f'{var:<#5:.1e}'
[16]:
'<scipp.Variable> (x: 11)    float64              [m]  [0.0e+00, 1.0e-01, 2.0e-01, 3.0e-01, 4.0e-01, ...]'

In addition, Variables have a compact string format:

[17]:
var = sc.scalar(1.2345, variance=0.01, unit='kg')
f'{var:c}'
[17]:
'1.23(10) kg'

Note that this format is primarily intended for scalar variables and may produce hard-to-read output otherwise.

Format string syntax

The full syntax of format specifiers is:

format_spec ::= [scipp_spec] [":" nested_spec]
nested_spec ::= .*
scipp_spec  ::= [selection]["#" length][type]
selection   ::= "^" | "<" | ">"
length      ::= digit+
type        ::= "c"
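The hypothetical parser below splits a spec into its parts according to this grammar. It is an illustration only, not Scipp's own parser. It does not check combinations that Scipp may reject, such as c together with a selection.

```python
import re
from typing import NamedTuple, Optional

# scipp_spec ::= [selection]["#" length][type], all parts optional.
_SCIPP_SPEC = re.compile(r'^(?P<selection>[\^<>])?(?:#(?P<length>\d+))?(?P<type>c)?$')


class FormatSpec(NamedTuple):
    selection: Optional[str]
    length: Optional[int]
    type: Optional[str]
    nested: Optional[str]


def parse_format_spec(spec: str) -> FormatSpec:
    """Hypothetical parser: split a spec at the first colon into scipp_spec and nested_spec."""
    scipp_part, sep, nested = spec.partition(':')
    match = _SCIPP_SPEC.match(scipp_part)
    if match is None:
        raise ValueError(f'Invalid scipp format spec: {scipp_part!r}')
    length = match['length']
    return FormatSpec(
        selection=match['selection'],
        length=int(length) if length is not None else None,
        type=match['type'],
        # Everything after the first colon (including further colons) is nested_spec.
        nested=nested if sep else None,
    )


print(parse_format_spec('<#5:.1e'))
```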

selection controls how the array is sliced:

selection    Meaning
^            Use elements from the beginning and end as if by var[:length//2], ..., var[-length//2:].
<            Use elements from the beginning as if by var[:length], ...
>            Use elements from the end as if by ..., var[-length:].
(omitted)    Same as ^.

length controls how many elements are shown. It defaults to 4.
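The slicing rules in the selection table above, together with the default length, can be sketched on a plain Python list. select_elements is a hypothetical helper. Showing arrays with at most length elements in full is an assumption about Scipp's behavior. Note that -length//2 parses as (-length)//2, so for an odd length the extra element comes from the end. This matches the #5 output earlier.

```python
def select_elements(values, selection='^', length=4):
    """Hypothetical sketch of the selection rules; returns (head, tail) lists."""
    if length < 1:
        raise ValueError('length must be positive')
    if len(values) <= length:
        # Assumption: short arrays are shown in full.
        return list(values), []
    if selection in ('^', None):
        # (-length) // 2 rounds down, so an odd length takes one more from the end.
        return list(values[:length // 2]), list(values[-length // 2:])
    if selection == '<':
        return list(values[:length]), []
    if selection == '>':
        return [], list(values[-length:])
    raise ValueError(f'unknown selection {selection!r}')


print(select_elements(list(range(11)), '^', 5))
```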

type selects between different formatters:

type         Meaning
c            Compact formatter. Does not support other options such as selection or nested_spec.
(omitted)    Default formatter, which shows the variable with all metadata and data as determined by the other options.

nested_spec is used to format the array elements. It can be anything that the dtype's formatter supports. Note that it always requires an additional colon to separate it from the scipp_spec, even when the scipp_spec is empty (as in f'{var::.1e}'). See in particular the format specification mini-language in the Python standard library documentation.