Representations and Tables

Scipp provides a number of options for visualizing the structure and contents of variables, data arrays, and datasets:

  • scipp.to_html produces an HTML representation. This is also bound to _repr_html_, i.e., Jupyter will display this when the name of a Scipp object is typed at the end of a cell.

  • scipp.show draws an SVG representation of the contained items and their shapes.

  • scipp.table outputs a table representation of 1-D data.

  • str and repr produce a summary as a string.

String formatting is always possible, but the outputs of to_html, show, and table are designed for Jupyter notebooks.

While the outputs are mostly self-explanatory, we discuss some details below.

HTML representation

scipp.to_html is used to define _repr_html_. Jupyter uses this special method in place of __repr__ when displaying an object.
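
As a minimal sketch of the mechanism (not Scipp's implementation), any Python class can opt into rich display by defining `_repr_html_`; Jupyter then prefers its return value over `__repr__`. The class `Measurement` and its fields are hypothetical and chosen only for illustration.

```python
import html


class Measurement:
    """Hypothetical value-with-unit class, used only to illustrate the protocol."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __repr__(self):
        # Plain-text fallback, used by repr() and non-HTML frontends.
        return f"Measurement({self.value!r}, {self.unit!r})"

    def _repr_html_(self):
        # Jupyter calls this method, if present, to render the object as HTML.
        return f"<b>{html.escape(str(self.value))}</b> {html.escape(self.unit)}"


m = Measurement(1.2, "m")
text_view = repr(m)
html_view = m._repr_html_()
```

In a notebook, ending a cell with `m` would render the HTML string, while `print(m)` would still use the plain-text form.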

[1]:
import numpy as np
import scipp as sc
[2]:
# Coordinates: 'y' has one more element than the data along y,
# which makes it a bin-edge coordinate.
x = sc.arange('x', 2.)
y = sc.arange('y', 4., unit='m')
labels = sc.arange('y', start=7., stop=10.)
ds = sc.Dataset(
    data={'a': sc.array(dims=['y', 'x'],
                        values=np.random.random((3, 2)),
                        variances=0.1 * np.random.random((3, 2)),
                        unit='angstrom')},
    coords={'x': x, 'y': y, 'y_label': labels})
ds['b'] = ds['a']  # second item with the same values
ds['c'] = 1.0 * sc.units.kg  # 0-D (scalar) item
# Attributes are set per item.
ds['a'].attrs['x_attr'] = sc.array(dims=['x'], values=[1.77, 3.32])
ds['b'].attrs['x_attr'] = sc.array(dims=['x'], values=[55.7, 105.1])
ds['b'].attrs['b_attr'] = 1.2 * sc.units.m

Simply typing the name of a variable, data array, or dataset will show the HTML representation:

[3]:
ds
[3]:
scipp.Dataset (4.26 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      πŸ™
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      πŸ™
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • a
      (y, x)
      float64
      Å
      0.487, 0.363, ..., 0.864, 0.344
      σ = 0.138, 0.301, ..., 0.244, 0.275
        • x_attr
          (x)
          float64
          πŸ™
          1.77, 3.32
          Values:
          array([1.77, 3.32])
      Values:
      array([[0.48730477, 0.36344038], [0.24857363, 0.31553676], [0.8641998 , 0.34448141]])

      Variances (σ²):
      array([[0.01914787, 0.09086229], [0.09020273, 0.00195045], [0.05945124, 0.0753859 ]])
    • b
      (y, x)
      float64
      Å
      0.487, 0.363, ..., 0.864, 0.344
      σ = 0.138, 0.301, ..., 0.244, 0.275
        • b_attr
          ()
          float64
          m
          1.2
          Values:
          array(1.2)
        • x_attr
          (x)
          float64
          πŸ™
          55.7, 105.1
          Values:
          array([ 55.7, 105.1])
      Values:
      array([[0.48730477, 0.36344038], [0.24857363, 0.31553676], [0.8641998 , 0.34448141]])

      Variances (σ²):
      array([[0.01914787, 0.09086229], [0.09020273, 0.00195045], [0.05945124, 0.0753859 ]])
    • c
      ()
      float64
      kg
      1.0
      Values:
      array(1.)

The reported size is only an estimate. It includes the actual arrays of values as well as (some of) the internal memory used by variables, etc. See, e.g., scipp.Variable.underlying_size.

WARNING:

IPython (and thus Jupyter) has an Output caching system. By default this keeps the last 1000 cell outputs. In the above case this is ds (not the displayed HTML, but the object itself). If such cell outputs are large then this output cache can consume enormous amounts of memory.

Note that del ds will not release the memory, since the IPython output cache still holds a reference to the same object. See this FAQ entry for clearing or disabling this caching.
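
The effect can be sketched without IPython. In the following illustration, a plain dict stands in for IPython's output cache (`Out`), and `BigObject` is a hypothetical placeholder for a large dataset; a weak reference is used only to observe whether the object is still alive.

```python
import gc
import weakref


class BigObject:
    """Hypothetical stand-in for a large dataset."""


# Stand-in for IPython's output cache; an assumption for illustration only.
output_cache = {}

ds = BigObject()
alive = weakref.ref(ds)   # lets us observe whether the object still exists
output_cache[3] = ds      # mimics what happens when cell 3 ends with `ds`

del ds                    # removes only the name, not the cached reference
gc.collect()
still_cached = alive() is not None

output_cache.clear()      # dropping the cache entry releases the last reference
gc.collect()
released = alive() is None
```

Here `still_cached` is true after `del ds`, and the object is only released once the cache entry is gone.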

Note that (as usual) Jupyter only shows the value of the last expression in a cell:

[4]:
a = 1
ds
a
[4]:
1

In this case, to_html can be used to retain the HTML view, e.g., to show multiple objects in a single cell:

[5]:
sc.to_html(ds['a'])
sc.to_html(ds['c'])
scipp.DataArray (1.93 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      πŸ™
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      πŸ™
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • (y, x)
      float64
      Å
      0.487, 0.363, ..., 0.864, 0.344
      σ = 0.138, 0.301, ..., 0.244, 0.275
      Values:
      array([[0.48730477, 0.36344038], [0.24857363, 0.31553676], [0.8641998 , 0.34448141]])

      Variances (σ²):
      array([[0.01914787, 0.09086229], [0.09020273, 0.00195045], [0.05945124, 0.0753859 ]])
    • x_attr
      (x)
      float64
      πŸ™
      1.77, 3.32
      Values:
      array([1.77, 3.32])
scipp.DataArray (776 Bytes)
    • ()
      float64
      kg
      1.0
      Values:
      array(1.)

Typing the Scipp module name at the end of a cell yields an HTML view of all Scipp objects (variables, data arrays, and datasets):

[6]:
sc
Variables:(3)
labels
scipp.Variable (280 Bytes)
    • (y: 3)
      float64
      πŸ™
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
x
scipp.Variable (272 Bytes)
    • (x: 2)
      float64
      πŸ™
      0.0, 1.0
      Values:
      array([0., 1.])
y
scipp.Variable (288 Bytes)
    • (y: 4)
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
DataArrays:(0)
Datasets:(1)
ds
scipp.Dataset (4.26 KB)
    • y: 3
    • x: 2
    • x
      (x)
      float64
      πŸ™
      0.0, 1.0
      Values:
      array([0., 1.])
    • y
      (y [bin-edge])
      float64
      m
      0.0, 1.0, 2.0, 3.0
      Values:
      array([0., 1., 2., 3.])
    • y_label
      (y)
      float64
      πŸ™
      7.0, 8.0, 9.0
      Values:
      array([7., 8., 9.])
    • a
      (y, x)
      float64
      Å
      0.487, 0.363, ..., 0.864, 0.344
      σ = 0.138, 0.301, ..., 0.244, 0.275
        • x_attr
          (x)
          float64
          πŸ™
          1.77, 3.32
          Values:
          array([1.77, 3.32])
      Values:
      array([[0.48730477, 0.36344038], [0.24857363, 0.31553676], [0.8641998 , 0.34448141]])

      Variances (σ²):
      array([[0.01914787, 0.09086229], [0.09020273, 0.00195045], [0.05945124, 0.0753859 ]])
    • b
      (y, x)
      float64
      Å
      0.487, 0.363, ..., 0.864, 0.344
      σ = 0.138, 0.301, ..., 0.244, 0.275
        • b_attr
          ()
          float64
          m
          1.2
          Values:
          array(1.2)
        • x_attr
          (x)
          float64
          πŸ™
          55.7, 105.1
          Values:
          array([ 55.7, 105.1])
      Values:
      array([[0.48730477, 0.36344038], [0.24857363, 0.31553676], [0.8641998 , 0.34448141]])

      Variances (σ²):
      array([[0.01914787, 0.09086229], [0.09020273, 0.00195045], [0.05945124, 0.0753859 ]])
    • c
      ()
      float64
      kg
      1.0
      Values:
      array(1.)
[6]:
<module 'scipp' from '/home/runner/work/scipp/scipp/.tox/docs/lib/python3.8/site-packages/scipp/__init__.py'>

SVG representation

scipp.show renders Scipp objects to an image that shows the relationships between coordinates and data. If a dimension extent is large, show truncates it to avoid generating massive, unreadable SVGs. Objects with more than three dimensions are not supported and result in an error message.

Compare the image below with the HTML representation to see what the individual components represent. Names of dataset items and coordinates are shown in large letters, and dimension names are shown in smaller letters (rotated for y).

[7]:
sc.show(ds)
[SVG figure: a and b (dims=('y', 'x'), shape=(3, 2), unit=Å, with variances); y_label (dims=('y',), shape=(3,), dimensionless); y (dims=('y',), shape=(4,), unit=m); c (dims=(), shape=(), unit=kg); x (dims=('x',), shape=(2,), dimensionless)]

Note that y has four blocks, whereas y_label and the data have three along the y dimension. This indicates that y is a bin-edge coordinate.
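
The length relation behind this can be sketched in plain Python. The helper `is_bin_edge` is hypothetical (it is not a Scipp function); the values are those of the y coordinate in the example.

```python
# Values taken from the example: 'y' has 4 entries, the data has 3 along y.
y_edges = [0.0, 1.0, 2.0, 3.0]
data_len_y = 3


def is_bin_edge(coord_len, data_len):
    """Hypothetical helper: a bin-edge coordinate has exactly one extra entry."""
    return coord_len == data_len + 1


# Each data element belongs to the interval between two neighbouring edges.
bins = list(zip(y_edges[:-1], y_edges[1:]))
centers = [(lo + hi) / 2 for lo, hi in bins]
```

Four edges delimit three bins, so each of the three data rows along y is associated with one interval rather than one point.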

scipp.show also works with binned data. Here, the smaller blocks to the right represent the events, i.e., the bin contents. Their drawn length carries no meaning, since the size of bins can vary.

[8]:
sc.show(sc.data.binned_xy(100, 3, 2))
[SVG figure: binned data (dims=('x', 'y'), shape=(3, 2), unit=None); event data (dims=('row',), shape=(100,), unit=K); event coordinates x, y, z (dims=('row',), shape=(100,), unit=m); outer coordinates x (dims=('x',), shape=(4,), unit=m) and y (dims=('y',), shape=(3,), unit=m)]

Table representation

scipp.table arranges Scipp objects in a table. It only works with one-dimensional objects, so we have to use slicing to display our higher-dimensional example:

[9]:
sc.table(ds['y', 0])
[9]:
              a                           b
Coordinates   Data          Attributes    Data          Attributes
x [𝟙]         [Å]           x_attr [𝟙]    [Å]           x_attr [𝟙]
0.000         0.487±0.138   1.770         0.487±0.138   55.700
1.000         0.363±0.301   3.320         0.363±0.301   105.100
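
The ± entries in the data columns are consistent with standard deviations, i.e., the square roots of the variances shown in the HTML representation earlier. A small check using the numbers printed there for the first row (y = 0) of item a:

```python
import math

# Values and variances of the y = 0 row of item 'a', copied from the output above.
values = [0.48730477, 0.36344038]
variances = [0.01914787, 0.09086229]

# The standard deviation is the square root of the variance.
stddevs = [math.sqrt(v) for v in variances]

# Format as value±stddev with three decimals, mirroring the table cells.
cells = [f"{val:.3f}±{sd:.3f}" for val, sd in zip(values, stddevs)]
```

The resulting strings, `0.487±0.138` and `0.363±0.301`, match the two data cells of the table.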

In the following, the y column is longer than the other columns because y is a bin-edge coordinate.

[10]:
sc.table(ds['x', 0])
[10]:
                            a             b
Coordinates                 Data          Data
y [m]         y_label [𝟙]   [Å]           [Å]
0.000         7.000         0.487±0.138   0.487±0.138
1.000         8.000         0.249±0.300   0.249±0.300
2.000         9.000         0.864±0.244   0.864±0.244
3.000

String representation

All Scipp objects can be converted to strings:

[11]:
print(ds)
<scipp.Dataset>
Dimensions: Sizes[y:3, x:2, ]
Coordinates:
  x                         float64  [dimensionless]  (x)  [0, 1]
  y                         float64              [m]  (y [bin-edge])  [0, 1, 2, 3]
  y_label                   float64  [dimensionless]  (y)  [7, 8, 9]
Data:
  a                         float64             [Å]  (y, x)  [0.487305, 0.36344, ..., 0.8642, 0.344481]  [0.0191479, 0.0908623, ..., 0.0594512, 0.0753859]
    Attributes:
        x_attr                    float64  [dimensionless]  (x)  [1.77, 3.32]
  b                         float64             [Å]  (y, x)  [0.487305, 0.36344, ..., 0.8642, 0.344481]  [0.0191479, 0.0908623, ..., 0.0594512, 0.0753859]
    Attributes:
        b_attr                    float64              [m]  ()  [1.2]
        x_attr                    float64  [dimensionless]  (x)  [55.7, 105.1]
  c                         float64             [kg]  ()  [1]


In addition, Variables have a compact string format:

[12]:
print('{:c}'.format(ds['c'].data))
1.0 kg

Note that this is primarily intended for scalar variables and may produce hard-to-read output otherwise.
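
For readers unfamiliar with custom format specifiers: the text after the colon in `'{:c}'` is passed to the object's `__format__` method. The following is a sketch of that Python protocol only; the class `Quantity` is hypothetical and does not reflect how Scipp implements its compact format.

```python
class Quantity:
    """Hypothetical scalar-with-unit class; not Scipp's implementation."""

    def __init__(self, value, unit):
        self.value = value
        self.unit = unit

    def __format__(self, spec):
        # str.format and f-strings pass the text after ':' as `spec`.
        if spec == "c":
            return f"{self.value} {self.unit}"
        if spec == "":
            return f"Quantity(value={self.value}, unit={self.unit})"
        raise ValueError(f"Unsupported format spec: {spec!r}")


q = Quantity(1.0, "kg")
compact = "{:c}".format(q)
default = f"{q}"
```

With this sketch, `compact` is the short `value unit` form, while an empty specifier falls back to a more verbose description.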