scipp.compat.pandas_compat.from_pandas
- scipp.compat.pandas_compat.from_pandas(pd_obj, *, data_columns=None, include_trivial_index=False, header_parser=None)
Converts a pandas.DataFrame or pandas.Series object into a scipp Dataset or DataArray respectively.
- Parameters:
pd_obj – The DataFrame or Series to convert.
data_columns (default: None) – Select which columns to assign as data. The rest are returned as coordinates. If None, all columns are assigned as data. Use an empty list to assign all columns as coordinates.
include_trivial_index (default: False) – from_pandas can include the index of the data frame / series as a coordinate. However, when the index is RangeIndex(start=0, stop=n, step=1), where n is the length of the data frame / series, the index is excluded by default. Set this argument to True to include the index anyway in this case.
header_parser (default: None) – Parses each column header to extract a name and unit for each data array. By default, it returns the column name and uses the default unit. Built-in parsers can be specified by name:
"bracket": See scipp.compat.pandas_compat.parse_bracket_header(). Parses strings where the unit is given between square brackets, i.e., strings like name [unit].
Before implementing a custom parser, check out scipp.compat.pandas_compat.parse_bracket_header() to get an overview of how to handle edge cases.
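As a rough illustration of what a bracket-style parser has to do, the following standard-library sketch splits a header of the form name [unit] into a name and a unit string. The helper name split_bracket_header and the plain-string return value are assumptions made for this example; it is not scipp's implementation, and the real parse_bracket_header() works with scipp units and covers more edge cases.

```python
from __future__ import annotations

import re

# Hypothetical pattern for this sketch: "name [unit]" with optional whitespace.
_HEADER = re.compile(r"^\s*(?P<name>.*?)\s*\[(?P<unit>[^\[\]]*)\]\s*$")


def split_bracket_header(header: str) -> tuple[str, str | None]:
    """Return (name, unit) for 'name [unit]', or (header, None) without brackets."""
    match = _HEADER.match(header)
    if match is None:
        # No trailing bracketed unit: keep the whole header as the name.
        return header, None
    return match.group("name"), match.group("unit")


print(split_bracket_header("temperature [K]"))  # ('temperature', 'K')
print(split_bracket_header("counts"))           # ('counts', None)
```

Returning None for a missing unit is a choice made here to keep "no unit given" distinguishable from an empty bracket pair; how a custom parser should signal that case to from_pandas is best checked against parse_bracket_header() itself.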
- Returns:
The converted scipp object.
Examples
Convert a pandas Series to a DataArray:
>>> import scipp as sc
>>> import pandas as pd
>>> series = pd.Series([1.0, 2.0, 3.0], name='temperature [K]')
>>> sc.compat.from_pandas(series, header_parser='bracket')
<scipp.DataArray>
Dimensions: Sizes[row:3, ]
Data:
  temperature  float64  [K]  (row)  [1, 2, 3]
Convert a pandas DataFrame to a Dataset, with all columns as data:
>>> df = pd.DataFrame({
...     'x [m]': [1.0, 2.0, 3.0],
...     'y [m]': [4.0, 5.0, 6.0],
...     'temperature [K]': [273.0, 274.0, 275.0]
... })
>>> ds = sc.compat.from_pandas(df, header_parser='bracket')
>>> ds
<scipp.Dataset>
Dimensions: Sizes[row:3, ]
Data:
  temperature  float64  [K]  (row)  [273, 274, 275]
  x            float64  [m]  (row)  [1, 2, 3]
  y            float64  [m]  (row)  [4, 5, 6]
Specify which columns should be data vs coordinates:
>>> ds = sc.compat.from_pandas(df, data_columns='temperature', header_parser='bracket')
>>> ds
<scipp.Dataset>
Dimensions: Sizes[row:3, ]
Coordinates:
* x            float64  [m]  (row)  [1, 2, 3]
* y            float64  [m]  (row)  [4, 5, 6]
Data:
  temperature  float64  [K]  (row)  [273, 274, 275]
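The data_columns rule described under Parameters can be sketched in plain Python, independent of scipp. The helper partition_columns is hypothetical and only mirrors the documented behaviour: None selects every column as data, an empty list selects every column as a coordinate. Treating a single string as one column name is an assumption based on the example above, which also suggests that names are matched after header parsing.

```python
from __future__ import annotations

from collections.abc import Iterable


def partition_columns(
    columns: list[str],
    data_columns: str | Iterable[str] | None = None,
) -> tuple[list[str], list[str]]:
    """Split parsed column names into (data, coordinates). Illustrative only."""
    if data_columns is None:
        # Documented default: every column becomes data.
        return list(columns), []
    if isinstance(data_columns, str):
        # Assumption: a bare string names a single data column.
        selected = {data_columns}
    else:
        selected = set(data_columns)
    data = [name for name in columns if name in selected]
    coords = [name for name in columns if name not in selected]
    return data, coords


names = ["x", "y", "temperature"]
print(partition_columns(names))                 # (['x', 'y', 'temperature'], [])
print(partition_columns(names, "temperature"))  # (['temperature'], ['x', 'y'])
print(partition_columns(names, []))             # ([], ['x', 'y', 'temperature'])
```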