{ "cells": [ { "cell_type": "markdown", "id": "239c7ad4-44b0-4d8f-8dcb-6fa5833abc45", "metadata": {}, "source": [ "# Coordinate transformations\n", "\n", "## Motivation\n", "\n", "In all fields of science we frequently encounter data that is represented in coordinates or coordinate systems that are not apt for certain operations or visualizations.\n", "In these cases we may thus need to compute new coordinates based on one or multiple existing coordinates.\n", "For simple cases this may just be done by hand.\n", "Consider:" ] }, { "cell_type": "code", "execution_count": null, "id": "8b11dfb5-f9b9-43aa-8e56-418bcc0489a7", "metadata": {}, "outputs": [], "source": [ "import scipp as sc\n", "\n", "x = sc.linspace(dim='x', unit='m', start=1., stop=55., num=100)\n", "da = sc.DataArray(data=x * x, coords={'x': x})\n", "da.plot(figsize=(4, 3))" ] }, { "cell_type": "markdown", "id": "43d74a82-1f9c-48e3-a587-2d932288b532", "metadata": {}, "source": [ "We may want to use $x^2$ instead of $x$ as a coordinate, to highlight the quadratic nature of our data:" ] }, { "cell_type": "code", "execution_count": null, "id": "4ad57397-8d6c-46ac-a748-99207697734b", "metadata": {}, "outputs": [], "source": [ "da2 = da.copy()\n", "da2.coords['x_square'] = x * x\n", "da2" ] }, { "cell_type": "markdown", "id": "15677625-8898-40c7-a3fd-2d0c4395d612", "metadata": {}, "source": [ "While adding a new coordinate may often be done with a single line of code, the above example highlights the first shortcoming of this approach:\n", "To actually visualize `da` using this new coordinate we must additionally rename the dimension:" ] }, { "cell_type": "code", "execution_count": null, "id": "7b222310-40d6-4b95-92dd-9c2e7b7bcb63", "metadata": {}, "outputs": [], "source": [ "da2 = da2.rename_dims({'x': 'x_square'})\n", "da2.plot(figsize=(4, 3))" ] }, { "cell_type": "markdown", "id": "f7b31e14-10cc-4616-8cc6-f369a99357e8", "metadata": {}, "source": [ "Further complications are:\n", "\n", "- The original coordinate is preserved and may get in the way in subsequent operations.\n", "- Event-coordinates of binned data are not handled.\n", "- Multi-step conversions with multiple inputs and multiple outputs may be required in practice.\n", "\n", "To accommodate these recurring yet highly application-specific needs, scipp provides a generic mechanism for transforming coordinates.\n", "This is described and exemplified in the following.\n", "\n", "## `transform_coords`\n", "\n", "### Overview\n", "\n", "[sc.transform_coords](../generated/functions/scipp.transform_coords.rst#scipp.transform_coords) (also available as method of data arrays and datasets) is a tool for transforming one or more input coordinates into one or more output coordinates.\n", "It automatically handles:\n", "\n", "- Renaming of dimensions, if dimension-coordinates are transformed.\n", "- Change of coordinates to attributes to avoid interference of coordinates consumed by the transformation in follow-up operations.\n", "- Conversion of event-coordinates of binned data, if present.\n", "\n", "### Basic example\n", "\n", "We start by revisiting the example given in [Motivation](#Motivation).\n", "The building blocks `transform_coords` operates on are functions with named parameters.\n", "The parameter names define the names of the *input coordinates to consume*.\n", "Let us define `x_square`, which will *consume* `x`:" ] }, { "cell_type": "code", "execution_count": null, "id": "be50be41-9486-4943-99da-99653639ea98", "metadata": {}, "outputs": [], "source": [ "def x_square(x):\n", " return x * x" ] }, { "cell_type": "markdown", "id": "f22ec1c0-ad99-418b-820a-a6dad6fb82aa", "metadata": {}, "source": [ "Next, we create a `dict`, mapping from an output coord name to a function that can create this coordinate.\n", "The [sc.show_graph](../generated/functions/scipp.show_graph.rst#scipp.show_graph) helper is a convenient tool for visualizing the coordinate transformation defined by such as mapping:" ] }, { "cell_type": "code", "execution_count": null, "id": "33f5e9ef-35a8-49e6-9731-8b33a525c8fb", "metadata": {}, "outputs": [], "source": [ "graph = {'x^2': x_square}\n", "sc.show_graph(graph)" ] }, { "cell_type": "markdown", "id": "a658c4ce-ab11-41ed-84c5-af8df71838ab", "metadata": {}, "source": [ "Here, the `x` coordinate can be consumed by the `x_square` function, creating the `x^2` coordinate.\n", "Note that the function name and coordinate are unrelated.\n", "Next, we can call `transform_coords`.\n", "Apart from the graph, we also pass a list of desired output coordinates, here simply `['x^2']`.\n", "`transform_coords` returns a new (shallow-copied) data array with added coordinates:" ] }, { "cell_type": "code", "execution_count": null, "id": "ba462ced-c928-4f02-b60d-af8a602c03dd", "metadata": {}, "outputs": [], "source": [ "transformed = da.transform_coords(['x^2'], graph=graph)\n", "transformed" ] }, { "cell_type": "markdown", "id": "ab752aee-f2d7-4cae-8c7a-b33a2f7e8033", "metadata": {}, "source": [ "Note how `x` is now an attribute, i.e., operations will not use it for alignment anymore.\n", "This is important since it will allow for operations combining `transformed` with other data that may have matching `x^2` but not `x`.\n", "\n", "## Example: Multi-step transform splitting and combining input coords\n", "\n", "### Introduction\n", "\n", "Let us consider a more complex example.\n", "Imagine we have sensors around the globe, counting lightning strikes.\n", "For each sensor get have data recorded at a certain UTC, and the sensor location.\n", "We may be interested in variation of lightning strike frequency with time of day, as well as lattitude.\n", "To obtain this, we must:\n", "\n", "1. Extract lattitude and longitude information from the sensor locations.\n", "2. Compute the local datetime from the datetime and an \"timezone\" offset from the longitude.\n", "3. Extract the time from the local datetime.\n", "\n", "For this purpose, we may define functions that looks as follows.\n", "We suggest to the ignore the implementation details of these functions, since they are approximations and irrelevant for this example:" ] }, { "cell_type": "code", "execution_count": null, "id": "b96eeb16-1553-4af6-a3fa-39f40a909d2f", "metadata": {}, "outputs": [], "source": [ "def lat_long(location):\n", " x = location.fields.x\n", " y = location.fields.y\n", " z = location.fields.z\n", " theta = sc.to_unit(sc.atan2(y=sc.sqrt(x * x + y * y), x=z),\n", " 'deg',\n", " copy=False)\n", " phi = sc.to_unit(sc.atan2(y=y, x=x), 'deg', copy=False)\n", " return {'lattitude': 90.0 * sc.Unit('deg') - theta, 'longitude': phi}\n", "\n", "\n", "def local_datetime(datetime, longitude):\n", " long = sc.to_unit(longitude, unit='deg', copy=False)\n", " angular_velocity = (360.0 * sc.Unit('deg')) / (24.0 * sc.Unit('hour'))\n", " offset = (\n", " long /\n", " angular_velocity).astype('int64') + 12 * sc.Unit('hour')\n", " return sc.to_unit(offset, datetime.unit) + datetime\n", "\n", "\n", "def time(local_datetime):\n", " seconds_per_day = sc.scalar(24 * 60 * 60, unit='s/D')\n", " start_day = sc.scalar(start.value.astype('datetime64[D]'))\n", " start_day_in_seconds = sc.scalar(start_day.values.astype('datetime64[s]'))\n", " offset = local_datetime - start_day_in_seconds\n", " time = (offset % seconds_per_day).astype('float64')\n", " return time" ] }, { "cell_type": "markdown", "id": "9c299cce-43cf-40d8-8a57-a3f7a940c11f", "metadata": {}, "source": [ "### Defining a transformation graph\n", "\n", "Based on these functions we may then create a mapping between coordinate names and functions.\n", "The visualization of the graph gives a handy summary of the desired conversion outlined above:" ] }, { "cell_type": "code", "execution_count": null, "id": "fe191af1-db19-4b19-bd6d-c8fe5d9a45e9", "metadata": {}, "outputs": [], "source": [ "graph = {\n", " (\n", " 'longitude',\n", " 'lattitude',\n", " ): lat_long,\n", " 'local_time': time,\n", " 'local_datetime': local_datetime\n", "}\n", "sc.show_graph(graph, size='6')" ] }, { "cell_type": "markdown", "id": "53f32b45-8e1b-40c2-a17a-701c82e1680f", "metadata": {}, "source": [ "### Sample data\n", "\n", "Next, let us look at the data we are working with.\n", "Here we simply create some fake data, the details of the following code cell are irrelevant and should also be ignored:" ] }, { "cell_type": "code", "execution_count": null, "id": "df9a950c-a008-4bb1-8b72-a3530c0d642d", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "hour_steps = sc.arange(dim='datetime',\n", " dtype='int64',\n", " unit='s',\n", " start=0,\n", " stop=3 * 24 * 60 * 60,\n", " step=60 * 60)\n", "start = sc.scalar(np.datetime64('2021-06-01T17:00:00'))\n", "datetime = start + hour_steps\n", "nsite = 1000\n", "ntime = len(datetime)\n", "# Note that these points are NOT uniformly distributed on a sphere, this is NOT a good way to generate such points\n", "location = sc.vectors(dims=['location'], values=np.random.rand(\n", " nsite, 3)) - sc.vector(value=[.5, .5, .5])\n", "location *= 6371 * sc.Unit('km') / sc.norm(location)\n", "da = sc.DataArray(data=sc.array(dims=['location', 'datetime'],\n", " values=np.random.rand(nsite, ntime)),\n", " coords={\n", " 'location': location,\n", " 'datetime': datetime\n", " })\n", "da += 2. * (location.fields.z > 0. * sc.Unit('km')).astype(\n", " 'float64') # more lightning strikes in northern hemisphere\n", "phi0 = sc.atan2(y=location.fields.y, x=location.fields.x) - sc.to_unit(\n", " 90.0 * sc.Unit('deg'), 'rad')\n", "sin = sc.sin(\n", " phi0 +\n", " sc.linspace(dim='datetime', unit='rad', start=0, stop=6 * np.pi, num=ntime))\n", "da += 2 * (sin + 1) # more lightning strikes later in the day\n", "da.unit = 'counts'" ] }, { "cell_type": "markdown", "id": "7dd9c20d-14ca-48c7-8caf-86f2053ab063", "metadata": {}, "source": [ "Our input data looks as follows, a 2-D data array with dimensions `datetime` and `location`, and corresponding coordinates:" ] }, { "cell_type": "code", "execution_count": null, "id": "bcef672b-26b4-470f-aa76-32a718a4dabf", "metadata": {}, "outputs": [], "source": [ "da" ] }, { "cell_type": "markdown", "id": "3ae4fee2-c377-48ee-bab8-624bac8e9a0a", "metadata": {}, "source": [ "A 3-D scatter plot may be used to visualize this.\n", "When dragging the `datetime` slider we can observe how the lightning counts shifts around the globe with the time of the day (the fake data covers a period of 3 days).\n", "Note that the slider is only functional when running the notebook and is not functional in the online documentation page:" ] }, { "cell_type": "code", "execution_count": null, "id": "757be4c3-8efc-4550-89e4-231cb53d0230", "metadata": {}, "outputs": [], "source": [ "da.plot(projection='3d', positions='location')" ] }, { "cell_type": "markdown", "id": "ee1e4d51-ac76-435c-b147-b60e1c6db0e5", "metadata": {}, "source": [ "### Performing a transformation\n", "\n", "With this setup, the actual coordinate transformation is now very simple:" ] }, { "cell_type": "code", "execution_count": null, "id": "b295af8f-7158-4a5c-803b-4d7457eb9f64", "metadata": {}, "outputs": [], "source": [ "transformed = da.transform_coords(['lattitude', 'local_time'], graph=graph)" ] }, { "cell_type": "markdown", "id": "376d3c08-1692-4a84-a855-6e6c9f034c76", "metadata": {}, "source": [ "The result is:" ] }, { "cell_type": "code", "execution_count": null, "id": "db9b3705-f996-4a5b-a75c-63910fd181bf", "metadata": {}, "outputs": [], "source": [ "transformed" ] }, { "cell_type": "markdown", "id": "42e39ba7-8351-4cf0-9ce8-abd47bef6f17", "metadata": {}, "source": [ "In the above:\n", "\n", "- `lattitude` and `local_time` coordinates have been computed as requested.\n", "- The intermediate results `local_datetime` and `longitude` were preserved as attributes (use `keep_intermediate=False` to drop them).\n", "- The `location` and `datetime` coordinates (which have been consumed by the transformation) have been converted to attributes (use `keep_inputs=False` to drop them).\n", "- The `datetime` *dimension* has been consumed by the `local_time` coordinate and thus renamed to `local_time` (use `rename_dims=False` to disable).\n", "\n", "### Post-processing\n", "\n", "In some cases the above result may be all we need.\n", "Frequently however, we may need to resample or bin our data after this coordinate transformation.\n", "\n", "In the above case, `local_time` is now a 2-D coordinate, and the coordinate is not ordered since the \"date\" component of the datetime has been removed.\n", "We may thus want to bin this data into lattitude/local_time bins.\n", "Here we first use `flatten` with a dummy dimension to make the data suitable for `sc.bin`:" ] }, { "cell_type": "code", "execution_count": null, "id": "fd70c693-881d-4701-851b-c15441909496", "metadata": {}, "outputs": [], "source": [ "time_edges = sc.linspace(dim='local_time',\n", " unit='s',\n", " start=0,\n", " stop=24 * 60 * 60,\n", " num=6)\n", "lattitude = sc.linspace(dim='lattitude', unit='deg', start=-90, stop=90, num=13)\n", "binned = sc.bin(transformed.flatten(to='dummy'), edges=[lattitude, time_edges])" ] }, { "cell_type": "markdown", "id": "be960584-fa38-48f7-890a-6a85d480a9aa", "metadata": {}, "source": [ "The result looks as follows.\n", "If this was real data (the sample data is fake!) we might observe that there are more lightning strikes on the northern hemisphere as well as later in the day.\n", "This might be attributed to more thunderstorms after hot summer days.\n", "Note that this example does not represent reality and is merely meant to illustrate several concepts of `transform_coords`:" ] }, { "cell_type": "code", "execution_count": null, "id": "15cbd635-be46-44d8-a125-30ec8b97d1cd", "metadata": {}, "outputs": [], "source": [ "binned.plot(resolution={'y': 36, 'x': 24})" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 5 }