{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Colour-intensity scales\n", "\n", "In this tutorial we will look at how to use colours in the Sankey diagram. We have already seen how to use a palette, but in this tutorial we will also create a Sankey where the intensity of the colour is proportional to a numerical value." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First step is to import all the required packages and data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from floweaver import *\n", "\n", "df1 = pd.read_csv('holiday_data.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now take a look at the dataset we are using. This is a very insightful [made-up] dataset about how different types of people lose weight while on holiday enjoying themselves." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "dataset = Dataset(df1)\n", "df1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We now define the partitions of the data. Rather than listing the categories by hand, we use `np.unique` to pick out a list of the unique values that occur in the dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "partition_job = Partition.Simple('Employment Job', np.unique(df1['Employment Job']))\n", "partition_activity = Partition.Simple('Activity', np.unique(df1['Activity']))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In fact, this is pretty common so there is a built-in function to do this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# these statements or the ones above do the same thing\n", "partition_job = dataset.partition('Employment Job')\n", "partition_activity = dataset.partition('Activity')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then go on to define the structure of our sankey. We define nodes, bundles and the order. In this case its pretty straightforward:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nodes = {\n", " 'Activity': ProcessGroup(['Activity'], partition_activity),\n", " 'Job': ProcessGroup(['Employment Job'], partition_job),\n", "}\n", "\n", "bundles = [\n", " Bundle('Activity', 'Job'),\n", "]\n", "\n", "ordering = [\n", " ['Activity'],\n", " ['Job'],\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will plot a Sankey that shows the share of time dedicated to each activity by each type of person. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# These are the same each time, so just write them here once\n", "size_options = dict(width=500, height=400,\n", " margins=dict(left=100, right=100))\n", "\n", "sdd = SankeyDefinition(nodes, bundles, ordering)\n", "weave(sdd, dataset, measures='Calories Burnt').to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can start using colour by specifying that we want to partition the flows according to type of person. Notice that this time we are using a pre-determined palette. \n", "\n", "You can find all sorts of palettes [listed here](https://jiffyclub.github.io/palettable/colorbrewer/qualitative/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=partition_job)\n", "\n", "weave(sdd, dataset, palette='Set2_8', measures='Calories Burnt').to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, if we want to make the colour of the flow to be proportional to a numerical value." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weave(sdd, dataset, link_color=QuantitativeScale('Calories Burnt'), measures='Calories Burnt').to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's more interesting to use colour to show a different attribute from the flow table. But because a line in the Sankey diagram is an aggregation of multiple flows in the original data, we need to specify how the new dimension will be aggregated. For example, we'll use the *mean* of the flows within each Sankey link to set the colour. In this case we will use the colour to show how much each type of person emjoys each activity. We can be interested in either the cumulative enjoyment, or the mean enjoyment: try both!\n", "\n", "Aggregation is specified with the `measures` parameter, which should be set to a dictionary mapping dimension names to aggregation functions (`'mean'`, `'sum'` etc)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',\n", " link_color=QuantitativeScale('Enjoyment')).to_widget(**size_options)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',\n", " link_color=QuantitativeScale('Enjoyment', intensity='Calories Burnt')).to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can change the colour palette using the `palette` attribute. The palette names are different from before, because those were *categorical* (or *qualitative*) scales, and this is now a *sequential* scale. The palette names are [listed here](https://jiffyclub.github.io/palettable/colorbrewer/sequential/)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scale = QuantitativeScale('Enjoyment', palette='Blues_9')\n", "weave(sdd, dataset, \n", " measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, \n", " link_width='Calories Burnt',\n", " link_color=scale) \\\n", " .to_widget(**size_options)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "scale.domain" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to create a colorbar / scale to show the range of intensity values, but it's not currently as easy as it should be. This should be improved in future." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## More customisation\n", "\n", "You can subclass the QuantitativeScale class to get more control over the colour scale." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class MyScale(QuantitativeScale):\n", " def get_palette(self, link):\n", " # Choose colour scheme based on link type (here, Employment Job)\n", " name = 'Greens_9' if link.type == 'Student' else 'Blues_9'\n", " return self.lookup_palette_name(name)\n", " \n", " def get_color(self, link, value):\n", " palette = self.get_palette(link)\n", " return palette(0.2 + 0.8*value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_scale = MyScale('Enjoyment', palette='Blues_9')\n", "weave(sdd, dataset, \n", " measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, \n", " link_width='Calories Burnt',\n", " link_color=my_scale) \\\n", " .to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or, maybe you want to hide the smallest flows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class DimmingScale(QuantitativeScale):\n", " def __init__(self, attr, threshold, **kwargs):\n", " super().__init__(attr)\n", " self.threshold = threshold\n", " \n", " def get_color(self, link, normalised_value):\n", " if normalised_value < self.threshold:\n", " return '#ddd'\n", " return super().get_color(link, normalised_value)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_scale2 = DimmingScale('Calories Burnt', threshold=0.3, palette='Blues_9')\n", "w = weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale2) \\\n", " .to_widget(**size_options)\n", "w" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Just for fun, you can adjust the threshold interactively:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from ipywidgets import interact\n", "\n", "@interact(threshold=(0.0, 1.0, 0.1))\n", "def update_threshold(threshold=0.3):\n", " my_scale2.threshold = threshold\n", " w_new = weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale2) \\\n", " .to_widget(**size_options)\n", " w.links = w_new.links" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This colour scale decides whether to choose a grey colour based on the normalised value (within a range of 0 to 1) which is used to lookup a colour in the colour scale.\n", "\n", "Alternatively, you could intervene based on the absolute value:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class DimmingScaleAbsolute(QuantitativeScale):\n", " def __init__(self, attr, threshold, **kwargs):\n", " super().__init__(attr)\n", " self.threshold = threshold\n", " \n", " def __call__(self, link, measures):\n", " value = self.get_value(link, measures)\n", " if value < self.threshold:\n", " return '#ddd'\n", " return super().__call__(link, measures)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_scale3 = DimmingScaleAbsolute('Calories Burnt', threshold=2, palette='Blues_9')\n", "weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale3) \\\n", " .to_widget(**size_options)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A similar approach can be used with a `CategoricalScale` as well as a `QuantitativeScale`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class DimmingCategoricalScale(CategoricalScale):\n", " def __init__(self, attr, threshold_measure, threshold_value, **kwargs):\n", " \"\"\"Acts like CategoricalScale unless threshold_measure is below threshold_value.\"\"\"\n", " super().__init__(attr)\n", " self.threshold_measure = threshold_measure\n", " self.threshold_value = threshold_value\n", " \n", " def __call__(self, link, measures):\n", " value = measures[self.threshold_measure]\n", " if value < self.threshold_value:\n", " return '#ddd'\n", " return super().__call__(link, measures)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "my_scale3 = DimmingCategoricalScale(\n", " 'type',\n", " threshold_measure='Calories Burnt', \n", " threshold_value=6, \n", " palette='Blues_9'\n", ")\n", "weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale3) \\\n", " .to_widget(**size_options)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }