Colour-intensity scales

In this tutorial we look at how to use colour in a Sankey diagram. We have already seen how to use a palette; here we also create a Sankey in which the intensity of the colour is proportional to a numerical value.

The first step is to import the required packages and data:

[1]:

import pandas as pd
import numpy as np
from floweaver import *

df1 = pd.read_csv('holiday_data.csv')

Now take a look at the dataset we are using. This is a very insightful [made-up] dataset about how different types of people lose weight while on holiday enjoying themselves.

[2]:

dataset = Dataset(df1)
df1

[2]:

	source	target	Calories Burnt	Enjoyment	Employment Job	Activity
0	Activity	Employment Job	2.5	35	Student	Reading
1	Activity	Employment Job	4.5	20	Student	Swimming
2	Activity	Employment Job	8.0	5	Student	Sleeping
3	Activity	Employment Job	1.0	5	Student	Travelling
4	Activity	Employment Job	8.0	30	Student	Working out
5	Activity	Employment Job	1.0	35	Trainee	Reading
6	Activity	Employment Job	3.0	40	Trainee	Travelling
7	Activity	Employment Job	2.0	40	Trainee	Swimming
8	Activity	Employment Job	6.0	5	Trainee	Sleeping
9	Activity	Employment Job	12.0	45	Trainee	Working out
10	Activity	Employment Job	4.5	20	Administrator	Swimming
11	Activity	Employment Job	9.0	10	Administrator	Sleeping
12	Activity	Employment Job	7.5	50	Administrator	Working out
13	Activity	Employment Job	1.5	35	Administrator	Reading
14	Activity	Employment Job	1.5	50	Administrator	Travelling
15	Activity	Employment Job	11.0	55	Manager	Working out
16	Activity	Employment Job	2.0	45	Manager	Reading
17	Activity	Employment Job	7.5	10	Manager	Sleeping
18	Activity	Employment Job	1.5	90	Manager	Travelling
19	Activity	Employment Job	2.0	40	Manager	Swimming
20	Activity	Employment Job	3.0	35	Pensioner	Reading
21	Activity	Employment Job	9.0	15	Pensioner	Swimming
22	Activity	Employment Job	9.0	15	Pensioner	Sleeping
23	Activity	Employment Job	3.0	60	Pensioner	Travelling
24	Activity	Employment Job	0.0	0	Pensioner	Working out

We now define the partitions of the data. Rather than listing the categories by hand, we use np.unique to pick out a list of the unique values that occur in the dataset.

[3]:

partition_job = Partition.Simple('Employment Job', np.unique(df1['Employment Job']))
partition_activity = Partition.Simple('Activity', np.unique(df1['Activity']))

In fact, this is common enough that the dataset has a built-in method to do it:

[4]:

# These statements do the same thing as the ones above
partition_job = dataset.partition('Employment Job')
partition_activity = dataset.partition('Activity')
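As a rough illustration of what both versions compute, the sketch below collects the distinct values of a column using only the standard library. The `rows` sample and the `unique_values` helper are hypothetical and are not part of floweaver; they only mimic the idea of building one group per unique value, in sorted order as np.unique returns them.

```python
# Hypothetical sample: a few rows from the table above, reduced to two columns.
rows = [
    {"Employment Job": "Student", "Activity": "Reading"},
    {"Employment Job": "Student", "Activity": "Swimming"},
    {"Employment Job": "Trainee", "Activity": "Reading"},
    {"Employment Job": "Manager", "Activity": "Sleeping"},
]


def unique_values(rows, column):
    """Return the distinct values of `column`, sorted."""
    return sorted({row[column] for row in rows})


# One group per distinct value, as in the partitions defined above.
activity_groups = unique_values(rows, "Activity")
job_groups = unique_values(rows, "Employment Job")
print(activity_groups)
print(job_groups)
```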

We then go on to define the structure of our Sankey: the nodes, the bundles and the ordering. In this case it’s pretty straightforward:

[5]:

nodes = {
    'Activity': ProcessGroup(['Activity'], partition_activity),
    'Job': ProcessGroup(['Employment Job'], partition_job),
}

bundles = [
    Bundle('Activity', 'Job'),
]

ordering = [
    ['Activity'],
    ['Job'],
]

Now we will plot a Sankey that shows the calories burnt on each activity by each type of person.

[6]:

# These are the same each time, so just write them here once
size_options = dict(width=500, height=400,
                    margins=dict(left=100, right=100))

sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, dataset, measures='Calories Burnt').to_widget(**size_options)

[6]:

We can start using colour by specifying that we want to partition the flows according to type of person. Notice that this time we are using a pre-determined palette.

You can find all sorts of palettes listed here.

[7]:

sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=partition_job)

weave(sdd, dataset, palette='Set2_8', measures='Calories Burnt').to_widget(**size_options)

[7]:

Now suppose we want the colour of each flow to be proportional to a numerical value. We can do this by passing a QuantitativeScale as link_color:

[8]:

weave(sdd, dataset, link_color=QuantitativeScale('Calories Burnt'), measures='Calories Burnt').to_widget(**size_options)

[8]:

It’s more interesting to use colour to show a different attribute from the flow table. But because a link in the Sankey diagram is an aggregation of multiple flows in the original data, we need to specify how the new dimension will be aggregated. For example, we can use the mean of the flows within each Sankey link to set the colour. Here we use colour to show how much each type of person enjoys each activity. We might be interested in either the cumulative enjoyment or the mean enjoyment: try both!

Aggregation is specified with the measures parameter, which should be set to a dictionary mapping dimension names to aggregation functions ('mean', 'sum' etc).
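To make the difference between 'sum' and 'mean' concrete, here is a minimal standard-library sketch of this kind of aggregation. It is not floweaver's implementation: the `aggregate` helper and the `AGGREGATORS` table are hypothetical, and the rows are the "Reading" and "Sleeping" rows copied from the table above, grouped by activity only.

```python
from collections import defaultdict
from statistics import mean

# (Activity, Calories Burnt, Enjoyment) copied from the dataset above.
rows = [
    ("Reading", 2.5, 35), ("Reading", 1.0, 35), ("Reading", 1.5, 35),
    ("Reading", 2.0, 45), ("Reading", 3.0, 35),
    ("Sleeping", 8.0, 5), ("Sleeping", 6.0, 5), ("Sleeping", 9.0, 10),
    ("Sleeping", 7.5, 10), ("Sleeping", 9.0, 15),
]

# Hypothetical lookup from aggregation name to function.
AGGREGATORS = {"sum": sum, "mean": mean}


def aggregate(rows, measures):
    """Group rows by activity and aggregate each measure as requested."""
    groups = defaultdict(lambda: {"Calories Burnt": [], "Enjoyment": []})
    for activity, calories, enjoyment in rows:
        groups[activity]["Calories Burnt"].append(calories)
        groups[activity]["Enjoyment"].append(enjoyment)
    return {
        activity: {name: AGGREGATORS[how](values[name])
                   for name, how in measures.items()}
        for activity, values in groups.items()
    }


result = aggregate(rows, {"Calories Burnt": "sum", "Enjoyment": "mean"})
print(result)
```

Switching 'Enjoyment' to 'sum' in the dictionary gives the cumulative enjoyment instead of the mean.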

[9]:

weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
      link_color=QuantitativeScale('Enjoyment')).to_widget(**size_options)

[9]:

[10]:

weave(sdd, dataset, measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'}, link_width='Calories Burnt',
      link_color=QuantitativeScale('Enjoyment', intensity='Calories Burnt')).to_widget(**size_options)

/home/docs/checkouts/readthedocs.org/user_builds/floweaver/checkouts/latest/src/floweaver/color_scales.py:136: RuntimeWarning: invalid value encountered in scalar divide
  value /= measures[self.intensity]
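The warning quoted above points at a division by the intensity measure. Assuming that `intensity` simply divides the colour value by the named measure, as that line suggests, the "Pensioner / Working out" row (0 calories, 0 enjoyment) leads to 0/0. NumPy reports this as an invalid value and returns nan, whereas plain Python floats would raise ZeroDivisionError. The sketch below reproduces the idea with a hypothetical helper, `intensity_value`, that guards against the zero denominator; it is not floweaver code.

```python
import math


def intensity_value(measures, attr, intensity=None):
    """Return measures[attr], divided by measures[intensity] if one is given.

    Returns nan when the intensity measure is zero (an assumption made for
    this sketch, to mirror NumPy's result for 0/0).
    """
    value = measures[attr]
    if intensity is not None:
        denominator = measures[intensity]
        if denominator == 0:
            return math.nan
        value = value / denominator
    return value


# Values taken from the dataset above.
manager_travelling = {"Calories Burnt": 1.5, "Enjoyment": 90.0}
pensioner_working_out = {"Calories Burnt": 0.0, "Enjoyment": 0.0}

per_calorie = intensity_value(manager_travelling, "Enjoyment", "Calories Burnt")
undefined = intensity_value(pensioner_working_out, "Enjoyment", "Calories Burnt")
print(per_calorie, undefined)
```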

[10]:

You can change the colour palette using the palette argument. The palette names are different from before, because those were categorical (or qualitative) scales, and this is now a sequential scale. The palette names are listed here.

[11]:

scale = QuantitativeScale('Enjoyment', palette='Blues_9')
weave(sdd, dataset,
      measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'},
      link_width='Calories Burnt',
      link_color=scale) \
    .to_widget(**size_options)

[11]:

[12]:

scale.domain

[12]:

(np.float64(0.0), np.float64(90.0))
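The domain is what lets a raw value be mapped onto the palette. The following sketch assumes a plain linear (min-max) normalisation onto the range 0 to 1; the helper names are hypothetical. The `0.2 + 0.8*value` offset mirrors the get_color method shown in the "More customisation" section, which avoids the very lightest end of the palette.

```python
def normalise(value, domain):
    """Map `value` linearly onto 0..1 given a (min, max) domain."""
    lo, hi = domain
    return (value - lo) / (hi - lo)


def palette_position(normalised_value):
    """Position within the palette, skipping the palest 20% of the range."""
    return 0.2 + 0.8 * normalised_value


enjoyment_domain = (0.0, 90.0)  # the domain printed above

halfway = normalise(45.0, enjoyment_domain)
position = palette_position(halfway)
print(halfway, position)
```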

It is possible to create a colorbar / scale to show the range of intensity values, but it’s not currently as easy as it should be. This should be improved in future.

More customisation

You can subclass the QuantitativeScale class to get more control over the colour scale.

[13]:

class MyScale(QuantitativeScale):
    def get_palette(self, link):
        # Choose colour scheme based on link type (here, Employment Job)
        name = 'Greens_9' if link.type == 'Student' else 'Blues_9'
        return self.lookup_palette_name(name)

    def get_color(self, link, value):
        palette = self.get_palette(link)
        return palette(0.2 + 0.8*value)

[14]:

my_scale = MyScale('Enjoyment', palette='Blues_9')
weave(sdd, dataset,
      measures={'Calories Burnt': 'sum', 'Enjoyment': 'mean'},
      link_width='Calories Burnt',
      link_color=my_scale) \
    .to_widget(**size_options)

[14]:

Or maybe you want to dim the smallest flows by drawing them in grey:

[15]:

class DimmingScale(QuantitativeScale):
    def __init__(self, attr, threshold, **kwargs):
        # Forward extra options (such as palette) to QuantitativeScale
        super().__init__(attr, **kwargs)
        self.threshold = threshold

    def get_color(self, link, normalised_value):
        # Links below the threshold are drawn in light grey
        if normalised_value < self.threshold:
            return '#ddd'
        return super().get_color(link, normalised_value)

[16]:

my_scale2 = DimmingScale('Calories Burnt', threshold=0.3, palette='Blues_9')
w = weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale2) \
    .to_widget(**size_options)
w

[16]:

Just for fun, you can adjust the threshold interactively:

[17]:

from ipywidgets import interact

@interact(threshold=(0.0, 1.0, 0.1))
def update_threshold(threshold=0.3):
    my_scale2.threshold = threshold
    w_new = weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale2) \
        .to_widget(**size_options)
    w.links = w_new.links

This colour scale decides whether to use grey based on the normalised value (in the range 0 to 1), which is the value used to look up a colour in the colour scale.
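To relate a normalised threshold to data units, the sketch below converts it back using the domain. It assumes the domain spans the smallest and largest values of 'Calories Burnt' in the table above (0.0 and 12.0) and that normalisation is linear; the helper names are hypothetical.

```python
def to_absolute(normalised_threshold, domain):
    """Convert a 0..1 threshold into the units of the underlying measure."""
    lo, hi = domain
    return lo + normalised_threshold * (hi - lo)


def dimmed(values, domain, normalised_threshold):
    """Return the values whose normalised position falls below the threshold."""
    lo, hi = domain
    return [v for v in values if (v - lo) / (hi - lo) < normalised_threshold]


calories_domain = (0.0, 12.0)     # assumed min and max of 'Calories Burnt'
sample = [1.0, 2.5, 4.5, 8.0]     # a few values from the table

absolute_cutoff = to_absolute(0.3, calories_domain)
dimmed_values = dimmed(sample, calories_domain, 0.3)
print(absolute_cutoff, dimmed_values)
```

Under these assumptions a threshold of 0.3 corresponds to roughly 3.6 calories, so the flows of 1.0 and 2.5 are greyed out while 4.5 and 8.0 keep their colour.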

Alternatively, you could intervene based on the absolute value:

[18]:

class DimmingScaleAbsolute(QuantitativeScale):
    def __init__(self, attr, threshold, **kwargs):
        super().__init__(attr, **kwargs)  # forward options such as palette
        self.threshold = threshold

    def __call__(self, link, measures):
        value = self.get_value(link, measures)
        if value < self.threshold:
            return '#ddd'
        return super().__call__(link, measures)

[19]:

my_scale3 = DimmingScaleAbsolute('Calories Burnt', threshold=2, palette='Blues_9')
weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale3) \
    .to_widget(**size_options)

[19]:

A similar approach can be used with a CategoricalScale as well as a QuantitativeScale:

[20]:

class DimmingCategoricalScale(CategoricalScale):
    def __init__(self, attr, threshold_measure, threshold_value, **kwargs):
        """Acts like CategoricalScale unless threshold_measure is below threshold_value."""
        # Forward extra options (such as palette) to CategoricalScale
        super().__init__(attr, **kwargs)
        self.threshold_measure = threshold_measure
        self.threshold_value = threshold_value

    def __call__(self, link, measures):
        value = measures[self.threshold_measure]
        if value < self.threshold_value:
            return '#ddd'
        return super().__call__(link, measures)

[21]:

# A categorical scale needs a qualitative palette, such as the 'Set2_8' used earlier
my_scale4 = DimmingCategoricalScale(
    'type',
    threshold_measure='Calories Burnt',
    threshold_value=6,
    palette='Set2_8'
)
weave(sdd, dataset, measures='Calories Burnt', link_color=my_scale4) \
    .to_widget(**size_options)

[21]:
