System boundaries

Often we don’t want to show all of the data in one Sankey diagram; we want to focus on one part of the system. But conservation of mass (or of whatever quantity the diagram shows) should still hold, so we end up with flows to and from “elsewhere”. These can also be thought of as imports and exports.

Let’s start by recreating the Quickstart example:

[1]:
import pandas as pd
flows = pd.read_csv('simple_fruit_sales.csv')
[2]:
from floweaver import *

# Set the default size to fit the documentation better.
size = dict(width=570, height=300)

# Same partitions as the Quickstart tutorial
farms_with_other = Partition.Simple('process', [
    'farm1',
    'farm2',
    'farm3',
    ('other', ['farm4', 'farm5', 'farm6']),
])

customers_by_name = Partition.Simple('process', [
    'James', 'Mary', 'Fred', 'Susan'
])

# Define the nodes, this time setting the partition from the start
nodes = {
    'farms': ProcessGroup(['farm1', 'farm2', 'farm3',
                           'farm4', 'farm5', 'farm6'],
                          partition=farms_with_other),
    'customers': ProcessGroup(['James', 'Mary', 'Fred', 'Susan'],
                              partition=customers_by_name),
}

# Ordering and bundles as before
ordering = [
    ['farms'],       # put "farms" on the left...
    ['customers'],   # ... and "customers" on the right.
]

bundles = [
    Bundle('farms', 'customers'),
]
[3]:
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)

What happens if we remove farm2 from the ProcessGroup?

[4]:
nodes['farms'].selection = [
    'farm1', 'farm3', 'farm4', 'farm5', 'farm6'
]
weave(sdd, flows).to_widget(**size)

The flow from farm2 is still there! But it is now labelled with a little arrow to show that it is coming “from elsewhere”. This is important because we are still showing Susan and Fred in the diagram, and they get fruit from farm2. If we didn’t show those flows, the totals shown for Susan and Fred would be smaller than what they actually receive, and their inputs and outputs would not balance.
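The balance argument can be checked with a little arithmetic. The sketch below uses plain Python rather than floweaver, and the flow values are made up for illustration (they are not the contents of simple_fruit_sales.csv). `shown_input` is a hypothetical helper that sums the flows into one process from a given set of visible sources:

```python
def shown_input(flows, process, visible_sources):
    """Sum the flows into `process` that come from `visible_sources`."""
    return sum(
        value
        for source, target, value in flows
        if target == process and source in visible_sources
    )


# Made-up (source, target, value) flows, for illustration only.
example_flows = [
    ("farm1", "Susan", 2.0),
    ("farm2", "Susan", 3.0),
    ("farm3", "Fred", 1.0),
    ("farm2", "Fred", 4.0),
]

all_farms = {"farm1", "farm2", "farm3"}
without_farm2 = all_farms - {"farm2"}

# What Susan really receives, and what would be drawn if the
# flows from farm2 were simply dropped.
susan_total = shown_input(example_flows, "Susan", all_farms)
susan_shown = shown_input(example_flows, "Susan", without_farm2)

# The gap is the amount that has to arrive "from elsewhere".
missing = susan_total - susan_shown
```

In this made-up data, `missing` is exactly the farm2 contribution, which is the amount the “from elsewhere” flow has to carry.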

Try now removing Susan and Fred from the diagram:

[5]:
nodes['customers'].selection = ['James', 'Mary']
weave(sdd, flows).to_widget(**size)

Now that they’re gone, we no longer see the incoming flows from farm2. Instead, we see some outgoing flows “to elsewhere” from farm3 and the other group. This is because farm3 is within the system boundary (it is shown in the diagram), so its output flow has to go somewhere.
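The rule behind these automatic flows can be sketched in plain Python. This is an illustration of the idea, not floweaver's actual implementation; `classify_flows` is a hypothetical helper and the flow records are made up. A flow is internal when both ends are inside the boundary, “from elsewhere” when only its target is inside, “to elsewhere” when only its source is inside, and hidden when neither end is shown:

```python
def classify_flows(flows, inside):
    """Split flows by how they relate to the system boundary.

    `flows` is a list of (source, target, value) tuples and `inside`
    is the set of processes shown in the diagram.
    """
    result = {
        "internal": [],
        "from_elsewhere": [],
        "to_elsewhere": [],
        "hidden": [],
    }
    for source, target, value in flows:
        if source in inside and target in inside:
            kind = "internal"
        elif target in inside:
            kind = "from_elsewhere"
        elif source in inside:
            kind = "to_elsewhere"
        else:
            kind = "hidden"
        result[kind].append((source, target, value))
    return result


# Made-up flows, for illustration only.
example_flows = [
    ("farm1", "James", 3.0),
    ("farm2", "Susan", 2.0),
    ("farm2", "Fred", 1.0),
    ("farm3", "Mary", 4.0),
    ("farm3", "Susan", 5.0),
]

# Boundary after removing farm2, Susan and Fred from the diagram.
inside = {"farm1", "farm3", "James", "Mary"}
classified = classify_flows(example_flows, inside)
```

With this boundary, the farm3 to Susan flow lands in `to_elsewhere`, while both farm2 flows end up in `hidden` because neither end is in the diagram.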

Controlling Elsewhere flows

These flows are added automatically to make sure that mass is conserved, but because they are automatic, we have little control over them. By explicitly adding a Bundle to or from Elsewhere to the diagram, we can control where these flows appear and what they look like.

To do this, add a Waypoint for the outgoing flows to ‘pass through’ on their way across the system boundary:

[6]:
# Define a new Waypoint
nodes['exports'] = Waypoint(title='exports here')

# Update the ordering to include the waypoint
ordering = [
    ['farms'],                  #     put "farms" on the left...
    ['customers', 'exports'],   # ... and "exports" below "customers"
]                               #     on the right.

# Add a new bundle from "farms" to Elsewhere, via the waypoint
bundles = [
    Bundle('farms', 'customers'),
    Bundle('farms', Elsewhere, waypoints=['exports']),
]

sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)

This is pretty similar to what we had already, but now that the waypoint is explicitly listed as part of the SankeyDefinition, we have more control over it.

For example, we can put the exports above James and Mary by changing the ordering:

[7]:
ordering = [
    ['farms'],
    ['exports', 'customers'],
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)

Or we can partition the exports Waypoint to show how much of the exported fruit is apples and how much is bananas:

[8]:
fruits_by_type = Partition.Simple('type', ['apples', 'bananas'])
nodes['exports'].partition = fruits_by_type
weave(sdd, flows).to_widget(**size)
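Conceptually, partitioning the waypoint by type amounts to grouping the exported flows by their `type` value and summing each group. The sketch below shows that idea in plain Python. It is an approximation of the effect, not floweaver's own code; `sum_by_type` is a hypothetical helper and the flow records are made up:

```python
from collections import defaultdict


def sum_by_type(flows):
    """Total the flow values for each fruit type."""
    totals = defaultdict(float)
    for _source, _target, fruit_type, value in flows:
        totals[fruit_type] += value
    return dict(totals)


# Made-up (source, target, type, value) export flows, for illustration only.
export_flows = [
    ("farm3", "Susan", "apples", 5.0),
    ("farm4", "Fred", "bananas", 2.0),
    ("farm5", "Susan", "apples", 1.5),
]

export_totals = sum_by_type(export_flows)
```

Each key in `export_totals` corresponds to one segment of the partitioned waypoint, and its value to the width of that segment.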

Horizontal bands

Often, imports/exports and loss flows are shown in a separate horizontal “band” either above or below the main flows. We can do this by modifying the ordering a little bit.

The ordering style we have used so far looks like this:

ordering = [
    [list of nodes in layer 1],  # left-hand side
    [list of nodes in layer 2],
    ...
    [list of nodes in layer N],  # right-hand side
]

But we can add another layer of nesting to make it look like this:

ordering = [
    # |top band|  |bottom band|
    [ [........], [...........] ],  # left-hand side
    [ [........], [...........] ],
    ...
    [ [........], [...........] ],  # right-hand side
]

Here’s an example:

[9]:
ordering = [
    [[],          ['farms'    ]],
    [['exports'], ['customers']],
]
sdd = SankeyDefinition(nodes, bundles, ordering)
weave(sdd, flows).to_widget(**size)
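When the flat ordering already exists, the nested form can be derived from it instead of being rewritten by hand. The sketch below is one possible way to do that in plain Python; `split_into_bands` is a hypothetical helper, not part of floweaver, and it assumes exactly two bands (top and bottom):

```python
def split_into_bands(ordering, top_band):
    """Convert a flat ordering into a two-band ordering.

    Nodes named in `top_band` go into the first (top) band of their
    layer; all other nodes go into the second (bottom) band.
    """
    banded = []
    for layer in ordering:
        top = [node for node in layer if node in top_band]
        bottom = [node for node in layer if node not in top_band]
        banded.append([top, bottom])
    return banded


flat_ordering = [
    ["farms"],
    ["exports", "customers"],
]

# Move the "exports" waypoint into the top band.
banded_ordering = split_into_bands(flat_ordering, top_band={"exports"})
```

The resulting `banded_ordering` has the same nested shape as the ordering used in the example above and could be passed to `SankeyDefinition` in the same way.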

Summary

  • All the flows to/from a ProcessGroup are shown, even if the other end of the flow is outside the system boundary (i.e. not part of any ProcessGroup)

  • You can control the automatic flows by explicitly adding Bundles to/from Elsewhere with a Waypoint

  • The ordering can contain horizontal bands