Plotly Recipes: a Highlighted Line Graph
How to identify the data you are interested in, but keep the context — a simple library function.
I was tempted to say that the image above on the left was the data visualisation equivalent of not being able to see the wood for the trees. But that’s not right.
If the image on the left represents the wood, then it’s the individual traces (the trees) that are obscured. The trees are only properly visible when we reduce the number that we look at and push the rest of the wood into the background.
When faced with a dense dataset like this, we can highlight the data we are interested in with a fairly simple Plotly Graph Objects function. And by reducing the visibility of the bulk of the data, but without eliminating it completely, we can focus on the data we are interested in while keeping the context.
The images above show that the world’s population growth rate is slowing because the average number of babies born per woman is decreasing. This is good news because, assuming the trend continues, the world population will peak at some point in the future rather than continue to grow and overwhelm the planet.
The charts are inspired by the very first data visualisation exercise in Alberto Cairo’s book, The Functional Art¹. In the book, he talks about his fascination with Matt Ridley’s arguments in The Rational Optimist: How Prosperity Evolves². Ridley argues that the world population will peak in a few decades because, globally, the fertility rate will converge to around 2.1 as countries become wealthier. (A replacement rate of 2.1 babies per woman produces a steady population; a higher rate will cause an increase in population, while a lower one means the population will shrink and get proportionately older.)
To investigate, Cairo turned to the UNData³ website, where he found historical data on population growth. Using this data in a spreadsheet, he came up with a graph similar to the ‘Before’ image above. You can just about detect a trend of lowering fertility rate (the top left is dense with lines, and this density moves down the graph over time), but otherwise, it is difficult to interpret.
As Cairo points out, you might just be able to follow the line representing a single country, but it is more likely that you will give up. To accurately trace the rate in a single country, you need something more like the ‘After’ image, where specific countries are highlighted, and it is easy to trace the movement in fertility.
To produce this sort of graph in 2013, Cairo had to edit the image by hand, first making all the traces grey and then highlighting his chosen countries in colour. It was a painstaking task that Plotly makes much easier. (Plotly was only just launching as Cairo wrote his book and was not yet the powerful tool we know today, though, admittedly, Matplotlib, which could also be used, has been around a lot longer.)
The Plotly Solution
Plotly comes in two flavours: the preferred library is Plotly Express, which gives us a wide range of chart types we can easily program. Plotly Graph Objects (GO) is a more complex but flexible library that provides us with more control over visualisations. Plotly Express charts are wrappers around GO functions, and it makes sense to use Plotly Express for the charts it supports.
However, while we can create the ‘Before’ chart easily using Plotly Express, we need GO to construct the ‘After’ version.
We’ll use the same sort of data that Alberto Cairo used. I downloaded data from Our World in Data⁴ that is derived from the UNData website. I then created a subset that we will use here. It tracks the fertility rate for the World and a large range of countries from 1950 to 2023.
My subset is a simple table that records only the fertility rate.
Below you can see the first 4 entries for the World, and the fertility rate is around 5 children per woman. This implies a significant worldwide increase in population year by year.
Looking at the last few entries for the World, it is clear that the rate of increase in the population has shrunk considerably — the fertility rate has more than halved — giving credence to Ridley’s prediction.
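As a rough sketch, the first and last rows for a single region could be pulled out with pandas as shown here. The column names (Year, Region, Fertility Rate) are the ones used in the code later in this article; the helper name peek_region and the tiny sample table are made up for illustration and are not the real figures.

```python
import pandas as pd

def peek_region(df, region, n=4, region_col="Region", year_col="Year"):
    """Return the first and last n rows for one region, ordered by year."""
    rows = df[df[region_col] == region].sort_values(year_col)
    return rows.head(n), rows.tail(n)

# Made-up sample values, for illustration only (not the real UN figures)
sample = pd.DataFrame({
    "Year": [1950, 1951, 2022, 2023, 1950, 2023],
    "Region": ["World", "World", "World", "World", "Spain", "Spain"],
    "Fertility Rate": [5.0, 4.9, 2.3, 2.2, 2.5, 1.2],
})

first_rows, last_rows = peek_region(sample, "World", n=2)
print(first_rows)
print(last_rows)
```

With the real table loaded into a DataFrame, the same call with n=4 would show the kind of rows described above.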
The table contains not only data for the World but for many (all?) countries, too. Here is a single Plotly Express call that plots all of the data for all countries (df_rate is the DataFrame holding the table).
import plotly.express as px

px.line(x=df_rate['Year'], y=df_rate['Fertility Rate'], color=df_rate['Region'],
        height=800, width=1600, template='plotly_white')
I ran this in a Jupyter Notebook, which resulted in the ‘Before’ graph that you can see more clearly below.
As I said earlier, you can see a lowering in fertility rates over time — the number of countries that have maintained a high rate is clearly declining. If we were to plot the data for the World only, we would get the following:
A very clear downward trend.
If we want to plot the data for a subset of countries, we can use Plotly Express, as before, with that subset of data.
df2 = df_rate[df_rate['Region'].isin(['World', 'Spain', 'Brazil'])]
px.line(x=df2['Year'], y=df2['Fertility Rate'], color=df2['Region'],
        height=600, width=800, template='plotly_white')
This code gives us the following chart (with the countries that were of interest to Alberto Cairo).
This is fine if that is all we are interested in. The chart shows that the fertility rate is also dropping in the two additional countries. In Brazil, the change is very marked, dropping from around 6 babies per woman in 1950 to fewer than 2 in 2023.
In these plots we see the data we want very clearly; however, the context provided by all the other countries is lost. On the other hand, if we included all the other data, the clarity would be lost.
The answer is to build the chart in two layers: one that plots all of the data with subdued traces, and a second that highlights the countries of interest. Combining the two layers in a single figure gives us the ‘After’ plot.
I’ve written a reusable function to do this.
import plotly.express as px
import plotly.graph_objects as go


def highlight_chart(df,
                    x_value,
                    y_value,
                    trace_name,
                    highlight_traces,
                    xaxis_title="", yaxis_title="", chart_name="",
                    width=1600, height=800, template='plotly_white'):
    """
    Generates a Plotly figure with highlighted traces.

    Args:
        df (pd.DataFrame): The input DataFrame containing the data.
        x_value (str): The name of the column to use for the x-axis.
        y_value (str): The name of the column to use for the y-axis.
        trace_name (str): The name of the column that identifies different traces (e.g., countries).
        highlight_traces (list): A list of trace names to highlight with different colors.
        xaxis_title (str, optional): Title for the x-axis. Defaults to "".
        yaxis_title (str, optional): Title for the y-axis. Defaults to "".
        chart_name (str, optional): Title for the chart. Defaults to "".
        width (int, optional): Width of the chart in pixels. Defaults to 1600.
        height (int, optional): Height of the chart in pixels. Defaults to 800.
        template (str, optional): Plotly template to use. Defaults to 'plotly_white'.

    Returns:
        plotly.graph_objects.Figure: The generated Plotly figure.
    """
    # Create the base figure
    fig = go.Figure()

    # Plot all traces in grey
    for t_name in df[trace_name].unique():
        t_df = df[df[trace_name] == t_name]
        fig.add_trace(go.Scatter(
            x=t_df[x_value],
            y=t_df[y_value],
            mode='lines',
            line=dict(color='#E8E8E8', width=1),
            showlegend=False  # Hide legend for grey lines
        ))

    # Overlay selected countries in color
    colors = px.colors.qualitative.Plotly
    for i, t_name in enumerate(highlight_traces):
        t_df = df[df[trace_name] == t_name]
        fig.add_trace(go.Scatter(
            x=t_df[x_value],
            y=t_df[y_value],
            mode='lines',
            name=t_name,
            # Cycle through the palette so that selections longer than
            # the palette are still drawn rather than silently dropped
            line=dict(color=colors[i % len(colors)], width=4)
        ))

    # Update layout
    fig.update_layout(
        title=chart_name,
        xaxis_title=xaxis_title,
        yaxis_title=yaxis_title,
        height=height,
        width=width,
        template=template
    )
    return fig
Essentially, this function does three things. First, it creates a GO figure. Second, a loop plots a thin, light grey line for every country and adds each one to the figure. Third, another loop plots a line for each country of interest, this time in colour and with a thicker stroke, and adds these to the same figure so that they sit on top of the grey background.
It is intended to be a generic function and has several parameters that are documented in the code. Here is how we would use it to produce the ‘After’ plot as a Streamlit program:
import pandas as pd
import streamlit as st

df = pd.read_csv('rate.csv', sep=';')
chart_name = 'Fertility Rate by Country'
xaxis_title = 'Year'
yaxis_title = 'Fertility Rate'
template = 'simple_white'
# column names
y_value = 'Fertility Rate'
x_value = 'Year'
trace_name = 'Region'
# List of countries to highlight
highlight_traces = ['World', 'Spain', 'India', 'Brazil', 'United States of America']
fig = highlight_chart(df, x_value=x_value, y_value=y_value,
                      trace_name=trace_name,
                      highlight_traces=highlight_traces,
                      xaxis_title=xaxis_title,
                      yaxis_title=yaxis_title,
                      chart_name=chart_name,
                      template=template)  # Pass the template defined above
st.plotly_chart(fig, use_container_width=True)
As you can see from the code, I have chosen a handful of countries that all show a decline in fertility rate. Indeed, every country represented in the data shows a decline to at least some extent (except for the Holy See, i.e. the Vatican — no prizes for guessing why a small Catholic enclave of priests and nuns might not show much variation in fertility rate).
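One way to check a claim like this is to compare each region's first and last recorded values. The sketch below assumes the same three columns as before; the function name rate_change and the two-region sample are hypothetical stand-ins for the real table.

```python
import pandas as pd

def rate_change(df, region_col="Region", year_col="Year", value_col="Fertility Rate"):
    """Last-year value minus first-year value for each region."""
    ordered = df.sort_values(year_col)
    grouped = ordered.groupby(region_col)[value_col]
    return grouped.last() - grouped.first()

# Made-up sample values: region A declines, region B stays flat
sample = pd.DataFrame({
    "Year": [1950, 2023, 1950, 2023],
    "Region": ["A", "A", "B", "B"],
    "Fertility Rate": [6.0, 1.6, 0.0, 0.0],
})

change = rate_change(sample)
not_declining = change[change >= 0].index.tolist()
print(not_declining)
```

Run against the full table, not_declining would list any regions whose rate did not fall between their first and last years.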
The plot that is produced can be seen below.
This is the result we are looking for: the background of grey lines gives us a good overall impression of what is happening from a worldwide perspective and the coloured lines show more precisely the situation in the countries that we have selected.
The code for a more functional Streamlit app using a selector to choose the list of countries to be displayed can be seen below. Replace the code below the function definition with this code.
import pandas as pd
import streamlit as st

# Load the data
df = pd.read_csv('rate.csv', sep=';')
chart_name = 'Fertility Rate by Country'
xaxis_title = 'Year'
yaxis_title = 'Fertility Rate'
template = 'simple_white'
# column names
y_value = 'Fertility Rate'
x_value = 'Year'
trace_name = 'Region'
# Get unique regions for the selector
all_regions = sorted(df[trace_name].unique())
# Default list of countries to highlight
default_highlight_traces = ['World', 'Spain', 'India', 'Brazil', 'United States of America']
# Streamlit multiselect widget for highlighting traces
st.sidebar.header("Chart Options")  # Optional: Add a header in the sidebar
highlight_traces = st.sidebar.multiselect(
    "Select regions to highlight:",
    options=all_regions,
    # Ensure defaults are valid
    default=[region for region in default_highlight_traces if region in all_regions]
)
if highlight_traces:  # Only generate chart if at least one region is selected
    fig = highlight_chart(df, x_value=x_value, y_value=y_value,
                          trace_name=trace_name,
                          highlight_traces=highlight_traces,
                          xaxis_title=xaxis_title,
                          yaxis_title=yaxis_title,
                          chart_name=chart_name,
                          template=template)  # Pass the template
    st.plotly_chart(fig, use_container_width=True)
else:
    st.info("Please select at least one region to highlight from the sidebar.")
Here is a screenshot of the app.
The parameters allow the function to be used with any similar dataset, so you can try it out with your own data.
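The function expects "long" data: one column for the x-axis, one for the y-axis, and one naming the trace. If your own data is "wide", with one column per series, pandas can reshape it first. This is only a sketch; the city names, column names and values below are invented for the example.

```python
import pandas as pd

# Hypothetical wide table: one row per year, one column per city (made-up values)
wide = pd.DataFrame({
    "Year": [2020, 2021],
    "Oslo": [1.0, 2.0],
    "Lima": [3.0, 4.0],
})

# Reshape to one row per (Year, City) pair
long_df = wide.melt(id_vars="Year", var_name="City", value_name="Value")
print(long_df)
```

The reshaped table could then be passed to the function with x_value='Year', y_value='Value' and trace_name='City'.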
We have seen that when data is dense, it can be difficult to identify individual trends. And while it is easy to pick out individual traces by filtering the data, this can obscure the bigger picture.
The Plotly function illustrated here provides us with the best of both worlds: a highlight of the data of interest against a background of the complete dataset.
Thanks for reading; I hope you found it useful. All of the code and the data can be downloaded from my GitHub repository. If you would like to see more of my work, please follow me on Medium and/or Substack.
Notes
1. The Functional Art: An Introduction to Information Graphics and Visualization, Alberto Cairo, 2013
2. The Rational Optimist: How Prosperity Evolves, Matt Ridley, 2011
3. The original UNData can be found here: https://data.un.org/. Data from UNData is freely usable as long as the source is credited.
4. The Our World in Data resource that I used can be found here: https://ourworldindata.org/population-growth. UN, World Population Prospects (2024), processed by Our World in Data. Our World in Data may be freely used under the Creative Commons BY license.