Beware of Algorithmically Generated Scales in Heat Maps
Automatically adjusted scales can be misleading. We examine the issues and how to resolve them.
Back in 2020, in the dark days of the Covid-19 pandemic, a visualization designer was mercilessly attacked on Twitter (as it was then) for allegedly downplaying the outbreak’s severity.
The designer faced accusations of manipulating the colour scales on choropleth maps that tracked infection rates. Critics argued that comparing the latest map to one from a week prior should have shown a stark increase in cases, but it didn’t.
The observation was true, but it wasn’t a deliberate cover-up. A closer look at the scales and the data would have cleared things up.
The map colours represented numerical ranges, but these changed over time. The truth was that these scales were not manipulated but were algorithmically adjusted to fit the latest data, which was skyrocketing week by week. So, while the colour range stayed the same, the numerical range ballooned.
An absolute giveaway was the range boundaries. They weren’t whole numbers; they had decimal places, values no human would pick. The chart designer probably didn’t give it much thought. Why not trust the app’s colour assignments?
The pandemic’s unique challenge was the rapid, massive increase in numbers. Predicting the final infection count from the start (impossible, of course) would have allowed for a fixed scale on all charts, but that would have its own issues: early charts would only use a very limited part of the scale, making small numbers look nearly identical and so not showing much detail.
Let me illustrate the problem.
Take a look at the heat maps below. The one labelled “Week 1” is constructed with random numbers and, as you move right, those numbers are doubled in each successive map.
The numbers are increasing, but the charts look pretty similar, right?
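If you want to experiment with data like this yourself, here is a minimal Python sketch. The grid size, the 1 to 99 starting range, the seed, and the `make_weeks` name are assumptions chosen for illustration, not the exact values behind the charts.

```python
import random

def make_weeks(rows=5, cols=5, n_weeks=4, seed=1):
    """Week 1 is random; each later week doubles the week before it."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    week1 = [[rng.randint(1, 99) for _ in range(cols)] for _ in range(rows)]
    weeks = [week1]
    for _ in range(n_weeks - 1):
        weeks.append([[2 * v for v in row] for row in weeks[-1]])
    return weeks

weeks = make_weeks()
for number, grid in enumerate(weeks, start=1):
    values = [v for row in grid for v in row]
    print(f"Week {number}: min={min(values)}, max={max(values)}")
```

Each district in week 4 is eight times its week 1 value, so with these assumed starting numbers nothing exceeds 792.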
Now, imagine these heat maps represent weekly infection rates in areas of the fictional (and weirdly geometrical) town of Grid City. The map represents the city itself, and each square inside it represents a district in the city.
A brief look at the maps would seem to suggest that nothing much is happening. But the reality is that the infection rate is growing exponentially.
Looking at the scales will confirm that growth: as you move across the maps from left to right, each colour represents an increasingly wide range of values.
It is not unreasonable to use automatically generated scales like this as long as you are only looking at one map. However, you cannot properly compare one with another because, if the data range is expanding, a colour in an earlier map represents a smaller range of values than the same colour in a later one.
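To see why the maps look the same, here is a rough sketch of what an automatic scale does. It assumes the app splits each map's own minimum-to-maximum range into five equal-width colour bins; real tools may bin differently, and the sample values and function names are made up for illustration.

```python
def colour_index(value, lo, hi, n_colours=5):
    """Map a value to one of n_colours equal-width bins spanning lo..hi."""
    if hi == lo:
        return 0
    position = (value - lo) / (hi - lo)  # 0.0 at lo, 1.0 at hi
    return min(int(position * n_colours), n_colours - 1)

def auto_scaled(grid, n_colours=5):
    """Colour a grid using its own min and max, as an automatic scale would."""
    values = [v for row in grid for v in row]
    lo, hi = min(values), max(values)
    return [[colour_index(v, lo, hi, n_colours) for v in row] for row in grid]

def bin_width(grid, n_colours=5):
    """Width of the value range that a single colour covers."""
    values = [v for row in grid for v in row]
    return (max(values) - min(values)) / n_colours

week1 = [[12, 47, 83], [5, 66, 31], [94, 21, 58]]  # made-up sample values
week2 = [[2 * v for v in row] for row in week1]     # every district doubles

print(auto_scaled(week1) == auto_scaled(week2))     # True: identical colours
print(bin_width(week1), bin_width(week2))           # each colour covers twice as much
```

Every district gets exactly the same colour in both weeks, even though every value has doubled. Note also that the bin width is not a whole number, which is where those tell-tale decimal boundaries come from.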
How to fix it
So, you cannot reasonably compare different heatmaps (including choropleths) created at different times if the range of the data is changing rapidly.
But people do (and when they do, they give you hell about it on Twitter). So, what can we do about it?
If you have the luxury of knowing what the final data range will be, you could fix it to be the same in all charts. The heatmaps below use the same data as before, but the data range is fixed to 0 to 800 (a little more than the maximum for week 4). These charts are now comparable, and you can see the lighter colours in the later maps that demonstrate the increases in infections.
There is still a problem, though. Nobody is going to thank you for producing the chart in week 1 — you can’t really see the difference in infection rates across the city that becomes evident in later charts.
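Both effects are easy to check with a small sketch: a fixed 0 to 800 scale makes the weeks comparable, but week 1 collapses into a single colour. The five equal-width bins and the sample values are assumptions for illustration. In a plotting library the same idea usually amounts to pinning the colour limits, for example with the `vmin` and `vmax` arguments of matplotlib's `imshow`.

```python
FIXED_LO, FIXED_HI = 0, 800  # the same range for every week

def colour_index(value, lo, hi, n_colours=5):
    """Map a value to one of n_colours equal-width bins spanning lo..hi."""
    position = (value - lo) / (hi - lo)
    return min(int(position * n_colours), n_colours - 1)

week1 = [[12, 47, 83], [5, 66, 31], [94, 21, 58]]  # made-up sample values
# Weeks 1 to 4: each week doubles the one before it.
weeks = [[[v * 2 ** k for v in row] for row in week1] for k in range(4)]

used_per_week = []
for number, grid in enumerate(weeks, start=1):
    used = sorted({colour_index(v, FIXED_LO, FIXED_HI) for row in grid for v in row})
    used_per_week.append(used)
    print(f"Week {number}: colours used {used}")
```

With these numbers, week 1 uses only the lowest colour while week 4 uses all five, which is exactly the trade-off described above.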
So maybe a better approach would be to use the maps with the automatically generated scales but add an extra graphic to show how the data has changed over time. Pairing this with the current heatmap puts the maps into context.
With this approach, nobody can accuse you of hiding the facts: you can see the relative infection rates across the city in the heatmap, and the bar chart makes it very clear how the rates have increased over time.
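The data behind such a companion chart is just one summary number per week. The sketch below assumes the mean rate across districts as that summary (a total or a maximum would work equally well) and draws crude text bars in place of a real chart; the sample values and function names are made up.

```python
def weekly_means(weeks):
    """One summary value per week: the mean rate across all districts."""
    means = []
    for grid in weeks:
        values = [v for row in grid for v in row]
        means.append(sum(values) / len(values))
    return means

def text_bars(values, width=40):
    """Scale values so the largest one fills `width` characters."""
    top = max(values)
    return ["#" * max(1, round(width * v / top)) for v in values]

week1 = [[12, 47, 83], [5, 66, 31], [94, 21, 58]]  # made-up sample values
weeks = [[[v * 2 ** k for v in row] for row in week1] for k in range(4)]

means = weekly_means(weeks)
bars = text_bars(means)
for number, (mean, bar) in enumerate(zip(means, bars), start=1):
    print(f"Week {number} {bar} {mean:.1f}")
```

Because all the bars share one axis, the week-on-week doubling is obvious at a glance, while the heatmap alongside it is free to use its own automatic scale.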
Conclusion
If you are bold enough to publicise your work on social media, you probably have to develop a thick skin, because there will always be critics: some reasonable, others not so reasonable, and a few completely rabid.
I’ve given a possible solution to this particular problem (and if you have any of your own, please share them) but the main message is that you should be careful what you publish. If there is any possibility of criticism in your work then someone will pick you up on it.
All images and screenshots are by me, the author, unless otherwise noted.