‘With the great power of graphs comes great responsibility’

Modern neuroscience overwhelmingly relies on empirical evidence: the collection of data through observation or experimentation. Once the data have been collected, they must be analyzed, summarized, and shared with others. Data visualization is a crucial part of these steps, particularly when researchers communicate their results in scientific publications.

Despite the central role that data visualization plays in modern neuroscience, it is hardly taught at the undergraduate or graduate level. In my own short experience, you mostly pick it up as you go, starting in grad school. While there is nothing inherently wrong with such a hands-on, problem-based approach to learning, accurate and efficient graphical representation of data does follow a few basic but crucial principles. Simply put:

  • make sure that you indicate what variable you are plotting (e.g. are you plotting the mean or the median? did you include a scale?)
  • display the uncertainty about the plotted variable (again, indicate what you are plotting, e.g. standard deviations or percentiles)
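To see why the second point matters, here is a minimal Python sketch (my choice of language; the reaction times are made up and do not come from any study). It computes three common candidates for an error bar on the same sample. The helper name `uncertainty_options` is hypothetical, and the 95% interval uses a simple normal approximation rather than a t-based interval.

```python
import math
import statistics


def uncertainty_options(sample):
    """Return three common error-bar half-widths for one sample."""
    n = len(sample)
    sd = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    sem = sd / math.sqrt(n)         # standard error of the mean
    return {
        "SD": sd,
        "SEM": sem,
        # Normal approximation; a t-based interval would be somewhat wider for small n.
        "95% CI half-width (normal approx.)": 1.96 * sem,
    }


# Made-up reaction times in milliseconds, for illustration only.
rt_ms = [412, 389, 455, 430, 398, 467, 421, 405, 440, 418]

center = statistics.mean(rt_ms)
options = uncertainty_options(rt_ms)
for name, half_width in options.items():
    print(f"mean = {center:.1f} ms, error bar = ±{half_width:.1f} ms ({name}, n = {len(rt_ms)})")
```

The three error bars differ severalfold in length for identical data, so a figure that does not say which one it shows cannot be interpreted.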

Easy, right? Well… When Elena Allen and colleagues, from the Mind Research Network in Albuquerque, NM, surveyed 288 articles published in 2010 in six leading neuroscience journals, they found that a substantial share of the figures did not include all of these basic features. That was especially true of figures that attempted to represent more than two dimensions by using color or shades of gray (“3D figures” in Allen’s words): most of those did not report the uncertainty of the reported effects. It is admittedly difficult to analyze multi-dimensional data sets and represent them on a flat two-dimensional image (hence the title of Allen and colleagues’ article: “Data Visualization in the Neurosciences: Overcoming the Curse of Dimensionality”). More disturbingly, however, only 43% of the 3D figures even labelled the dependent variable to begin with. All is not well with the simpler 2D figures, either: 30% of the figures that included some measure of uncertainty (error bars) failed to indicate which one.

‘Show more, hide less’

How can scientists produce better graphical representations of their data? The authors provide a series of recommendations in the form of a great checklist. But where the paper really shines, in my opinion, is in the brilliant case studies that it provides, which underline to what extent the structure of a dataset can be hidden when depicted with an inappropriate graphical display, and how to avoid this. The article is accessible for free here, so you can check out the figures yourself. Because the article is made available under an Elsevier user license, I’m reproducing some of the figures here. The copyright remains with Elsevier. You can find more information about the Elsevier user license here.

The first example (see their Figure 2 below) covers the venerable 2D bar plot, which the authors improve upon first with box plots and then with a violin plot, an almost complete graphical representation of the dataset that is far more informative than the original. Personally, I’m also enthusiastic about bee-swarm plots, which plot every single data point. I know of an R package for bee-swarm plots (check out the great graphical examples!); unfortunately, I don’t know of any equivalent for MATLAB that looks as good.

Figure 2 from Allen et al., Neuron 2012. Made available under an Elsevier user license. Copyright Elsevier.
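The bar-versus-violin comparison can be tried on synthetic data. The sketch below is in Python with NumPy and matplotlib (an assumption on my part; it is not the authors' code, and the data are simulated). It builds a unimodal and a bimodal sample, rescales both to the same mean and standard deviation, and plots them side by side. The jittered dots are only a rough stand-in for a true bee-swarm layout.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)


def standardize(x):
    """Rescale a sample to mean 0 and sample SD 1."""
    return (x - x.mean()) / x.std(ddof=1)


# Two simulated samples with very different shapes but identical mean and SD.
unimodal = standardize(rng.normal(0.0, 1.0, 200))
bimodal = standardize(
    np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])
)
groups = [unimodal, bimodal]
labels = ["unimodal", "bimodal"]

means = [g.mean() for g in groups]
sds = [g.std(ddof=1) for g in groups]

fig, (ax_bar, ax_violin) = plt.subplots(1, 2, sharey=True, figsize=(8, 4))

# Left: the bar plot cannot tell the two samples apart.
ax_bar.bar([1, 2], means, yerr=sds, capsize=4)
ax_bar.set_title("Bar: mean ± 1 SD")
ax_bar.set_ylabel("Value (z-scored, arbitrary units)")

# Right: the violin plot plus every data point shows the two modes.
ax_violin.violinplot(groups, positions=[1, 2], showmedians=True)
for pos, g in zip([1, 2], groups):
    jitter = rng.uniform(-0.08, 0.08, g.size)  # horizontal jitter, not a real bee-swarm
    ax_violin.scatter(pos + jitter, g, s=4, alpha=0.5, color="black")
ax_violin.set_title("Violin + every data point")

for ax in (ax_bar, ax_violin):
    ax.set_xticks([1, 2])
    ax.set_xticklabels(labels)

# Call fig.savefig("bar_vs_violin.png") or plt.show() to look at the result.
```

Both bars come out identical by construction, while the right-hand panel shows that almost none of the bimodal data actually sit near the mean.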

Moving to more complex data sets, Allen and colleagues turn to EEG and event-related potentials (ERP; see their Figure 3A below). There, they suggest displaying the uncertainty around the ERP waveforms using shaded areas (don’t forget to label what measure of uncertainty you’re plotting!). This is easy to implement in MATLAB, for instance using the boundedline function, by Kelly Kearney. The authors also encourage plotting a graphical representation of the results of statistical testing on the same plot. This makes total sense, adds minimal work to preparing the figure, and should definitely be standard practice.
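For readers who do not use MATLAB, the same kind of shaded band can be sketched in Python with NumPy and matplotlib's `fill_between`. This is my own illustration, not the boundedline function or the authors' code: the "ERP" is a simulated bump plus noise, the helper names are hypothetical, and I chose ±1 standard error of the mean across trials as the uncertainty measure.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt


def mean_and_sem(trials):
    """trials: array of shape (n_trials, n_samples). Returns mean and SEM across trials."""
    trials = np.asarray(trials, dtype=float)
    mean = trials.mean(axis=0)
    sem = trials.std(axis=0, ddof=1) / np.sqrt(trials.shape[0])
    return mean, sem


def plot_erp(time_ms, trials, ax, label):
    """Plot the mean waveform with a shaded ±1 SEM band."""
    mean, sem = mean_and_sem(trials)
    ax.plot(time_ms, mean, label=label)
    ax.fill_between(time_ms, mean - sem, mean + sem, alpha=0.3)
    return mean, sem


# Simulated data: a Gaussian bump near 300 ms plus trial-by-trial noise.
rng = np.random.default_rng(0)
time_ms = np.linspace(-100, 500, 301)
signal = 5.0 * np.exp(-((time_ms - 300.0) / 50.0) ** 2)
trials = signal + rng.normal(0.0, 2.0, size=(40, time_ms.size))

fig, ax = plt.subplots()
mean, sem = plot_erp(time_ms, trials, ax, "condition A (simulated)")
ax.set_xlabel("Time from stimulus onset (ms)")
ax.set_ylabel("Amplitude (µV)")
# State the uncertainty measure explicitly, as the paper recommends.
ax.set_title("Mean ERP; shaded band = ±1 SEM across 40 trials")
ax.legend()

# Call fig.savefig("erp_band.png") or plt.show() to look at the result.
```

The title spells out what the band represents, which is the labelling step that is so often skipped.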

The last example looks truly spectacular. Allen and colleagues use both color hues and transparency to illustrate areas of the brain that undergo significant changes in activity in a task-based functional MRI dataset (see their Figure 3B below). The conventional way of representing fMRI results would be to apply a threshold to the indices of brain activity, and only show those brain regions that were beyond the threshold. However, thresholds are arbitrary, and most of the brain’s activity gets “erased” from the plot. The authors’ approach allows them to show the data’s structure in a much more thorough fashion (in this case including areas of the brain that undergo de-activation during the task, likely corresponding to the “default mode network”), without cluttering the display or making it too complicated. They also provide an example dataset and MATLAB scripts to reproduce their figure.

Figure 3 from Allen et al., Neuron 2012. Made available under an Elsevier user license. Copyright Elsevier.
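The idea of combining hue and transparency can be sketched in a few lines of Python with NumPy. This is not the authors' MATLAB script. It rests on my assumption that hue encodes the signed effect size and opacity encodes the strength of the statistical evidence; the function name `dual_coded_rgba`, its parameters, and the simple blue-white-red color ramp are all hypothetical choices for illustration.

```python
import numpy as np


def dual_coded_rgba(effect, tstat, effect_max, t_max):
    """Build an RGBA image: hue from the signed effect, opacity from |t|.

    effect, tstat : arrays of the same shape (e.g. one brain slice)
    effect_max    : effect size mapped to fully saturated red or blue
    t_max         : |t| value mapped to full opacity
    """
    effect = np.asarray(effect, dtype=float)
    tstat = np.asarray(tstat, dtype=float)

    # Signed effect scaled to [-1, 1]: -1 is blue, 0 is white, +1 is red.
    x = np.clip(effect / effect_max, -1.0, 1.0)
    positive = x >= 0

    rgba = np.ones(effect.shape + (4,))
    rgba[..., 0] = np.where(positive, 1.0, 1.0 + x)   # red channel
    rgba[..., 1] = 1.0 - np.abs(x)                    # green channel
    rgba[..., 2] = np.where(positive, 1.0 - x, 1.0)   # blue channel

    # Weak evidence fades out gradually instead of being cut off by a threshold.
    rgba[..., 3] = np.clip(np.abs(tstat) / t_max, 0.0, 1.0)
    return rgba


# Tiny made-up example: strong activation, moderate de-activation, no effect.
effect = np.array([[1.0, -1.0, 0.0]])
tstat = np.array([[4.0, -2.0, 0.0]])
overlay = dual_coded_rgba(effect, tstat, effect_max=1.0, t_max=4.0)
```

The resulting array can be passed to matplotlib's `imshow` on top of a grayscale anatomical image, so that voxels with weak evidence stay faintly visible rather than disappearing.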

‘The jet colormap must die!’

I found one minor weakness in Allen et al.’s paper: their recommendation of color scales (or colormaps). Specifically, for bipolar data, which can range from negative through zero to positive values, they suggest using a rainbow (or “jet”) colormap: negative values are mapped to progressively lighter shades of blue, moving to green for data whose value is zero, then through yellows and oranges to reds for positive data. There are several problems with this particular colormap, detailed in multiple papers and blog posts (I took this section’s title from one such post). To summarize them briefly:

  • human vision does not perceive the color changes of the jet colormap as uniform, which creates artificial “borders” (called Mach bands) when continuous surfaces are plotted (see the illustration below)
  • the order of the colors is arbitrary (despite being that of the rainbow)
  • the luminance of successive colors does not increase or decrease monotonically
  • the presence of greens and reds makes it hard to interpret for people with the most common form of color vision deficiency

The bright blues and yellows of the jet colormap cause stripes to appear in the Mexican hat at the top left. This is much less apparent if the jet colormap is not used to paint over continuous surfaces, such as on the mesh at the top right. A cold-to-warm colormap does not create false stripes (bottom).

For plotting bipolar data, other colormaps can avoid the “illusory border” problem, for instance a colormap that goes from cold (blue) to warm (red) colors. In the example above, I’ve taken the cold-to-warm colormap from an excellent paper on colormaps by Kenneth Moreland, of the Sandia National Laboratories, USA.
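The luminance point in the list above can be checked numerically. The Python sketch below uses only the standard library and rests on several assumptions of mine: `jet_approx` is a common piecewise-linear approximation of jet rather than MATLAB's exact table, `blue_white_red` is a simplified stand-in for a cold-to-warm map and is not Moreland's colormap, and the weighted RGB sum is only a rough proxy for perceived lightness.

```python
def clip01(v):
    """Clamp a value to the range [0, 1]."""
    return max(0.0, min(1.0, v))


def jet_approx(x):
    """Piecewise-linear approximation of the jet colormap, x in [0, 1]."""
    return (
        clip01(1.5 - abs(4 * x - 3)),  # red
        clip01(1.5 - abs(4 * x - 2)),  # green
        clip01(1.5 - abs(4 * x - 1)),  # blue
    )


def blue_white_red(x):
    """Simplified diverging map: blue at 0, white at 0.5, red at 1."""
    if x < 0.5:
        s = 2 * x
        return (s, s, 1.0)
    s = 2 * (1 - x)
    return (1.0, s, s)


def luma(rgb):
    """Rough lightness proxy: weighted sum of RGB (Rec. 709 weights)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def is_monotonic(values, tol=1e-12):
    """True if the sequence never decreases or never increases."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return all(d >= -tol for d in diffs) or all(d <= tol for d in diffs)


def halves_monotonic(cmap, n=101):
    """Check luminance monotonicity separately below and above the midpoint."""
    xs = [i / (n - 1) for i in range(n)]
    lum = [luma(cmap(x)) for x in xs]
    mid = n // 2
    return is_monotonic(lum[: mid + 1]), is_monotonic(lum[mid:])


for name, cmap in [("jet (approx.)", jet_approx), ("blue-white-red", blue_white_red)]:
    lower_ok, upper_ok = halves_monotonic(cmap)
    print(f"{name}: lower half monotonic = {lower_ok}, upper half monotonic = {upper_ok}")
```

For bipolar data, a sensible requirement is that lightness changes in one direction only on each side of zero. Under this rough measure the diverging map meets it, whereas the jet approximation gets brighter and then darker again within its positive half.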

Apart from this minor objection, however, I warmly recommend reading Allen et al.’s thoughtful discussion: you will likely produce better data visualizations thanks to the authors!

(And if you still need to plot bar plots, I’ve got a great MATLAB function for you!)

References

Allen, E., Erhardt, E., & Calhoun, V. (2012). Data Visualization in the Neurosciences: Overcoming the Curse of Dimensionality. Neuron, 74 (4), 603-608. DOI: 10.1016/j.neuron.2012.05.001

Moreland, K. (2009). Diverging Color Maps for Scientific Visualization (Expanded). Proceedings of the 5th International Symposium on Visual Computing, December 2009.

2 Comments

  1. Markus says:

    Thanks Pierre!
    – Astonishing article… one would guess that this is more the introduction to basic statistics… but selling it to Neuron, respect!
    – So, roughly 30% of 2D graphics in NI and HBM don’t provide information about uncertainty…
    – I suppose that the use of bargraphs is sometimes highly welcome as it hides imperfection of the data… and helps to sell the message, though.
    – I once provided many boxplots for nonparametric data, but finally the reviewer wanted to see bar graphs with error bars… so the eye likes what is known.

  2. Thanks for the comment Markus. I agree that the article is, to some extent, stating the obvious, but to me, the authors’ finding that so many figures lack so much basic information in some of the most recognized neuroscience journals explains why their article is needed.

    Regarding bar plots, the authors of the article (and I) concur with you: they are familiar to readers, making them easy to understand, and you can pack a lot of information in a bar plot.

    I have recently acquired books by Edward Tufte, a statistician at Yale University who worked on data visualization a lot (http://www.edwardtufte.com/tufte/index), and I’m looking forward to going over them!
