Category Archives: Books

Visual Revelations, Howard Wainer

I’m starting to recognize several clusters of data visualization books. These include:

(Of course this list calls out for a flowchart or something to visualize it!)

Howard Wainer’s Visual Revelations falls in this last category. And it’s no surprise Wainer’s book emulates Tufte’s, given how often the author refers back to Tufte’s work (including comments like “As Edward Tufte told me once…”). And The Visual Display of Quantitative Information is still probably the best introduction to the genre. But Visual Revelations is different enough to be a worthwhile read too if you enjoy such books, as I do.

Most of all, I appreciated that Wainer presents many bad graph examples found “in the wild” and follows them with improvements of his own. Not all are successful, but even so I find this approach very helpful for learning to critique and improve my own graphics. (Tufte’s classic book critiques plenty, but spends less time on before-and-after redesigns. On the other hand, Kosslyn’s book is full of redesigns, but his “before” graphs are largely made up by him to illustrate a specific point, rather than real graphics created by someone else.)

Of course, Wainer covers the classics like John Snow’s cholera map and Minard’s plot of Napoleon’s march on Russia (well-trodden by now, but perhaps less so in 1997?). But I was pleased to find some fascinating new-to-me graphics. In particular, the Mann Gulch Fire section (p. 65-68) gave me shivers: it’s not a flashy graphic, but it tells a terrifying story and tells it well.
[Edit: I should point out that Snow's and Minard's plots are so well-known today largely thanks to Wainer's own efforts. I also meant to mention that Wainer is the man who helped bring into print an English translation of Jacques Bertin's seminal Semiology of Graphics and a replica volume of William Playfair's Commercial and Political Atlas and Statistical Breviary. He has done amazing work at unearthing and popularizing many lost gems of historical data visualization!
See also Alberto Cairo's review of a more recent Wainer book.]

Finally, Wainer’s tone overall is also much lighter and more humorous than Tufte’s. His first section gives detailed advice on how to make a bad graph, for example. I enjoyed Wainer’s jokes, though some might prefer more gravitas.

Continue reading

Statistical Inference, Michael Oakes; and “Likelihood inference”

You may be familiar with the long-running divide between Classical or Frequentist (a.k.a. Neyman-Pearson) and Bayesian statisticians. (If not, here’s a simplistic overview.) The schism is being smoothed over, and many statisticians I know are pragmatists who feel free to use either approach depending on the problem at hand.

However, when I read Gerard van Belle’s Statistical Rules of Thumb, I was surprised by his brief mention of three distinct schools of inference: Neyman-Pearson, Bayesian, and Likelihood. I hadn’t heard of the third, so I followed van Belle’s reference to Michael Oakes’ book Statistical Inference: A Commentary for the Social and Behavioural Sciences.

Why should you care which school of inference you use? It’s the framework that guides how you think about science: it shapes the methods you choose and, crucially, how you interpret your results. Many Frequentist methods have a Bayesian analogue that gives the same numerical result on any given dataset, but the conclusions you are entitled to draw from that result are quite different. Frequentism is the version traditionally taught in Stat 101, yet when you show someone the results of a data analysis, their intuitive reading is usually closer to the Bayesian interpretation than to the Frequentist one. So I was curious how “Likelihood inference” compares to these other two.
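
As a concrete illustration of that numerical equivalence (my own sketch in R, not an example from Oakes or van Belle; the data and parameter values are made up), take the textbook case of a normal mean with known standard deviation, analyzed once with a Frequentist confidence interval and once with a flat-prior Bayesian credible interval:

    # Illustrative data: normal with known SD (sigma), unknown mean
    set.seed(1)
    sigma <- 2
    x <- rnorm(30, mean = 5, sd = sigma)
    se <- sigma / sqrt(length(x))

    # Frequentist 95% confidence interval: 95% of intervals constructed
    # this way, over repeated samples, would cover the fixed unknown mean.
    ci <- mean(x) + c(-1, 1) * qnorm(0.975) * se

    # Bayesian 95% credible interval under a flat prior: the posterior for
    # the mean is Normal(mean(x), se^2), so this interval has 95% posterior
    # probability of containing the mean, given these data and that prior.
    credible <- qnorm(c(0.025, 0.975), mean = mean(x), sd = se)

    rbind(ci, credible)  # numerically identical endpoints

The endpoints agree to the last decimal, but the Frequentist statement is about the procedure’s behavior over repeated samples, while the Bayesian statement is a probability about the parameter itself, conditional on this one dataset and the chosen prior.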

Below I summarize what I learned from Oakes about Likelihood inference. I close with some good points from the rest of Oakes’ book, which is largely about the misuse of null hypothesis significance testing (NHST) and a suggestion to publish effect size estimates instead.

Continue reading

A Theory of Data, Clyde Coombs

Earlier I quoted Leland Wilkinson in The Grammar of Graphics, where he recommends Clyde Coombs’ book A Theory of Data:

…in a landmark book, now out of print and seldom read by statisticians, Coombs (1964) … believed that the prevalent practice of modeling based on cases-by-variables data layouts often prevents researchers from considering more parsimonious structural theories and keeps them from noticing meaningful patterns in their data.

I checked out Coombs’ book through interlibrary loan but didn’t have time to read it thoroughly before the due date. Even so, from skimming it on the train over a few days, I can see why Wilkinson recommends it.

Continue reading

Most-cited books on list of lists of data visualization readings

As part of the resources for his online data visualization course, Alberto Cairo has posted several lists of recommended readings:

Some of these links lead to other excellent recommended-readings lists:

I figured I should focus on reading the book suggestions that came up more than once across these lists. Below is the ranking, by author rather than by book, since some authors were suggested with multiple books. So many good books!

The list, by number of citations per author: Continue reading

Graph Design for the Eye and Mind, Stephen Kosslyn

When I reviewed The Grammar of Graphics, Harlan Harris pointed me to Kosslyn’s book Graph Design for the Eye and Mind. I’ve since read it and can recommend it highly, although the two books have quite different goals. Unlike Wilkinson’s book, which provides a framework encompassing all the graphics that are possible, Kosslyn’s book summarizes perceptual research on what makes graphics actually readable.

In other words, this is something of the graphics equivalent to Strunk and White’s The Elements of Style, except that Kosslyn’s advice is grounded in actual psychology research rather than in personal preference. This is a good book to keep at your desk for quickly checking whether your most recent graphic follows his advice.

Kosslyn is targeting the communicator-of-results, not the pure statistician (churning out graphs for experts’ data exploration) or the data artist (playing with data-inspired, more-pretty-than-meaningful visual effects). In contrast to Tukey’s remark that a good statistical graphic “forces us to notice what we never expected to see,” Kosslyn’s focus is clear communication of what the analyst has already noticed.

For present purposes I would say that a good graph forces the reader to see the information the designer wanted to convey. This is the difference between graphics for data analysis and graphics for communication.

Kosslyn also respects aesthetics but does not focus on them:

Making a display attractive is the task of the designer [...] But these properties should not obscure the message of the graph, and that’s where this book comes in.

So Kosslyn presents his 8 “psychological principles of effective graphics” (for details, see Chopeta Lyons’ review or pages 4-12 of Kosslyn’s Clear and to the Point). Then he illustrates the principles with clear examples and backs them up with research citations, for each of several common graph types as well as for labels, axes, etc. in general. I particularly like all the paired “Don’t” and “Do” examples, showing both what to avoid and how to fix it. Most of the book is fairly easy reading and solid advice. Although much of it is common sense, it’s useful as a quick checkup of the graphs you’re creating, especially as it’s so well laid-out.

Bonus: Unlike many other recent data visualization books, Kosslyn does not completely disavow pie charts. Rather, he gives solid advice on the situations where they are appropriate, and on how to use them well in those cases.

If you want to dig even deeper, Colin Ware’s Information Visualization is a very detailed but readable reference on the psychological and neural research that underpins Kosslyn’s advice.

The rest of this post is a list of notes-to-self about details I want to remember or references to keep handy… Bolded notes are things I plan to read about further. Continue reading

The Grammar of Graphics: notes on first reading

Leland Wilkinson’s The Grammar of Graphics is a classic in the data visualization literature. Wilkinson created a framework that coherently ties together many aspects of designing, implementing, reading, and understanding a graphic. It’s a useful approach and has been fairly influential: The popular R package ggplot2 is, more or less, an implementation of Wilkinson’s ideas, and I also see their influence in the software Tableau (about which more another time). Wilkinson himself helped to build these ideas into SPSS’s Graphics Production Language (GPL) and then SPSS Visualization Designer.

So what’s so special here? One of the core ideas is to start with the raw data and think about all the transformations, summaries, etc. that go into graphing it. With a good framework, this can help us see connections between different graphs and create new ones. (The opposite extreme is a “typology” or list of graph types, like you get in Excel: do you want a bar chart, a pie chart, a line chart, or one of these other 10 types? Such a list has no deep structure.) Following Wilkinson’s approach, you’ll realize that a pie chart is basically just a stacked bar chart plotted in polar coordinates, with bar height mapped to pie-slice angle… and that can get you thinking: What if I mapped bar height to radius, not angle? What if I added a variable and moved to spherical coordinates? What if I put a scatterplot in polar coordinates too? These may turn out to be bad ideas, but at least you’re thinking — in a way that is not encouraged by Excel’s list of 10 graph types.

That Excel-style typology, needless to say, is NOT the approach that Wilkinson takes.
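
You can see this kind of thinking directly in ggplot2, which, as noted above, implements much of Wilkinson’s framework. The sketch below is my own minimal illustration with made-up data, not an example from the book: one specification of stacked bars, rendered first in Cartesian coordinates and then in polar coordinates, with either the stacked heights or the categories mapped to angle.

    library(ggplot2)

    df <- data.frame(category = c("A", "B", "C"), value = c(3, 5, 2))

    # One stacked bar: all rows stacked into a single column.
    p <- ggplot(df, aes(x = "", y = value, fill = category)) +
      geom_col(width = 1)

    p                              # stacked bar chart
    p + coord_polar(theta = "y")   # stacked heights mapped to angle: a pie chart
    p + coord_polar(theta = "x")   # heights mapped to radius instead: a bullseye

The first plot is the plain stacked bar; swapping in a polar coordinate system, with no other change to the specification, gives the pie chart and then the bullseye variant. Those last two lines are exactly the sort of what-if experiments that a fixed typology of chart types never invites.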

But, of course, thinking is hard, and so is this book. Reading The Grammar of Graphics requires much more of a dedicated slog than, say, Edward Tufte’s books, which you can just flip through randomly for inspiration and bite-sized nuggets of wisdom. (I admire Tufte too, but I have to admit that Wilkinson’s occasional jabs at Tufte were spot-on and amused me to no end.) It’s a book full of wit and great ideas, but also full of drawn-out sections that require serious focus, and it takes a while to digest it all and put it together in your mind.

So, although I’d highly recommend this book to anyone deeply interested in visualization, I’m still digesting it. What follows is not a review but just notes-to-self from my first read-through: things to follow up on and for my own reference. It might not be particularly thrilling for other readers. Continue reading

Pithy and pragmatic textbooks

I enjoy the rare statistics textbook that can take its subject with a grain of salt:

The practitioner has heard that the [random field] should be ergodic, since “this is what makes statistical inference possible,” but is not sure how to check this fact and proceeds anyway, feeling vaguely guilty of having perhaps overlooked something very important.
Geostatistics: Modeling Spatial Uncertainty, by Chilès and Delfiner.

It’s a familiar feeling!
As Chilès and Delfiner wryly suggest, we statisticians could often do a better job of writing for beginners or practitioners. We should not just state the assumptions needed by our tools, but also explain how sensitive results are to the assumptions, how to check these assumptions in practice, and what else to try if they’re not met.