Monthly Archives: November 2012

Most-cited books on list of lists of data visualization readings

As part of the resources for his online data visualization course, Alberto Cairo has posted several lists of recommended readings:

Some of these links lead to other excellent recommended-readings lists:

I figured I should focus on reading the book suggestions that came up more than once across these lists. Below is the ranking; it’s by author rather than book, since some authors were suggested with multiple books. So many good books!

The list, by number of citations per author: Continue reading

Graph Design for the Eye and Mind, Stephen Kosslyn

When I reviewed The Grammar of Graphics, Harlan Harris pointed me to Kosslyn’s book Graph Design for the Eye and Mind. I’ve since read it and can recommend it highly, although the two books have quite different goals. Unlike Wilkinson’s book, which provides a framework encompassing all the graphics that are possible, Kosslyn’s book summarizes perceptual research on what makes graphics actually readable.

In other words, this is something of the graphics equivalent to Strunk and White’s The Elements of Style, except that Kosslyn’s advice is grounded in actual psychology research rather than personal preferences. This is a good book to keep at your desk for quickly checking whether your most recent graphic follows his advice.

Kosslyn is targeting the communicator-of-results, not the pure statistician (churning out graphs for experts’ data exploration) or the data artist (playing with data-inspired, more-pretty-than-meaningful visual effects). In contrast to Tukey’s remark that a good statistical graphic “forces us to notice what we never expected to see,” Kosslyn’s focus is clear communication of what the analyst has already noticed.

For present purposes I would say that a good graph forces the reader to see the information the designer wanted to convey. This is the difference between graphics for data analysis and graphics for communication.

Kosslyn also respects aesthetics but does not focus on them:

Making a display attractive is the task of the designer [...] But these properties should not obscure the message of the graph, and that’s where this book comes in.

So Kosslyn presents his 8 “psychological principles of effective graphics” (for details, see Chopeta Lyons’ review or pages 4-12 of Kosslyn’s Clear and to the Point). Then he illustrates the principles with clear examples and backs them up with research citations, for each of several common graph types as well as for labels, axes, etc. in general. I particularly like all the paired “Don’t” and “Do” examples, showing both what to avoid and how to fix it. Most of the book is fairly easy reading and solid advice. Although much of it is common sense, it’s useful as a quick checkup of the graphs you’re creating, especially as it’s so well laid-out.

Bonus: Unlike the authors of many other recent data visualization books, Kosslyn does not completely disavow pie charts. Rather, he gives solid advice on the situations where they are appropriate, and on how to use them well in those cases.

If you want to dig even deeper, Colin Ware’s Information Visualization is a very detailed but readable reference on the psychological and neural research that underpins Kosslyn’s advice.

The rest of this post is a list of notes-to-self about details I want to remember or references to keep handy… Bolded notes are things I plan to read about further. Continue reading

Statistics is Applied Science Fiction

I’m enjoying the discussion coming out of Alberto Cairo’s online data visualization course.

Bryn Williams, in a comment on thinkers & creators who read comics & sci-fi for inspiration:

“…a familiarity with imagined alternative worlds makes philosophy an easier path to tread when posing counterfactuals and thought experiments…”

My response:

And not just philosophy or data visualization — I think statistics could be presented as a kind of “applied science fiction.” When you perform a hypothesis test of whether some parameter is 0, you

  1. assume it *is* 0,
  2. imagine what kinds of data you would probably have seen under that assumption, and then
  3. if the real data you *did* see is unlikely under that assumption, decide that the assumption is probably wrong.

It’s just like in SF where

  1. you imagine a possible alternate reality (say, Joe discovers a talent for dowsing),
  2. you explore the consequences if that possibility were true (Joe becomes rich from oil prospecting), and
  3. in the best cases, readers can draw lessons about our actual reality from this thought experiment (http://xkcd.com/808/).

(XKCD is, of course, a great comic for both SF and datavis. See also this recent SMBC for another amusing exploration of “If this claim were true…”)
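For the curious, here is a minimal Python sketch of the three hypothesis-testing steps above. It is only one way to "imagine" the data: a sign-flip randomization test, which assumes the observations are symmetric around 0 when the parameter (here, the mean) really is 0. The function name `sign_flip_p_value` and its arguments are made up for illustration.

```python
import random


def sign_flip_p_value(data, n_sim=10_000, seed=0):
    """Approximate two-sided p-value for the hypothesis 'the mean is 0'.

    Assumption (not a general-purpose test): under the null, each
    observation is equally likely to be positive or negative.
    """
    rng = random.Random(seed)
    observed = abs(sum(data) / len(data))

    extreme = 0
    for _ in range(n_sim):
        # Steps 1 and 2: assume the mean *is* 0 and imagine a dataset we
        # might have seen in that alternate world (random sign flips).
        imagined = [x if rng.random() < 0.5 else -x for x in data]
        if abs(sum(imagined) / len(imagined)) >= observed:
            extreme += 1

    # Step 3: a small value means the real data would be unusual
    # if the assumption were true.
    return (extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    shifted = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4]
    balanced = [1.0, -1.0, 2.0, -2.0, 0.5, -0.5]
    print("shifted data p-value:", sign_flip_p_value(shifted))
    print("balanced data p-value:", sign_flip_p_value(balanced))
```

The all-positive dataset should get a small p-value (the imagined worlds rarely look like it), while the balanced dataset gives no reason to doubt the assumption.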

Loess and Clark

Apologies for the awful pun in the title, but it seemed to befit an exploration of the history of loess local regression, particularly its name and codebase.

If you’re not familiar with loess, it’s basically a nonparametric algorithm that smooths the data to find the local mean of y at each x value. If you want to end up with a more traditional regression, loess can still be a useful starting point for visually finding trends in the data. Earl Glynn shows a worked example with R code that illustrates the loess fit for different values of the bandwidth.
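To make the idea of a "local mean" concrete, here is a rough Python sketch of one loess-style estimate at a single x value. It assumes a common textbook setup (tricube weights and a weighted straight-line fit to the nearest fraction of points) and skips the robustness iterations of the full algorithm, so treat it as an illustration rather than a reimplementation of R’s loess. The names `tricube`, `loess_point`, and `span` are my own.

```python
def tricube(u):
    """Tricube weight: 1 at distance 0, falling smoothly to 0 at distance 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_point(xs, ys, x0, span=0.5):
    """Estimate the local mean of y at x0 with a weighted straight-line fit.

    `span` is the fraction of the data treated as "local" (the bandwidth).
    """
    n = len(xs)
    k = max(2, min(n, int(round(span * n))))  # number of neighbors to use
    dists = sorted(abs(x - x0) for x in xs)
    # Distance to the k-th nearest point, inflated slightly so that
    # all k neighbors receive a strictly positive weight.
    h = dists[k - 1] * 1.0001 + 1e-12

    ws = [tricube((x - x0) / h) for x in xs]
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))

    if sxx < 1e-12:
        # All the weight sits on one x value: fall back to a weighted mean.
        return ybar
    slope = sxy / sxx
    return ybar + slope * (x0 - xbar)


if __name__ == "__main__":
    xs = [float(i) for i in range(11)]
    ys = [x ** 2 for x in xs]
    for span in (0.3, 0.6, 1.0):
        print(span, loess_point(xs, ys, 5.0, span=span))
```

Calling `loess_point` over a grid of x0 values traces out the smooth curve; smaller `span` values follow the data more closely, larger ones smooth more heavily.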

Today was the first session of a Machine Learning study group with my colleagues. (We’re following along with Andrew Ng’s course notes for Stanford’s CS 229, also available on Coursera.) In the first chapter, Ng mentions loess regression, and two colleagues had interesting historical comments about it. Continue reading