Category Archives: Visualization

When static graphs beat interactives

William Cleveland gave a great interview in a recent Policyviz podcast. (Cleveland is a statistician and a major figure in data visualization research; I’ve reviewed his classic book The Elements of Graphic Data before.) He discussed the history of the term “data science,” his visual perception research, statistical computing advances, etc.

But Cleveland also described his work on brushing and on trellis graphics.

• Brushing is an interactive technique for highlighting data points across linked plots. Plot Y vs X1 and Y vs X2; select some points on the first plot; and they are automatically highlighted on the second plot. You can condition on-the-fly on X1 to better understand the multivariate structure between X1, X2, and Y.
• Trellis displays are essentially Cleveland’s version of small multiples, or of faceting in the Grammar of Graphics sense. Again, you condition on one variable and see how it affects the plots of other variables. See for example slides 10 and 15 here.

I found it fascinating that the static trellis technique evolved from interactive brushing, not vice versa!

Cleveland and colleagues noticed that although brushing let you find interesting patterns, it was too difficult to remember and compare them. You only saw one “view” of the linked plots at a time. Trellises would instead allow you to see many slices at once, making simultaneous comparisons easier.

For example, here’s a brushing view of data on housing: rent, size, year it was built, and whether or not it’s in a “good neighborhood” (figures from Interactive Graphics for Data Analysis: Principles and Examples). The user has selected a subset of years and chosen “good” neighborhoods, and now these points are highlighted in the scatterplot of size vs rent.

That’s great for finding patterns in one subset at a time, but not ideal for comparing the patterns in different subsets. If you select a different subset of years, you’ll have to memorize the old subset’s scatterplot in order to decide whether it differs much from the new subset’s scatterplot; or switch back and forth between views.

Now look at the trellis display: the rows show whether or not the neighborhood is “good,” the columns show subsets of year, and each scatterplot shows size vs rent within that data subset. All these subsets’ scatterplots are visible at once.

If there were different size-vs-rent patterns across year and neighborhood subsets, we’d be able to spot such an effect easily. I admit I don’t see any such effect—but that’s an interesting finding in its own right, and easier to confirm here than with brushing’s one-view-at-a-time.

So the shinier, fancier, interactive graphic is not uniformly better than a careful redesign of the old static one. Good to remember.

Tapestry 2016 conference: short stories and wrap-up

My last post introduced the recent Tapestry conference and described the three keynote talks.

Below are my notes on the six “Short Stories” presentations and a few miscellaneous points.

Overview

Encouraged by Robert Kosara’s call for applications, I attended the Tapestry 2016 conference two weeks ago. As advertised, it was a great chance to meet others from all over the data visualization world. I was one of relatively few academics there, so it was refreshing to chat with journalists, industry analysts, consultants, and so on. (Journalists were especially plentiful since Tapestry is the day before NICAR, the Computer-Assisted Reporting Conference.) Thanks to the presentations, posters & demos, and informal chats throughout the day, I came away with new ideas for improving my dataviz course and my own visualization projects.

I also presented a poster and handout on the course design for my Fall 2015 dataviz class. It was good to get feedback from other people who’ve taught similar courses, especially on the rubrics and assessment side of things.

The conference is organized and sponsored by the folks at Tableau Software. Although I’m an entrenched R user myself, I do appreciate Tableau’s usefulness in bringing the analytic approach of the grammar of graphics to people who aren’t dedicated programmers. To help my students and collaborators, I’ve been meaning to learn to use Tableau better myself. Folks there told me I should join the Pittsburgh Tableau User Group and read Dan Murray’s Tableau Your Data!.

Below are my notes on the three keynote speakers: Scott Klein on the history of data journalism, Jessica Hullman on research into story patterns, and Nick Sousanis on comics and visual thinking vs. traditional text-based scholarship.
My next post will continue with notes on the “short stories” presentations and some miscellaneous thoughts.

Are you really moving to Canada?

It’s another presidential election year in the USA, and you know what that means: Everyone’s claiming they’ll move to Canada if the wrong candidate wins. But does anyone really follow through?

Anecdotal evidence: Last week, a Canadian told me she knows at least a dozen of her friends back home are former US citizens who moved, allegedly, in the wake of disappointing election results. So perhaps there’s something to this claim/threat/promise?

Statistical evidence: Take a look for yourself.

As a first pass, I don’t see evidence of consistent, large spikes in migration right after elections. The dotted vertical lines denote the years after an election year, i.e. the years where I’d expect spikes if this really happened a lot. For example: there was a US presidential election at the end of 1980, and the victor took office in 1981. So if tons of disappointed Americans moved to Canada afterwards, we’d expect a dramatically higher migration count during 1981 than 1980 or 1982. The 1981 count is a bit higher than its neighbors, but the 1985 is not, and so on. Election-year effects alone don’t seem to drive migration more than other factors.

What about political leanings? Maybe Democrats are likely to move to Canada after a Republican wins, but not vice versa? (In the plot, blue and red shading indicate Democratic and Republican administrations, respectively.) Migration fell during the Republican administrations of the ’80s, but rose during the ’00s. So, again, the victor’s political party doesn’t explain the whole story either.

I’m not an economist, political scientist, or demographer, so I won’t try to interpret this chart any further. All I can say is that the annual counts vary by a factor of 2 (5,000 in the mid-’90s, compared to 10,000 around 1980 or 2010)… So the factors behind this long-term effect seems to be much more important than any possible short-term election-year effects.

Extensions: Someone better informed than myself could compare this trend to politically-motivated migration between other countries. For example, my Canadian informant told me about the Quebec independence referendum, which lost 49.5% to 50.5%, and how many disappointed Québécois apparently moved to France afterwards.

Data notes: I plotted data on permanent immigrants (temporary migration might be another story?) from the UN’s Population Division, “International Migration Flows to and from Selected Countries: The 2015 Revision.” Of course it’s a nontrivial question to define who counts as an immigrant. The documentation for Canada says:

International migration data are derived from administrative sources recording foreigners who were granted permission to reside permanently in Canada. … The number of immigrants is subject to administrative corrections made by Citizenship and Immigration Canada.

Tapestry 2016 materials: LOs and Rubrics for teaching Statistical Graphics and Visualization

Here are the poster and handout I’ll be presenting tomorrow at the 2016 Tapestry Conference.

My poster covers the Learning Objectives that I used to design my dataviz course last fall, along with the grading approach and rubric categories that I used for assessment. The Learning Objectives were a bit unusual for a Statistics department course, emphasizing some topics we teach too rarely (like graphic design). The “specs grading” approach1 seemed to be a success, both for student motivation and for the quality of their final projects.

The handout is a two-sided single page summary of my detailed rubrics for each assignment. By keeping the rubrics broad (and software-agnostic), it should be straightforward to (1) reuse the same basic assignments in future years with different prompts and (2) port these rubrics to dataviz courses in other departments.

I had no luck finding rubrics for these learning objectives when I was designing the course, so I had to write them myself.2 I’m sharing them here in the hopes that other instructors will be able to reuse them—and improve on them!

Any feedback is highly appreciated.

Footnotes:

PolicyViz episode on teaching data visualization

When I was still in DC, I knew Jon Schwabish’s work designing information and data graphics for the Congressional Budget Office. Now I’ve run across his podcast and blog, PolicyViz. There’s a lot of good material there.

I particularly liked a recent podcast episode that was a panel discussion about teaching dataviz. Schwabish and four other experienced instructors talked about course design, assignments and assessment, how to teach implementation tools, etc.

I recommend listening to the whole thing. Below are just notes-to-self on the episode, for my own future reference.

The Elements of Graphing Data, William S. Cleveland

Bill Cleveland is one of the founding figures in statistical graphics and data visualization. His two books, The Elements of Graphing Data and Visualizing Data, are classics in the field, still well-worth reading today.

Visualizing is about the use of graphics as a data analysis tool: how to check model fit by plotting residuals and so on. Elements, on the other hand, is about the graphics themselves and how we read them. Cleveland (co)-authored some of the seminal papers on human visual perception, including the often-cited Cleveland & McGill (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Plenty of authors doled out common-sense advice about graphics before then, and some even ran controlled experiments (say, comparing bars to pies). But Cleveland and colleagues were so influential because they set up a broader framework that is still experimentally-testable, but that encompasses the older experiments (say, encoding data by position vs length vs angle vs other things—so that bars and pies are special cases). This is just one approach to evaluating graphics, and it has limitations, but it’s better than many competing criteria, and much better than “because I said so” *coughtuftecough* 🙂

In Elements, Cleveland summarizes his experimental research articles and expands on them, adding many helpful examples and summarizing the underlying principles. What cognitive tasks do graph readers perform? How do they relate to what we know about the strengths and weaknesses of the human visual system, from eye to brain? How do we apply this research-based knowledge, so that we encode data in the most effective way? How can we use guides (labels, axes, scales, etc.) to support graph comprehension instead of getting in the way? It’s a lovely mix of theory, experimental evidence, and practical advice including concrete examples.

Now, I’ll admit that (at least in the 1st edition of Elements) the graphics certainly aren’t beautiful: blocky all-caps fonts, black-and-white (not even grayscale), etc. Some data examples seem dated now (Cold War / nuclear winter predictions). The principles aren’t all coherent. Each new graph variant is given a name, leading to a “plot zoo” that the Grammar of Graphics folks would hate. Many examples, written for an audience of practicing scientists, may be too technical for lay readers (for whom I strongly recommend Naomi Robbins’ Creating More Effective Graphs, a friendlier re-packaging of Cleveland).

Nonetheless, I still found Elements a worthwhile read, and it made a big impact on the data visualization course I taught. Although the book is 30 years old, I still found many new-to-me insights, along with historical context for many aspects of R’s base graphics.

[Edit: I’ll post my notes on Visualizing Data separately.]

Below are my notes-to-self, with things-to-follow-up in bold:

Teaching data visualization: approaches and syllabi

While I’m still working on my reflection of the dataviz course I just taught, there were some useful dataviz-teaching talks at the recent IEEE VIS conference.

Jen Christiansen and Robert Kosara have great summaries of the panel on “Vis, The Next Generation: Teaching Across the Researcher-Practitioner Gap.”

Even better, slides are available for some of the talks: Marti Hearst, Tamara Munzner, and Eytan Adar. Lots of inspiration for the next time I teach.

Finally, here are links to the syllabi or websites of various past dataviz courses. Browsing these helps me think about what to cover and how to teach it.

Not quite data visualization, but related:

Comment below or tweet @civilstat with any others I’ve missed, and I’ll add them to the list.
(Update: Thanks to John Stasko for links to many I missed, including his own excellent course site & resource page.)

Statistical Graphics and Visualization course materials

I’ve just finished teaching the Fall 2015 session of 36-721, Statistical Graphics and Visualization. Again, it is a half-semester course designed primarily for students in the MSP program (Masters of Statistical Practice) in the CMU statistics department. I’m pleased that we also had a large number of students from other departments taking this as an elective.

For software we used mostly R (base graphics, ggplot2, and Shiny). But we also spent some time on Tableau, Inkscape, D3, and GGobi.

We covered a LOT of ground. At each point I tried to hammer home the importance of legible, comprehensible graphics that respect human visual perception.

Remaking pie charts is a rite of passage for statistical graphics students

My course materials are below. Not all the slides are designed to stand alone, but I have no time to remake them right now. I’ll post some reflections separately.

Download all materials as a ZIP file (38 MB), or browse individual files:

Summary sheet of ways to map statistical uncertainty

A few years ago, a team at the Cornell Program on Applied Demographics (PAD) created a really nice demo of several ways to show statistical uncertainty on thematic maps / choropleths. They have kindly allowed me to host their large file here: PAD_MappingExample.pdf (63 MB)

Screenshot of index page from PAD mapping examples

Each of these maps shows a dataset with statistical estimates and their precision/uncertainty for various areas in New York state. If we use color or shading to show the estimates, like in a traditional choropleth map, how can we also show the uncertainty at the same time? The PAD examples include several variations of static maps, interaction by toggling overlays, and interaction with mouseover and sliders. Interactive map screenshots are linked to live demos on the PAD website.

I’m still fascinated by this problem. Each of these approaches has its strengths and weaknesses: Symbology Overlay uses separable dimensions, but there’s no natural order to the symbols. Pixelated Classification seems intuitively clear, but may be misleading if people (incorrectly) try to find meaning in the locations of pixels within an area. Side-by-side maps are each clear on their own, but it’s hard to see both variables at once. Dynamic Feedback gives detailed info about precision, but only for one area at a time, not all at once. And so forth. It’s an interesting challenge, and I find it really helpful to see so many potential solutions collected in one document.

The creators include Nij Tontisirin and Sutee Anantsuksomsri (both since moved on from Cornell), and Jan Vink and Joe Francis (both still there). The pixellated classification map is based on work by Nicholas Nagle.

For more about mapping uncertainty, see their paper:

Francis, J., Tontisirin, N., Anantsuksomsri, S., Vink, J., & Zhong, V. (2015). Alternative strategies for mapping ACS estimates and error of estimation. In Hoque, N. and Potter, L. B. (Eds.), Emerging Techniques in Applied Demography (pp. 247–273). Dordrecht: Springer Netherlands, DOI: 10.1007/978-94-017-8990-5_16 [preprint]

and my related posts:

See also Visualizing Attribute Uncertainty in the ACS: An Empirical Study of Decision-Making with Urban Planners. This talk by Amy Griffin is about studying how urban planners actually use statistical uncertainty on maps in their work.