Category Archives: Visualization

Tapestry 2016 conference: short stories and wrap-up

My last post introduced the recent Tapestry conference and described the three keynote talks.

Below are my notes on the six “Short Stories” presentations and a few miscellaneous points.


Tapestry 2016 conference: overview and keynote speakers

Overview

Encouraged by Robert Kosara’s call for applications, I attended the Tapestry 2016 conference two weeks ago. As advertised, it was a great chance to meet others from all over the data visualization world. I was one of relatively few academics there, so it was refreshing to chat with journalists, industry analysts, consultants, and so on. (Journalists were especially plentiful since Tapestry is the day before NICAR, the Computer-Assisted Reporting Conference.) Thanks to the presentations, posters & demos, and informal chats throughout the day, I came away with new ideas for improving my dataviz course and my own visualization projects.

I also presented a poster and handout on the course design for my Fall 2015 dataviz class. It was good to get feedback from other people who’ve taught similar courses, especially on the rubrics and assessment side of things.

The conference is organized and sponsored by the folks at Tableau Software. Although I’m an entrenched R user myself, I do appreciate how Tableau brings the analytic approach of the grammar of graphics to people who aren’t dedicated programmers. For the sake of my students and collaborators, I’ve been meaning to learn Tableau better myself. Folks there told me I should join the Pittsburgh Tableau User Group and read Dan Murray’s Tableau Your Data!.

Below are my notes on the three keynote speakers: Scott Klein on the history of data journalism, Jessica Hullman on research into story patterns, and Nick Sousanis on comics and visual thinking vs. traditional text-based scholarship.
My next post will continue with notes on the “short stories” presentations and some miscellaneous thoughts.


Are you really moving to Canada?

It’s another presidential election year in the USA, and you know what that means: Everyone’s claiming they’ll move to Canada if the wrong candidate wins. But does anyone really follow through?

Anecdotal evidence: Last week, a Canadian told me that at least a dozen of her friends back home are former US citizens who moved, allegedly in the wake of disappointing election results. So perhaps there’s something to this claim/threat/promise?

Statistical evidence: Take a look for yourself.

[Chart: annual counts of US-to-Canada migration, with dotted vertical lines marking post-election years and blue/red shading for Democratic and Republican administrations]

As a first pass, I don’t see evidence of consistent, large spikes in migration right after elections. The dotted vertical lines denote the years after an election year, i.e. the years where I’d expect spikes if this really happened a lot. For example: there was a US presidential election at the end of 1980, and the victor took office in 1981. So if tons of disappointed Americans moved to Canada afterwards, we’d expect a dramatically higher migration count during 1981 than in 1980 or 1982. The 1981 count is a bit higher than its neighbors, but the 1985 count is not, and so on. Election-year effects alone don’t seem to drive migration more than other factors.

What about political leanings? Maybe Democrats are likely to move to Canada after a Republican wins, but not vice versa? (In the plot, blue and red shading indicate Democratic and Republican administrations, respectively.) Migration fell during the Republican administrations of the ’80s, but rose during the equally Republican ’00s. So, again, the victor’s political party doesn’t explain the whole story either.
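
(If you’d like to experiment with a chart like this yourself: the following is not my original plotting code, just a minimal ggplot2 sketch, with made-up `migration` and `admins` data frames standing in for the UN counts and the list of administrations.)

```r
library(ggplot2)

# Made-up stand-in for the UN migration counts: one row per year
set.seed(42)
migration <- data.frame(
  year  = 1980:2013,
  count = round(runif(34, 5000, 10000))  # placeholder values, not the real data
)

# Made-up table of administrations, for the red/blue shading
admins <- data.frame(
  start = c(1977, 1981, 1993, 2001, 2009),
  end   = c(1981, 1993, 2001, 2009, 2014),
  party = c("Dem", "Rep", "Dem", "Rep", "Dem")
)

ggplot() +
  # Shade the background by the sitting president's party
  geom_rect(data = admins,
            aes(xmin = start, xmax = end, ymin = -Inf, ymax = Inf,
                fill = party),
            alpha = 0.2) +
  scale_fill_manual(values = c(Dem = "blue", Rep = "red")) +
  # Dotted vertical lines at the years *after* each election year,
  # where we'd expect spikes if people really followed through
  geom_vline(xintercept = seq(1981, 2013, by = 4), linetype = "dotted") +
  geom_line(data = migration, aes(x = year, y = count)) +
  labs(x = "Year", y = "Immigrants from the US to Canada")
```

Shading the background, rather than coloring the line itself, keeps the party information from competing visually with the migration trend.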

I’m not an economist, political scientist, or demographer, so I won’t try to interpret this chart any further. All I can say is that the annual counts vary by a factor of 2 (about 5,000 in the mid-’90s, compared to 10,000 around 1980 or 2010)… So the factors behind this long-term variation seem to be much more important than any possible short-term election-year effects.

Extensions: Someone better informed than I am could compare this trend to politically motivated migration between other countries. For example, my Canadian informant told me about the 1995 Quebec independence referendum, which lost 49.5% to 50.5%, and how many disappointed Québécois apparently moved to France afterwards.

Data notes: I plotted data on permanent immigrants (temporary migration might be another story?) from the UN’s Population Division, “International Migration Flows to and from Selected Countries: The 2015 Revision.” Of course it’s a nontrivial question to define who counts as an immigrant. The documentation for Canada says:

International migration data are derived from administrative sources recording foreigners who were granted permission to reside permanently in Canada. … The number of immigrants is subject to administrative corrections made by Citizenship and Immigration Canada.

Tapestry 2016 materials: LOs and Rubrics for teaching Statistical Graphics and Visualization

Here are the poster and handout I’ll be presenting tomorrow at the 2016 Tapestry Conference.

Poster "Statistical Graphics and Visualization: Course Learning Objectives and Rubrics"

My poster covers the Learning Objectives that I used to design my dataviz course last fall, along with the grading approach and rubric categories that I used for assessment. The Learning Objectives were a bit unusual for a Statistics department course, emphasizing some topics we teach too rarely (like graphic design). The “specs grading” approach [1] seemed to be a success, both for student motivation and for the quality of their final projects.

The handout is a two-sided, single-page summary of my detailed rubrics for each assignment. Because the rubrics are broad (and software-agnostic), it should be straightforward to (1) reuse the same basic assignments in future years with different prompts and (2) port these rubrics to dataviz courses in other departments.

I had no luck finding rubrics for these learning objectives when I was designing the course, so I had to write them myself. [2] I’m sharing them here in the hopes that other instructors will be able to reuse them—and improve on them!

Any feedback is highly appreciated.


Footnotes:

PolicyViz episode on teaching data visualization

When I was still in DC, I knew Jon Schwabish’s work designing information and data graphics for the Congressional Budget Office. Now I’ve run across his podcast and blog, PolicyViz. There’s a lot of good material there.

I particularly liked a recent podcast episode featuring a panel discussion about teaching dataviz. Schwabish and four other experienced instructors talked about course design, assignments and assessment, how to teach implementation tools, etc.

I recommend listening to the whole thing. Below are just notes-to-self on the episode, for my own future reference.


The Elements of Graphing Data, William S. Cleveland

Bill Cleveland is one of the founding figures in statistical graphics and data visualization. His two books, The Elements of Graphing Data and Visualizing Data, are classics in the field, still well worth reading today.

Visualizing is about the use of graphics as a data analysis tool: how to check model fit by plotting residuals, and so on. Elements, on the other hand, is about the graphics themselves and how we read them. Cleveland coauthored some of the seminal papers on human visual perception, including the often-cited Cleveland & McGill (1984), “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Plenty of authors doled out common-sense advice about graphics before then, and some even ran controlled experiments (say, comparing bars to pies). But Cleveland and colleagues were so influential because they set up a broader framework that is still experimentally testable yet encompasses the older experiments (say, encoding data by position vs. length vs. angle vs. other things—so that bars and pies become special cases). This is just one approach to evaluating graphics, and it has limitations, but it’s better than many competing criteria, and much better than “because I said so” *coughtuftecough* 🙂

In Elements, Cleveland summarizes his experimental research articles and expands on them, adding many helpful examples and summarizing the underlying principles. What cognitive tasks do graph readers perform? How do they relate to what we know about the strengths and weaknesses of the human visual system, from eye to brain? How do we apply this research-based knowledge, so that we encode data in the most effective way? How can we use guides (labels, axes, scales, etc.) to support graph comprehension instead of getting in the way? It’s a lovely mix of theory, experimental evidence, and practical advice including concrete examples.

Now, I’ll admit that (at least in the 1st edition of Elements) the graphics certainly aren’t beautiful: blocky all-caps fonts, black-and-white (not even grayscale), etc. Some data examples seem dated now (Cold War / nuclear winter predictions). The principles aren’t all coherent. Each new graph variant is given a name, leading to a “plot zoo” that the Grammar of Graphics folks would hate. Many examples, written for an audience of practicing scientists, may be too technical for lay readers (for whom I strongly recommend Naomi Robbins’ Creating More Effective Graphs, a friendlier re-packaging of Cleveland).

Nonetheless, I still found Elements a worthwhile read, and it made a big impact on the data visualization course I taught. Although the book is 30 years old, I still found many new-to-me insights, along with historical context for many aspects of R’s base graphics.

[Edit: I’ll post my notes on Visualizing Data separately.]

Below are my notes-to-self, with things-to-follow-up in bold:


Teaching data visualization: approaches and syllabi

While I’m still working on my reflections on the dataviz course I just taught, let me point out some useful dataviz-teaching talks from the recent IEEE VIS conference.

Jen Christiansen and Robert Kosara have great summaries of the panel on “Vis, The Next Generation: Teaching Across the Researcher-Practitioner Gap.”

Even better, slides are available for some of the talks: Marti Hearst, Tamara Munzner, and Eytan Adar. Lots of inspiration for the next time I teach.

[Slide from Marti Hearst’s talk, on class discussions]

Finally, here are links to the syllabi or websites of various past dataviz courses. Browsing these helps me think about what to cover and how to teach it.

Not quite data visualization, but related:

Comment below or tweet @civilstat with any others I’ve missed, and I’ll add them to the list.
(Update: Thanks to John Stasko for links to many I missed, including his own excellent course site & resource page.)

Statistical Graphics and Visualization course materials

I’ve just finished teaching the Fall 2015 session of 36-721, Statistical Graphics and Visualization. As before, it’s a half-semester course designed primarily for students in the MSP program (Master of Statistical Practice) in the CMU statistics department. I’m pleased that we also had a large number of students from other departments taking this as an elective.

For software we used mostly R (base graphics, ggplot2, and Shiny). But we also spent some time on Tableau, Inkscape, D3, and GGobi.

We covered a LOT of ground. At each point I tried to hammer home the importance of legible, comprehensible graphics that respect human visual perception.

[Figure: a pie chart with its remake. Remaking pie charts is a rite of passage for statistical graphics students.]
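
To show the kind of remake I mean, here’s a quick R sketch with made-up shares (not an actual class assignment): the pie asks readers to compare angles and areas, while the Cleveland-style dot plot lets them compare positions along a common scale, which we judge far more accurately.

```r
library(ggplot2)

# Made-up category shares, the kind of data that often ends up in a pie
shares <- data.frame(
  category = c("A", "B", "C", "D", "E"),
  percent  = c(35, 25, 20, 12, 8)
)

# The pie chart: readers must compare angles and areas
pie(shares$percent, labels = shares$category)

# The remake: a dot plot, so readers compare positions on a common scale
ggplot(shares, aes(x = percent, y = reorder(category, percent))) +
  geom_point(size = 3) +
  labs(x = "Percent", y = NULL)
```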

My course materials are below. Not all the slides are designed to stand alone, but I have no time to remake them right now. I’ll post some reflections separately.

Download all materials as a ZIP file (38 MB), or browse individual files:

Summary sheet of ways to map statistical uncertainty

A few years ago, a team at the Cornell Program on Applied Demographics (PAD) created a really nice demo of several ways to show statistical uncertainty on thematic maps / choropleths. They have kindly allowed me to host their large file here: PAD_MappingExample.pdf (63 MB)

[Screenshot of the index page from the PAD mapping examples]

Each of these maps shows a dataset with statistical estimates and their precision/uncertainty for various areas in New York state. If we use color or shading to show the estimates, like in a traditional choropleth map, how can we also show the uncertainty at the same time? The PAD examples include several variations of static maps, interaction by toggling overlays, and interaction with mouseover and sliders. Interactive map screenshots are linked to live demos on the PAD website.

I’m still fascinated by this problem. Each of these approaches has its strengths and weaknesses: Symbology Overlay uses separable dimensions, but there’s no natural order to the symbols. Pixelated Classification seems intuitively clear, but may be misleading if people (incorrectly) try to find meaning in the locations of pixels within an area. Side-by-side maps are each clear on their own, but it’s hard to see both variables at once. Dynamic Feedback gives detailed info about precision, but only for one area at a time, not all at once. And so forth. It’s an interesting challenge, and I find it really helpful to see so many potential solutions collected in one document.
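
The PAD demos speak for themselves, but to give a flavor of one more variant in the same spirit, here’s a toy ggplot2 sketch on a fake grid of “areas”: color encodes the estimate, while transparency fades out the areas whose estimates are least precise. (All names and data here are made up for illustration.)

```r
library(ggplot2)

# Fake area-level data: a grid of 20 "areas", each with an
# estimate and a margin of error
set.seed(1)
areas <- expand.grid(col = 1:5, row = 1:4)
areas$estimate  <- runif(20, 10, 50)
areas$moe       <- runif(20, 1, 15)
areas$precision <- 1 / areas$moe

# Color shows the estimate; low-precision areas fade toward transparent
ggplot(areas, aes(x = col, y = row,
                  fill = estimate, alpha = precision)) +
  geom_tile(color = "white") +
  scale_alpha(range = c(0.3, 1), guide = "none") +
  theme_void() +
  labs(fill = "Estimate")
```

Like the other approaches, this one has a weakness: a pale area could be misread as having a low estimate rather than an imprecise one.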

The creators include Nij Tontisirin and Sutee Anantsuksomsri (who have both since moved on from Cornell), and Jan Vink and Joe Francis (both still there). The pixelated classification map is based on work by Nicholas Nagle.

For more about mapping uncertainty, see their paper:

Francis, J., Tontisirin, N., Anantsuksomsri, S., Vink, J., & Zhong, V. (2015). Alternative strategies for mapping ACS estimates and error of estimation. In Hoque, N. and Potter, L. B. (Eds.), Emerging Techniques in Applied Demography (pp. 247–273). Dordrecht: Springer Netherlands, DOI: 10.1007/978-94-017-8990-5_16 [preprint]

and my related posts:

See also Visualizing Attribute Uncertainty in the ACS: An Empirical Study of Decision-Making with Urban Planners, a talk by Amy Griffin on how urban planners actually use statistical uncertainty on maps in their work.

About to teach Statistical Graphics and Visualization course at CMU

I’m pretty excited for tomorrow: I’ll begin teaching the Fall 2015 offering of 36-721, Statistical Graphics and Visualization. This is a half-semester course designed primarily for students in our MSP program (Master of Statistical Practice).

A large part of the focus will be on useful principles and frameworks: human visual perception, the Grammar of Graphics, graphic design and interaction design, and more current dataviz research. As for tools, besides base R and ggplot2, I’ll introduce a bit of Tableau, D3.js, and Inkscape/Illustrator. For assessments, I’m trying a variant of “specs grading”, with heavy use of rubrics, hoping to make my expectations clear and my TA’s grading easier.

[Figure: LDA and CART classification boundaries on the flea beetles dataset, by Di Cook; classifier diagnostics from Cook & Swayne’s book]

My initial course materials are up on my department webpage.
Here are the

  • syllabus (pdf),
  • first lecture (pdf created with Rmd), and
  • first homework (pdf) with dataset (csv).

(I’ll probably just use Blackboard during the semester, but I may post the final materials here again.)

It’s been a pleasant challenge to plan a course that can satisfy statisticians (slice and dice data quickly to support detailed analyses! examine residuals and other model diagnostics! work with data formats from rectangular CSVs through shapefiles to social networks!) … while also passing on lessons from the data journalism and design communities (take design and the user experience seriously! use layout, typography, and interaction sensibly!). I’m also trying to put into practice all the advice from teaching seminars I’ve taken at CMU’s Eberly Center.

Also, in preparation, this summer I finally enjoyed reading more of the classic visualization books on my list.

  • Cleveland’s The Elements of Graphing Data and Robbins’ Creating More Effective Graphs are chock full of advice on making clear graphics that harness human visual perception correctly.
  • Ware’s Information Visualization adds to this the latest research findings and a ton of useful detail.
  • Cleveland’s Visualizing Data and Cook & Swayne’s Interactive and Dynamic Graphics for Data Analysis are a treasure trove of practical data analysis advice. Cleveland’s many case studies show how graphics are a critical part of exploratory data analysis (EDA) and model-checking (see the sketch just after this list). In several cases, his analysis demonstrates that previously published findings used an inappropriate model and reached poor conclusions through what he calls rote data analysis (RDA). Cook & Swayne do similar work with more modern statistical methods, including the first graphical diagnostics I’ve seen for many machine learning tools. There’s also a great section on visualizing missing data. Don’t let the subtitle put you off: you don’t need R and GGobi to learn a lot from their book.
  • Monmonier’s How to Lie with Maps refers to dated technology, but the concepts are great. It’s still useful to know just how maps are made, and how different projections work and why it matters. Much of cartographic work sounds analogous to statistical work: making simplifications in order to convey a point more clearly, worrying about data quality and provenance (different areas on the map might have been updated by different folks at different times), setting national standards that are imperfect but necessary… The section on “data maps” is critical for any statistician working with spatial data, and the chapter on bureaucratic mapping agencies will sound familiar to my Census Bureau colleagues.
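
To make that model-checking point concrete, here’s a minimal base-R sketch (mine, not from either book) of the most basic graphical diagnostic: plot the residuals against the fitted values, add a smoother, and look for structure the model missed.

```r
# Fit a simple model to a built-in dataset
fit <- lm(dist ~ speed, data = cars)

# Residuals vs. fitted values: under a well-specified model,
# this should look like structureless noise around zero
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# A smoother through the residuals; any clear trend suggests
# the straight-line model is missing something
lines(lowess(fitted(fit), resid(fit)), col = "blue")
```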

I hope to post longer notes on each book sometime later.