Monthly Archives: September 2012

Superheroes? Dataheroes!

Jake Porway of DataKind gave an inspiring talk comparing statisticians and data scientists to superheroes. Hear the story of how “the data scientists, statisticians, analysts were able to bend data to their will” and how these powers are being used for good or for awesome:

(Hat Tip:

Jake’s comment that “you have extraordinary powers that ordinary people don’t have” reminds me of Andrew Gelman’s suggestion that “The next book to write, I guess, should be called, not Amazing Numbercrunchers or Fabulous Stat-economists, but rather something like Statistics as Your Very Own Iron Man Suit.”

Links to the statistics / data science volunteering opportunities Jake mentioned:

I also recommend Statistics Without Borders, with more of an international health focus. And if you’re here in Washington DC, Data Community DC and the related meetups are a great resource too.

Edit: Current students could also see if there is a Statistics in the Community (StatCom) Network branch at their university.

Statistics contests

Are you familiar with Kaggle? It’s a website for hosting online data-analysis contests, like smaller-scale versions of the Netflix Prize contest.
The U.S. Census Bureau is now hosting a Kaggle contest, asking statisticians and data scientists to help predict mail return rates on surveys and census forms. The ability to predict return rates will help the Census Bureau target its outreach efforts and interview followup (phone calls and door-to-door interviews) more efficiently. So you could win a prize and make the government more efficient, all at the same time! 🙂
The contest ends on Nov 1st, so you still have 40 days to compete.

If you prefer making videos to crunching numbers, there’s also a video contest to promote the International Year of Statistics for 2013. Help people see how statistics makes the world better, impacts current events, or gives you a fun career, and you may win a prize and be featured on their website all next year. There are special prizes among non-English-language videos and among entrants under 18 years old.
Submissions are open until Oct 31st, just a day before the Census Challenge ends.


Compensating for different spatial abilities (feat. cyborgs!)

In July, I saw Iowa State’s Dr. Sarah Nusser give a presentation about spatial ability among survey field representatives and how different people interact with various geospatial technologies. This talk introduced an area of research quite new to me, and it reminded me how important it is to know your audience before designing products for them. It also touched on directly augmenting our sensory perception (more about that below).

When you hire people to collect survey data in the field (verify addresses, conduct interviews, assess land cover type, etc.), you hope they’ll be able to find their way to the sites where you’re sending them. But new hires might come in with various levels of skill or experience, as well as different mental models for maps and geography. Dr. Nusser’s work [here’s a representative article] frames this as “spatial ability” and, practically speaking, treats it as innate: rather than training adults to improve their spatial ability, she focuses on technology and interfaces that help them work better with the mental model they already have. (I can’t believe that spatial ability really is innate and static… but it’s probably cheaper to design a few user-targeted interfaces once than to train new hires indefinitely.)

How do you tell if someone has high or low spatial ability (high SA vs low SA)? One approach is the Paper Folding Test and related tests produced by the Educational Testing Service.

Where will the holes be when the paper is unfolded?

The Grammar of Graphics: notes on first reading

Leland Wilkinson’s The Grammar of Graphics is a classic in the data visualization literature. Wilkinson created a framework that coherently ties together many aspects of designing, implementing, reading, and understanding a graphic. It’s a useful approach and has been fairly influential: The popular R package ggplot2 is, more or less, an implementation of Wilkinson’s ideas, and I also see their influence in the software Tableau (about which more another time). Wilkinson himself helped to build these ideas into SPSS’s Graphics Production Language (GPL) and then SPSS Visualization Designer.

So what’s so special here? One of the core ideas is to start with the raw data and think about all the transformations, summaries, etc. that go into graphing it. With a good framework, this can help us see connections between different graphs and create new ones. (The opposite extreme is a “typology” or list of graph types, like you get in Excel: do you want a bar chart, a pie chart, a line chart, or one of these other 10 types? Such a list has no deep structure.) Following Wilkinson’s approach, you’ll realize that a pie chart is basically just a stacked bar chart plotted in polar coordinates, with bar height mapped to pie-slice angle… and that can get you thinking: What if I mapped bar height to radius, not angle? What if I added a variable and moved to spherical coordinates? What if I put a scatterplot in polar coordinates too? These may turn out to be bad ideas, but at least you’re thinking — in a way that is not encouraged by Excel’s list of 10 graph types.

This is NOT the approach that Wilkinson takes.
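To make the pie-chart observation concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not code from Wilkinson’s book or from ggplot2: the function names and the example counts are hypothetical. It computes the segments of a single stacked bar, then maps position along the bar to angle (which gives pie slices), and also shows the “what if” alternative of mapping each value to radius instead.

```python
import math


def stacked_bar_segments(values):
    """Return (start, end) positions of each segment in one stacked bar."""
    segments = []
    start = 0.0
    for v in values:
        segments.append((start, start + v))
        start += v
    return segments


def to_pie_slices(values):
    """Map position along the stacked bar to angle.

    The full bar height is rescaled to 2*pi radians, so each bar
    segment becomes a pie slice given as (start_angle, end_angle).
    """
    total = sum(values)
    scale = 2 * math.pi / total
    return [(a * scale, b * scale) for a, b in stacked_bar_segments(values)]


def to_radial_bars(values):
    """Alternative mapping: equal angular width per category, value as radius.

    Returns (start_angle, end_angle, radius) for each category.
    """
    width = 2 * math.pi / len(values)
    return [(i * width, (i + 1) * width, v) for i, v in enumerate(values)]


if __name__ == "__main__":
    counts = [10, 30, 60]  # hypothetical category counts
    for start, end in to_pie_slices(counts):
        print(f"slice from {math.degrees(start):.0f} to {math.degrees(end):.0f} degrees")
    for start, end, radius in to_radial_bars(counts):
        print(f"wedge from {math.degrees(start):.0f} to {math.degrees(end):.0f} degrees, radius {radius}")
```

The data and the stacking step are identical in both cases; only the last mapping changes, from position to angle or from value to radius. Seeing that shared structure is what the grammar approach encourages.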

But, of course, thinking is hard, and so is this book. Reading The Grammar of Graphics requires much more of a dedicated slog than, say, Edward Tufte’s books, which you can just flip through randomly for inspiration and bite-sized nuggets of wisdom. (I admire Tufte too, but I have to admit that Wilkinson’s occasional jabs at Tufte were spot-on and amused me to no end.) It’s a book full of wit and great ideas, but also full of drawn-out sections that require serious focus, and it takes a while to digest it all and put it together in your mind.

So, although I’d highly recommend this book to anyone deeply interested in visualization, I’m still digesting it. What follows is not a review but just notes-to-self from my first read-through: things to follow up on and for my own reference. It might not be particularly thrilling for other readers.