Greetings from lovely San Diego, CA, site of this year’s Joint Statistical Meetings. I can’t believe it’s already been a year since I was inspired to start blogging during the JSM in Miami!
If you’re keeping tabs on this year’s conference, there’s a fair amount of #JSM2012 activity on Twitter. Sadly, I haven’t seen any recent posts on The Statistics Forum, which blogged JSM so actively last year.
Yesterday’s Dilbert cartoon was also particularly fitting for the start of JSM, with its focus on big data.
“Imagine someone who loves you sitting in row 8.”
I registered at the huge San Diego convention center on Sunday afternoon. I began with the speaker skills workshop and picked up some great tips on giving a top-notch presentation. For example, to help the audience think about what to ask you, leave your Conclusions slide on the screen at the end during questions, rather than a content-free “Questions?” slide.
William Li had good advice for non-native English speakers but also pointed out that being one can be an advantage: because you speak more slowly, you’re less likely to rush your talk. However, he advised JSM speakers to spend less time on introductions and get to the point quickly: what have you done that is new, and how does it compare to existing work?
Dick DeVeaux reminded us not to show every detail of our work. He quoted a student who told him, “My department brings in experts that give talks that are impossible to understand, and I feel pressure to do the same…” But we should think about why the audience is really there, and remember that they want us to succeed in giving an accessible talk. Most of the audience wants an overview that explains why your approach is worth trying; interested parties can ask you for the exact details later.
The speakers also pointed us to some solid presentation guidelines on the ASA website.
“Many graphics published nowadays would make very nice wallpaper.”
After lunch came the statistical graphics panel session with some of the most respected folks in this field. Michael Kane is more of a graphics power user than a guru, but he started off the panel with a good overview. His research experience at Kodak made it clear (as anyone who’s worked on robot/computer vision also knows) that it’s hard to emulate the power of our visual system for processing data; we should use good statistical graphics to harness it whenever possible.
Antony Unwin shared anecdotes about having a single color printer on campus in the early ’80s and having to run across campus to pick up the printouts… Now that graphics are far easier to make, he hopes we’ll make better use of them too. The limitless space online means that instead of deciding which one graph is best to include in an article, we have the new problem of deciding the best way to organize 100 graphs. He then showed several winners from the data journalism awards and highlighted that although the creators did great work in collecting and preparing the data, their visualizations could be so much better if they helped users make connections across observations or data subsets, rather than only drilling down into the details of one subset at a time. We need to encourage better use of data linking, aggregate statistical summaries, appropriate base data for making comparisons, and “graphics ensembles” (telling a story via several distinct graphics). It may be easier to teach the key statistical concepts to a journalist with little technical background, who will take it from you on trust, than to someone who’s had the dangerous Stats 101 and assumes you need to get mired in the technicalities… Unwin recommended David Moore’s Statistics: Concepts and Controversies for this purpose.
Hadley Wickham shared a century and a half of great statistical graphics examples (including some beauties from the 1870 U.S. Census atlas)… and also historical examples of the same criticism of poor graphics that we hear today. In 1901 statisticians were already giving advice to avoid using area size for visual comparisons, etc. In 1951 Kenneth Haemer complained about people making 3D bar charts by hand! (None of the panelists could understand why people would put in this much effort for such a bad graphic: it was so much more difficult than making 2D bar charts, with no Excel to do it for you!) Overall, it’s impressive that Funkhouser’s 1937 article already had enough statistical graphics to provide a historical overview and solid advice, yet we still have not distributed that advice well among today’s creators of graphics.
Finally, Leland Wilkinson told us about developments of software for “scagnostics,” John Tukey’s term for scatterplot diagnostics (i.e., look at the “clumpiness” of the points in a scatterplot and compare it to a distribution of possible clumpiness scores)… and “CHIRP,” a classification algorithm modeled on human visual processing, which beat out the other top contenders in recent classification contests. I was particularly interested in his example of an interactive matrix of scatterplots that “zoomed” on mouseover, distorting the matrix so that the scatterplot under the mouse is largest; it seems like a good way to show massive amounts of data at once and let the user quickly zoom in for more details. A related example was a linked time series of scatterplots, where mousing over one point in one scatterplot would overlay a time series for that same unit in the scatterplots for other time points.
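To make the scagnostics idea concrete, here is a minimal Python sketch of "score a scatterplot, then compare the score to a distribution of possible scores." It is not Wilkinson's software or his Clumpy measure. The score used here (longest edge of the minimum spanning tree divided by the median edge) is a deliberately crude stand-in, the reference distribution (uniform noise on the unit square) is an assumption, and the function names and example data are hypothetical.

```python
import math
import random
import statistics


def mst_edge_lengths(points):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # shortest distance from the growing tree to each point
    best[0] = 0.0          # start the tree at point 0
    for _ in range(n):
        # Add the point that is currently closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    # best[v] now holds the length of the edge that attached v to the tree.
    return best[1:]


def clumpiness(points):
    """Crude stand-in score (an assumption): longest MST edge / median MST edge."""
    lengths = mst_edge_lengths(points)
    return max(lengths) / statistics.median(lengths)


def reference_scores(n_points, n_sims, rng):
    """Scores for structureless data: points drawn uniformly on the unit square."""
    return [
        clumpiness([(rng.random(), rng.random()) for _ in range(n_points)])
        for _ in range(n_sims)
    ]


def tail_fraction(score, reference):
    """Share of reference scores at least as large as the observed score."""
    return sum(r >= score for r in reference) / len(reference)


rng = random.Random(2012)
# Hypothetical example data: two tight clusters that sit far apart.
clustered = (
    [(rng.gauss(0.2, 0.02), rng.gauss(0.2, 0.02)) for _ in range(20)]
    + [(rng.gauss(0.8, 0.02), rng.gauss(0.8, 0.02)) for _ in range(20)]
)
reference = reference_scores(n_points=40, n_sims=200, rng=rng)
observed = clumpiness(clustered)
print(observed, tail_fraction(observed, reference))
```

A tail fraction near zero says the observed plot is clumpier than anything the structureless reference produced. The verdict depends on what counts as "structureless," so a different reference distribution could give a different answer.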
Coincidentally, I’ve just finished reading Wilkinson’s Grammar of Graphics, and I noticed he did not cover interaction (esp. linking and brushing) in much detail in his framework there. I asked the panelists for good resources on best practices for putting these ideas into use. Hadley suggested a few names from InfoViz and Bell Labs; Antony Unwin said that the people who do the theoretical research on this often don’t implement it in software, and vice versa, so it’s tough to find good resources.
“Guideline #1: Please call us before you collect your data”
I rounded off the afternoon at the Statistics Without Borders (SWB) sponsored session. Bryan Sayer reminded us that if you’re deciding on sample sizes for a survey and the sponsors tell you there will not be any subgroups to compare, “don’t believe them!” Cindy Weng discussed survival estimation among Afghan refugees in Pakistan, and Ryung Kim gave an overview of the mobile phone survey in Haiti after the 2010 earthquake. Jim Cochran described his recent efforts in statistical capacity building and encouraged us to take part: he’s helping to develop statistical curricula, improve effectiveness of teaching statistics, and set up teaching materials repositories in countries such as Albania, Cameroon, and Fiji. Prospective volunteers show great interest but rarely follow through, since this work often involves inconvenient timing and it may be hard to get funding: SWB helps coordinate these efforts but has no funds (or even a treasurer; this is simply not part of their charter). Finally, SWB sometimes gets requests for help analyzing data that was haphazardly collected without a statistician’s help, and it is a shame to have to tell them there’s little or nothing you can use… So please remember: friends don’t let friends collect data without a statistician’s advice.
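A rough illustration of why the subgroup warning matters (this is not something shown in the session): under simple random sampling, the margin of error for a proportion grows as the sample shrinks, so a survey sized for one overall estimate is much less precise within each subgroup. The sample size and number of subgroups below are hypothetical, and the formula is the usual normal approximation, which ignores design effects from complex surveys.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion (simple random sample)."""
    return z * math.sqrt(p * (1 - p) / n)


total_n = 1000     # hypothetical overall sample size
n_subgroups = 10   # hypothetical number of equal-sized subgroups

overall = margin_of_error(total_n)
per_group = margin_of_error(total_n // n_subgroups)
print(f"overall: +/-{overall:.3f}, per subgroup: +/-{per_group:.3f}")
```

With these made-up numbers, the overall estimate is good to about 3 percentage points, but each subgroup estimate is only good to about 10. If the sponsors later ask for subgroup comparisons, the sample will be too small to support them.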
The audience members suggested that SWB could start organizing resources and suggestions for best practices for survey work in foreign cultures. So much of that research has been done in the USA or Europe, but it may not apply well to countries where the definition of a household is “people who eat from the same cooking pot” and where the age cutoff for being considered an adult is well below 18. Even when it’s tough to generalize and give specific advice, SWB could maintain a list of possible cultural differences to watch out for.
If you’re interested in getting involved, consider attending the SWB business meeting on Tuesday at 4:30pm, in HQ-Aqua 308.
The evening brought a great dinner out in San Diego’s downtown; meeting old friends and new at the JSM opening mixer; and the pleasant surprise of a Deschutes Black Butte Porter (a great Oregon beer that I’ve missed on the east coast) at the Yardhouse, which has around a hundred beers on tap.
Now that Monday is here, I’m looking forward to many sessions today, but unfortunately my top picks are all at the same time! Spatial Socio-Demographics, Poverty Mapping with Complex Survey Data, and Geovisualization are all covering my areas of interest; it is a shame they are not spread apart in the schedule.