Category Archives: Education

Think-aloud interviews can help you write better assessments

I’m delighted to share a pilot study that we ran at CMU this fall through our Teaching Statistics group. Long story short: it’s hard to write good conceptual-level questions to test student understanding, but think-aloud interviews are a very promising tool. By asking real students to talk out loud as they solve problems, you learn whether they give right or wrong answers because they truly do or don’t understand the problem, or because the question itself is poorly written. If students answer incorrectly because the question is ambiguous, or answer correctly using generic test-taking strategies rather than knowledge from the course, think-alouds let you detect this and revise your test questions.

Some context:
CMU’s stats department—renamed the Department of Statistics & Data Science just this fall—is revising the traditional Introductory Statistics classes we offer. Of course, as good statisticians, we’d like to gather evidence and measure whether students are learning any better in the new curriculum. We found several pre-made standardized tests of student learning in college-level Intro Stats, but none of them quite fit what we wanted to measure: have students learned the core concepts, even if they haven’t memorized traditional formulas and jargon?

We tried writing a few such multiple-choice questions ourselves, but it was quite a challenge to see past our own expert blind spots. So, we decided to get hands-on practice with assessment in the Fall 2017 offering of our graduate course on Teaching Statistics. We read Adams and Wieman (2011), “Development and Validation of Instruments to Measure Learning of Expert-Like Thinking,” who recommend using think-aloud interviews as a core part of the test-question design and validation process. This method isn’t commonly known in Statistics, although I have related experience from a decade ago, when I studied design at Olin College and then worked in consumer-insights research for Ziba Design. It’s been such a delight to revisit those skills and mindsets in a new context here.

We decided to run a pilot study where everyone could practice running think-aloud interviews. With a handful of anonymous student volunteers, we ran through the process: welcome the volunteer, describe the procedure, give them some warm-up questions to practice thinking aloud while solving problems, then run through a handful of “real” Intro Stats test questions and see how they tackle them. During this first pass, the interviewer should stay silent apart from reminders like “Please remember to think out loud” if the student stops speaking. It’s not perfect, but it gets us closer to how students would really approach these questions on an in-class test (not at office hours or in a study session). At the end, we do a second pass to follow up on anything interesting or unclear, though it’s still best to let the student do most of the talking: interviewers might say “I see you answered B here. Can you explain in more detail?” rather than “This is wrong; it should be C because…”

After this pilot, we feel quite confident that a formal think-aloud study will help us write questions that really measure the concepts and (mis)understandings we want to detect. The think-aloud script was drafted based on materials from Chris Nodder’s course and advice from Bowen (1994), “Think-aloud methods in chemistry education”. But there are quite a few open questions remaining about how best to implement the study. We list these on the poster above, which we presented last week at CMU’s Teaching & Learning Summit.

The current plan is to revise our protocol for the rest of Fall 2017 and design a complete formal study. Next, we will run think-alouds and revise individual questions throughout Spring 2018, then pilot and validate at the test level (which set of questions works well as a whole?) in Fall 2018, with pre- and post-tests across several sections and variations of Intro Stats.

PS — I mean no disrespect towards existing Intro Stats assessments such as CAOS, ARTIST, GOALS, SCI, or LOCUS. These have all been reviewed thoroughly by expert statisticians and educators. However, in the documentation for how these tests were developed, I cannot find any mention of think-alouds or similar testing with real students. Student testing seems limited to psychometric validation (for reliability etc.) after all the questions were written. I think there is considerable value in testing question-prototypes with students early in the development process.

PPS — Apologies for the long lack of updates. It’s been a busy year of research and paper-writing, with a particularly busy fall of job applications and interviews. But I’ll have a few more projects ready for sharing here over the next month or two.

Seminar on Active Learning pedagogy (continued)

Continuing from a while ago: in May I joined an Eberly Center reading group on the educational approach known as Active Learning (AL). Again, AL essentially just means replacing “passive” student behavior (sitting quietly in traditional lectures) with almost anything more “active.”

I’ve already described the first week, where we discussed the meaning of AL and evidence for its effectiveness. In the later two weeks, we explored how to implement a few specific AL styles.

My notes below go pretty far into the weeds, but some big-picture points: Spend more time on designing good questions & tasks (and perhaps less on your lecture notes). Ask students to put a stake in the ground (whether a carefully-prepared response or just a gut-instinct guess) before you lead a discussion, show a demo, or give a lecture. Teamwork (done well) has huge benefits, but make sure the assignments are designed to be done in teams (not stapling together individuals’ separate work), and teach teamwork as an explicit skill.

[OK, so last time I joked we should teach a course called Active Active Learning Learning, where we use AL pedagogy to learn about the stats/ML experimental design concept also called Active Learning. But the reverse would be fun too: Run a course on Design of Experiments, where all the experiments are about evaluating the effects of different AL-pedagogy techniques. That is to say, a good course project for Intro Stats or Design of Experiments could be to evaluate the study designs below and improve or extend them.]

Notes-to-self, from weeks 2 and 3, below the break:

After 7th semester of statistics PhD program

I was lucky to have research grant support and minimal TAing duties this fall, so all semester I’ve felt my research was chugging along productively. Yet I have less to show for it than last semester—I went a little too far down an unrewarding rabbit-hole. Knowing when to cut your losses is an important skill to learn!

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, and 6th semesters of my Statistics PhD program.


Having defended my proposal this summer, I spent a lot of time this fall attacking one main sub-problem. Though I always felt I was making reasonable progress, eventually I discovered it to be a dead-end with no practical solution. I had wondered why nobody had solved this problem yet; it turns out that it’s just inherently difficult, even for the simplest linear-regression case! Basically I wanted to provide finite-sample advice for a method where (1) the commonly-used approach is far from optimal but (2) the asymptotically-optimal approach is useless in finite samples. I think we can salvage parts of my work and still publish something useful, but it’ll be much less satisfying than I had hoped.

Working on a different problem, it felt encouraging to find errors in another statistician’s relevant proof: I felt like a legitimate statistician who can help colleagues notice problems and suggest improvements. On the other hand, it was also disappointing, because I had hoped to apply the proof idea directly to my own problem, and now I cannot 🙂

On a third front, my advisor invited another graduate student, Daren Wang, to help us wrap up a research project I had started in 2015 and then abandoned. Daren is bright, fast, and friendly, a pleasure to collaborate with (except when I’m despairing that it only took him a week to whiz through and improve on the stuff that took me half a year). Quite quickly, we agreed there’s no more to be done to make this project a much-better paper—so let’s just package it up now and submit to a conference. It was satisfying to work on writing and submitting a paper, one of the main skills for which I came to grad school!

Finally, I was hoping to clear up some stumbling blocks in an end-of-semester meeting with several committee members. Instead, our meeting raised many fascinating new questions & possible future directions… without wrapping up any loose ends. Alas, such is research 🙂


As I’ve noted before, I audited Jordan Rodu’s Deep Learning course. I really liked the journal-club format: Read a paper or two for every class session. Write a short response before class, so the instructor can read them first. Come prepared to discuss and bring up questions of your own. I wish more of our courses were like this—compared to lecture, it seems better for the students and less laborious for the instructor.

Although it was a theory course, not hands-on, I did become intrigued enough by one of the papers to try out the ideas myself. Together with classmate Nicolas Kim, we’re playing around with Keras on a GPU to understand some counterintuitive ideas a little better. Hopefully we’ll have something to report in a couple of weeks.

I also started to audit Kevin Kelly’s undergrad and grad-level courses on Epistemology (theory of knowing). Both were so fascinating that I had to drop them, else I would have done all the course readings at the expense of my own research 🙂 but I hope to take another stab someday. One possibly-helpful perspective I got, from my brief exposure to Epistemology, was a new-to-me (caricatured) difference between Bayesian and classical statistics.

  • Apparently most philosophy-of-science epistemologists are Bayesian. They posit that a scientist’s work goes like this: You are given a hypothesis, some data, and some prior knowledge or belief about the problem. How should we use the data to update our knowledge/belief about that hypothesis? In that case, obviously, Bayesian updating is a sensible way to go.
  • But I disagree with the premise. Often, a scientist’s work is more like this: You’re not handed a hypothesis or a dataset, but must choose them yourself. You also know your colleagues will bicker over claims of prior knowledge. If you come up with an interesting question, what data should you collect so that you’ll most likely find a strong answer? That is, an answer that most colleagues will find convincing regardless of prior belief, and that will keep you from fooling yourself? This is the classical / frequentist setting, which treats design (of a powerful, convincing experiment / survey / study) as the heart of statistics. In other words, you’re not merely evaluating “found” data—your task is to choose a design in hopes of making a convincing argument.
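To make the contrast concrete, here is a toy sketch of the Bayesian-updating picture from the first bullet, assuming the hypothesis concerns an unknown proportion and beliefs are encoded as Beta distributions (a standard conjugate choice, not something the epistemologists necessarily use). The class name `BetaBelief`, the two priors, and the data counts are all hypothetical. It also hints at the second bullet's point: two colleagues with very different priors end up far apart after a small study, and much closer after a larger, better-powered one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Hypothetical belief about an unknown proportion p, encoded as Beta(a, b)."""
    a: float
    b: float

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + s, b + f)
        return BetaBelief(self.a + successes, self.b + failures)

    def mean(self) -> float:
        return self.a / (self.a + self.b)


# Hypothetical priors held by two colleagues who disagree strongly
skeptic = BetaBelief(1, 9)    # prior mean 0.10
optimist = BetaBelief(9, 1)   # prior mean 0.90

# Hypothetical studies: a small one and a 20x larger one, same 60% success rate
for successes, failures in [(30, 20), (600, 400)]:
    s_post = skeptic.update(successes, failures).mean()
    o_post = optimist.update(successes, failures).mean()
    print(f"n={successes + failures}: skeptic {s_post:.3f}, "
          f"optimist {o_post:.3f}, gap {o_post - s_post:.3f}")
```

The gap between the two posterior means shrinks as the sample grows, which is one way to read "an answer that most colleagues will find convincing regardless of prior belief": choose a design large enough that the prior stops mattering much.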

Other projects

Some of my cohort-mates and I finally organized a Dissertation Writing Group, a formal setting to talk shop technically with other students whose advisors don’t already hold research-group meetings. I instigated this selfishly, wanting to have other people I can pester with theory questions or simply vent with. But my fellow students agreed it’s been useful to them too. We’re also grateful to our student government for funding coffee and snacks for these meetings.

I did not take on other new side projects this fall, but I’ve stayed in touch with former colleagues from the Census Bureau still working on assessing & visualizing uncertainty in estimate rankings. We have a couple of older reports about these ideas. We still hope to publish a revised version, and we’re working on a website to present some of the ideas interactively. Eventually, the hope is to incorporate some of this into the Census website, to help statistical-novice data users understand that estimates and rankings come with statistical uncertainty.

Finally, I heard about (but have not attended) CMU’s Web Dev Weekend. I really like the format: a grab-bag of 1- or 2-hour courses, suitable for novices, that get you up and running with a concrete project and a practical skill you can take away. Can we do something similar for statistics?

Topic ideas where a novice could learn something both interesting and useful in a 1.5-hour talk:

  • How not to fool yourself in A/B testing (basic experimental design and power analysis)
  • Befriend your dataset (basic graphical and numerical EDA, univariate and bivariate summaries, checking for errors and outliers)
  • Plus or minus a bit (estimating margins of error—canned methods for a few simple problems, intro to bootstrap for others)
  • Black box white belt (intro to some common data mining methods you might use as baselines in Kaggle-like prediction problems)
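As a taste of what the "Plus or minus a bit" session might cover, here is a minimal percentile-bootstrap sketch in plain Python. The function name `bootstrap_margin`, its defaults, and the sample values are hypothetical; the percentile bootstrap is only one of several bootstrap interval methods, and a real workshop might well use Excel or Tableau instead of code.

```python
import random
import statistics


def bootstrap_margin(data, stat=statistics.mean, n_boot=2000, level=0.95, seed=0):
    """Hypothetical helper: percentile-bootstrap interval for stat(data),
    plus a single "+/- margin" summarizing it."""
    rng = random.Random(seed)
    n = len(data)
    # Resample the data with replacement many times and recompute the statistic
    boot_stats = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    # Number of bootstrap values to trim from each tail
    k = int(round((1 - level) / 2 * n_boot))
    lo, hi = boot_stats[k], boot_stats[n_boot - 1 - k]
    estimate = stat(data)
    # Report the larger distance, since the interval need not be symmetric
    margin = max(estimate - lo, hi - estimate)
    return estimate, (lo, hi), margin


# Hypothetical sample of 12 measurements
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 6.2, 3.9, 5.0, 4.8]
est, (lo, hi), moe = bootstrap_margin(sample)
print(f"mean = {est:.2f}, 95% interval ({lo:.2f}, {hi:.2f}), roughly +/- {moe:.2f}")
```

The appeal for novices is that the same few lines work for a median or a trimmed mean by swapping the `stat` argument, with no new formula to memorize.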

Many of these could be done with tools that are familiar (Excel) or novice-friendly (Tableau), instead of teaching novices to code in R at the same time as they learn statistical concepts. This would be a fun project for a spring weekend, in my copious spare time (hah!)


Offline, we are starting to make some parent friends through daycare and playgrounds. I’m getting a new perspective on why parents tend to hang out with other parents: it’s nice to be around another person who really understands the rhythm of conversation when your brain is at best a quarter-present (half-occupied by watching kid, quarter-dysfunctional from lack of sleep). On the other hand, it’s sad to see some of these new friends moving away already, leaving the travails of academia behind for industry (with its own new & different travails but a higher salary).

So… I made the mistake of looking up average salaries myself. In statistics departments, average starting salaries for teaching faculty are well below starting salaries for research faculty. In turn, research faculty’s final salary (after decades of tenure) is barely up to the starting salaries I found for industry Data Scientists. Careers are certainly not all about the money, but the discrepancies were eye-opening, and they are good to know about in terms of financial planning going forward. (Of course, those are just averages, with all kinds of flaws. Particularly notable is the lack of a cost-of-living adjustment: a typical Data Scientist may be hired in expensive San Francisco, while typical teaching faculty are not.)

But let’s end on a high note. Responding to a question about which R / data science blogs to follow, Hadley Wickham cited this blog! If a Hadley citation can’t go on a statistician’s CV, I don’t know what can 🙂

“Sound experimentation was profitable”

Last time I mentioned some papers on the historical role of statistics in medicine. Here they are, by Donald Mainland:

  • “Statistics in Clinical Research: Some General Principles” (1950) [journal, pdf]
  • “The Rise of Experimental Statistics and the Problems of a Medical Statistician” (1954) [journal, pdf]

I’ve just re-read them and they are excellent. What is the heart of statistical thinking? What are the most critical parts of (applied) statistical education? At just 8-9 pages each, they are valuable reading, especially as a gentle rejoinder in this age of shifting fashions around Data Science, concerns about the replicability crisis, and misplaced hopes that Big Data will fix everything painlessly.

Some of Mainland’s key points, with which I strongly agree:

  • The heart of statistical thinking concerns data design, even more so than data analysis. How should we design the study (sampling, randomization, power, etc.) in order to gather strong evidence and to avoid fooling ourselves?

    …the methods of investigating variation are statistical methods. Investigating variation means far more than applying statistical tests to data already obtained. … Statistical ideas, to be effective, must enter at the very beginning, i.e., in the planning of an investigation.

  • Whenever possible, a well-designed experiment is highly preferred over poorly-designed experimental or observational data. It’s stronger evidence… and, as industry has long recognized, it cuts costs.

    In all the applied sciences, inefficient or wrong methods of research or production cause loss of money. Therefore, sound experimentation was profitable; and so applied chemistry and physics adopted modern biological statistics while academic chemists, physicists, and even biologists were disregarding the revolution or resisting it, largely through ignorance.

  • Yes, of course you can apply statistical methods to “found” data. Sometimes you have no alternative (macroeconomics; data journalism); sometimes it’s just substantially cheaper (Big Data). But if you gather haphazard data and merely run statistical tests after the fact, you’re missing the point.

    These unplanned observations may be the only information available as a basis for action, and they may form a useful basis for planned experiments; but we should never forget their inferior status.

    …a significance test has no useful meaning unless an experiment has been properly designed.

  • Statistical education for non-statisticians spends too little time on good data design, and too much on a slew of cookbook formulas and tests.

    …the increase in the incidence of tests—statistical arithmetic—has continued, and so also, very commonly, has the disregard of the more important contribution of statistics, the principles and methods of sound, economical experimentation and valid inference… Another obvious cause is the common human tendency to use gadgets instead of thought. Here the gadgets are the arithmetical techniques, and the statistical “cookbooks” that have presented these techniques most lucidly, without primary emphasis on experimentation and logic, have undoubtedly done much harm.

  • Statistical education for actual applied statisticians also spends too little time on good data design, and too much on mathematics.

    The most important single element in the training (and continuous education) of any statistician is practical experience—experience of investigations for which he himself is responsible, with all their difficulties and disappointments.

    …even if a mathematician specializes in the statistical branch of mathematics, he is not thereby fitted to give guidance in the application of the methods.

  • As an investigator, you must understand statistical reasoning yourself. You can (and should!) hire an applied statistician to help with the details of study design and data analysis, but you must understand their viewpoint to benefit from their help.

    If, however, he is acquainted with the requirements for valid proof, he will often see that what looked like evidence is not evidence at all…
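Mainland's first point lists power among the things to settle before collecting data. As one small illustration of what "statistics entering at the planning stage" can look like, here is a sketch of the textbook normal-approximation sample-size formula for comparing two means, n per arm = 2 ((z_{1-α/2} + z_{1-β}) σ / δ)². The function name `n_per_arm` and the example inputs are hypothetical, and a t-test-based calculation would give a slightly larger answer.

```python
import math
from statistics import NormalDist


def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Hypothetical helper: approximate sample size per arm for a two-sided,
    two-sample z-test of means (normal approximation).

    delta: smallest difference in means worth detecting
    sigma: assumed common standard deviation of the outcome
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # critical value for the two-sided test
    z_beta = z.inv_cdf(power)             # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                   # round up to whole subjects


# A half-standard-deviation effect vs. a quarter-standard-deviation effect
print(n_per_arm(delta=0.5, sigma=1.0))
print(n_per_arm(delta=0.25, sigma=1.0))
```

Because n scales with 1/δ², halving the effect you hope to detect roughly quadruples the required sample: exactly the kind of cost that is cheap to discover at the planning stage and expensive to discover afterward.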

Of course study design is not all of statistics. But it’s a hugely important component that seems underappreciated in modern statistics curricula (at least in my experience). Even if it’s not the sexiest area of current research, I’m surprised my PhD program at CMU completely omits it from our education. (The BS and MS programs here do offer one course each. But I was offered much deeper courses in my MS at Portland State, covering design of experiments and also of survey samples.)

As a bonus, Mainland also offers advice on starting and running a statistical consulting unit. It’s aimed at medical scientists but useful more broadly.

I would quote more, but you should really just read the whole thing. Then comment to tell me why I’m wrong 🙂

After 6th semester of statistics PhD program

Posting far too late again, but here’s what I remember from last Spring…

This was my first semester with no teaching, TAing, or classes (besides one I audited for fun). As much as I enjoy these things, research has finally gone much faster and smoother with no other school obligations. The fact that our baby started daycare also helped, although it’s a bittersweet transition. At the end of the summer I passed my proposal presentation, which means I am now ABD!

Previous posts: the 1st, 2nd, 3rd, 4th, and 5th semesters of my Statistics PhD program.

Thesis research and proposal

During 2015, most of my research with my advisor, Jing Lei, was a slow churn through understanding and extending his sparse PCA work with Vince Vu. At the end of the year I hadn’t gotten far and we decided to switch to a new project… which eventually became my proposal, in a roundabout way.

We’d heard about the concept of submodularity, which seems better known in CS, and wondered where it could be useful in Statistics as well. Das & Kempe (2011) used submodularity to understand when greedy variable selection algorithms like Forward Selection (FS, aka Forward Stepwise regression) can’t do too much worse than Best Subsets regression. We thought this approach might give a new proof of model-selection consistency for FS. It turned out that submodularity didn’t give us a fruitful proof approach after all… but also that (high-dimensional) conditions for model-selection consistency of FS hadn’t been derived yet. Hence, this became our goal: Find sufficient conditions for FS to choose the “right” linear regression model (when such a thing exists), with probability going to 1 as the numbers of observations and variables go to infinity. Then, compare these conditions to those known for other methods, such as Orthogonal Matching Pursuit (OMP) or the Lasso. Finally, analyze data-driven stopping rules for FS—so far we have focused on variants of cross-validation (CV), which is surprisingly not as well-understood as I thought.
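For readers unfamiliar with Forward Selection, here is a minimal NumPy sketch of the greedy procedure, assuming the common "add the column that most reduces the residual sum of squares" version (OMP instead adds the column most correlated with the current residual). The function name `forward_selection` and the simulated data are hypothetical, and the sketch simply runs a fixed number of steps; it does not implement the cross-validation stopping rules discussed above.

```python
import numpy as np


def forward_selection(X, y, max_steps):
    """Hypothetical helper: greedy forward stepwise least-squares regression.

    At each step, add the unselected column whose inclusion gives the smallest
    residual sum of squares (RSS). Returns the path of (selected set, RSS).
    """
    n, p = X.shape
    selected, path = [], []
    for _ in range(min(max_steps, p)):
        best_j, best_rss = None, np.inf
        for j in range(p):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
        path.append((list(selected), best_rss))
    return path


# Simulated data: 10 candidate predictors, only columns 1, 4, 7 matter
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X[:, [1, 4, 7]] @ np.array([3.0, -2.0, 1.5]) + 0.5 * rng.standard_normal(200)

path = forward_selection(X, y, max_steps=5)
for cols, rss in path:
    print(cols, round(rss, 1))
```

Model-selection consistency asks when a path like this contains the true support at the right step with probability tending to 1; a stopping rule (such as CV) then has to decide which step to stop at.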

One thing I hadn’t realized before: when writing the actual proposal, the intent is to demonstrate your abilities and preparedness for research, not necessarily to plan out your next research questions. As it turns out, it’s more important to prove that you can ask interesting questions and follow through on them. Proposing concrete “future work” is less critical, since we all know it’ll likely change by the time you finish the current task. Also, rewriting everything for the paper and talk was itself helpful in getting me to see the “big picture” ideas in my proofs.

Anyhow, it did feel pretty great to actually complete a proof or two for the proposal. Even if the core ideas really came from my advisor or other papers I’ve read, I did do real work to pull it all together and prepare the paper & talk.

Many thanks to everyone who attended my proposal talk. I appreciated the helpful questions and discussion; it didn’t feel like a grilling for its own sake (as every grad student fears). Now it’s time to follow through, complete the research, practice the paper-submission process, and write a thesis!

The research process

When we shifted gears to something my advisor does not already know much about, it helped me feel much more in charge and productive. Of course, he caught up and passed me quickly, but that’s only to be expected of someone who also just won a prestigious NSF CAREER award.

Other things that have helped: Getting the baby into day care. No TAing duties to divide my energy this semester. Writing up the week’s research notes for my advisor before each meeting, so that (1) the meetings are more focused & productive and (2) I build up a record of notes that we can copy-paste into papers later. Reading Cal Newport’s Deep Work book and following common-sense suggestions about keeping a better schedule and tracking better metrics. (I used to tally all my daily/weekly stats-related work hours; now I just tally thesis hours and try to hit a good target each week on those alone, undiluted by side stuff.)

I’m no smarter, but my work is much more productive, I feel much better, and I’m learning much more. Every month I look back and realize that, just a month ago, I’d have been unable to understand the work I’m doing today. So it is possible to learn and progress quite quickly, which makes me feel much better about this whole theory-research world. I just need to immerse myself, spend enough time, revisit it regularly enough, have a concrete research question that I’m asking—and then I’ll learn it and retain it far better than I did the HWs from classes I took.

Indeed, a friend asked what I’d do differently if I were starting the PhD again. I’d spend far less energy on classes, especially on homework. It feels good and productive to do HW, and being good at HW is how I got here… but it’s not really the same as making research progress. Besides, as interesting and valuable as the coursework has been, very little of it has been directly relevant to my thesis (and the few parts that were, I’ve had to relearn anyway). So I’d aim explicitly for “B equals PhD” and instead spend more time doing real research projects, wrapping them up into publications (at least conference papers). As it is, I have a pile of half-arsed never-finished class / side projects, which could have been nice CV entries if I’d polished them rather than spending hours trying to get from a B to an A.

My advisor also pointed out that he didn’t pick up his immense store of knowledge in a class, but by reading many many papers and talking with senior colleagues. I’ve also noticed a pattern from reading a ton of papers on each of several specialized sub-topics. First new paper I encounter in an area: whoa, how’d they come up with this from scratch, what does it all mean? Next 2-3 papers: whoa, these all look so different, how will I ever follow the big ideas? Another 10-15 papers: aha, they’re actually rehashing similar ideas and reusing similar proof techniques with small variations, and I can do this too. Reassuring, but it does all take time to digest.

All that said, I still feel like a slowly-plodding turtle compared to the superstar researchers here at CMU. Sometimes it helps to follow Wondermark’s advice on how he avoided discouragement in webcomics: ignore the more-successful people already out there and make one thing at a time, for a long time, until you’ve made many things and some are even good.

(Two years in!) I had just learned the word “webcomics” from a panel at Comic-Con. I was just starting to meet other people who were doing what I was doing.

Let me repeat that: Two years and over a hundred strips in is when I learned there was a word for what I was doing.

I had a precious, lucky gift that I don’t know actually exists anymore: a lack of expectations for my own success. I didn’t know any (or very few) comic creators personally; I didn’t know their audience metrics or see how many Twitter followers they had or how much they made on Patreon. My comics weren’t being liked or retweeted (or not liked, or not retweeted) within minutes of being posted.

I had been able to just sit down and write a bunch of comics without anyone really paying attention, and I didn’t have much of a sense of impatience about it. That was a precious gift that allowed me to start finding my footing as a creator by the time anyone did notice me – when people did start to come to my site, there was already a lot of content there and some of it was pretty decent.

Such blissful ignorance is hard to achieve in a department full of high-achievers. I’ve found that stressing about the competition doesn’t help me work harder or faster. But when I cultivate patience, at least I’m able to continue (at my own pace) instead of stopping entirely.

[Update:] another take on this issue, from Jeff Leek:

Don’t compare myself to other scientists. It is very hard to get good evaluation in science and I’m extra bad at self-evaluation. Scientists are good in many different dimensions and so whenever I pick a one dimensional summary and compare myself to others there are always people who are “better” than me. I find I’m happier when I set internal, short term goals for myself and only compare myself to them.


I audited Christopher Phillips’ course Moneyball Nation. This was a gen-ed course in the best possible sense, getting students to think both like historians and like statisticians. We explored how statistical/quantitative thinking entered three broad fields: medicine, law, and sports.

Reading articles by doctors and clinical researchers, I got a context for how statistical evidence fits in with other kinds of evidence. Doctors (and patients!) find it much more satisfying to get a chemist’s explanation of how a drug “really works,” vs. a statistician’s indirect analysis showing that a drug outperforms placebo on average. Another paper confirmed for me that (traditional) Statistics’ biggest impact on science was better experimental design, not better data analysis. Most researchers don’t need to collaborate with a statistical theoretician to derive new estimators; they need an applied statistician who’ll ensure that their expensive experimental costs are well spent, avoiding confounding and low power and all the other issues.

[Update:] I’ve added a whole post on these medical articles.

In the law module, we saw how difficult it is to use statistical evidence appropriately in trials, and how lawyers don’t always find it to be useful. Of course we want our trial system to get the right answers as often as possible (free the innocent and catch the guilty), so from a purely stats view it’s a decision-theory question: what legal procedures will optimize your sensitivity and specificity? But the courts, especially trial by jury, also serve a distinct social purpose: ensuring that the legal decision reflects and represents community agreement, not just isolated experts who can’t be held accountable. When you admit complicated statistical arguments that juries cannot understand, the legal process becomes hard to distinguish from quack experts bamboozling the public, which undermines trust in the whole system. That is, you have the right to a fair trial by a jury of your peers; and you can’t trample on that right in order to “objectively” make fewer mistakes. (Of course, this is also an argument for better statistical education for the public, so that statistical evidence becomes less abstruse.)

[Update:] In a bit more detail, “juries should convict only when guilt is beyond reasonable doubt. …one function of the presumption of innocence is to encourage the community to treat a defendant’s acquittal as banishing all lingering suspicion that he might have been guilty.” So reasonable doubt is meant to be a fuzzy social construct that depends on your local community. If trials devolve into computing a fungible “probability of guilt,” you lose that specificity / dependence on local community, and no numerical threshold can truly serve this purpose of being “beyond a reasonable doubt.” For more details on this ritual/pageant view of jury trials, along with many other arguments against statistics in the courtroom, see (very long but worthwhile) Tribe (1971), “Trial by Mathematics: Precision and Ritual in the Legal Process” [journal, pdf].

[Note to self: link to some of the readings described above.]

Next time I teach I’ll also use Prof. Phillips’ trick for getting to know students: require everyone to sign up for a time slot to meet in his office, in small groups (2-4 people). This helps put names to faces and discover students’ interests.

Other projects

I almost had a Tweet cited in a paper 😛 Rob Kass sent along to the department an early draft of “Ten Simple Rules for Effective Statistical Practice” which cited one of my tweets. Alas, the tweet didn’t make it into the final version, but the paper is still worth a read.

I also attended the Tapestry conference in Colorado, presenting course materials from the Fall 2015 dataviz class that I taught. See my conference notes here and here.

Even beyond that, it’s been a semester full of thought on statistical education, starting with a special issue in The American Statistician (plus supplementary material). I also attended a few faculty meetings in our college of humanities and social sciences, to which our statistics department belongs. The college is considering future revisions to its general-education requirements. What should it mean to have a well-rounded education, in general and specifically at this college? These chats also touch on the role of our introductory statistics course: where should statistical thinking and statistical evidence fit into the training of humanities students? This summer we started an Intro Stats working group for revising our department’s first course; I hope to have more to report there eventually.

Finally, I TA’ed for our department’s summer undergraduate research experience program. More on that in a separate post.


My son is coordinated enough to play with a shape-sorter, which is funny to watch. He gets so frustrated that the square peg won’t go in the triangular hole, and so incredibly pleased when I gently nudge him to try a different hole and it works. (Then I go to meet my advisor and it feels like the same scene, with me in my son’s role…)

He’s had many firsts this spring: start of day care, first road trip, first time attending a wedding, first ER visit… Scary, joyful, bittersweet, all mixed up. It’s also becoming easier to communicate, as he can understand us and express himself better; but he also now has preferences and insists on them, which is a new challenge!

I’ve also joined some classmates in a new book club. A few picks have become new favorites; others really put me outside my comfort zone in reading things I’d never choose otherwise.

Seminar on Active Learning pedagogy

I’ve joined an Eberly Center seminar / reading group on the educational approach known as Active Learning (AL).

[Obviously our Statistics department should use Active Learning, the educational technique, to teach a course on Active Learning, the approach to experimental design. We can call it Active Active Learning Learning.]

The basic idea in AL is to replace traditional lectures and “passive” student behavior (listening quietly to an instructor speak, perhaps taking notes to transcribe the lecture) with almost anything more “active”: discussion with your neighbor or the whole class; short clicker quizzes; labs or larger projects; etc. The goal is to help your students stay attentive, motivated, and engaged, so they are constructing and synthesizing knowledge throughout class—instead of just being passive receptacles for the words you say.

I’ve tried variants of AL when teaching, and generally I liked the outcomes, but I hope the reading group will help me think through some challenges I had in implementation.

Notes-to-self below, posted in case any readers have thoughts or suggestions:

After 5th semester of statistics PhD program

Better late than never—here are my hazy memories of last semester. It was one of the tougher ones: an intense teaching experience, attempts to ratchet up research, and parenting a baby that’s still too young to entertain itself but old enough to get into trouble.

Previous posts: the 1st, 2nd, 3rd, and 4th semesters of my Statistics PhD program.


I’m past all the required coursework, so I only audited Topics in High Dimensional Statistics, taught by Alessandro Rinaldo as a pair of half-semester courses (36-788 and 36-789). “High-dimensional” here loosely means problems where you have more variables (p) than observations (n). For instance, in genetic or neuroscience datasets, you might have thousands of measurements each from only tens of patients. The theory here is different from traditional statistics because you usually assume that p grows with n, so getting more observations won’t reduce the problem to a traditional one.
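
One practical symptom of p > n can be sketched in a few lines of Python with NumPy. This is my own toy illustration, assuming Gaussian random data with n = 10 and p = 50: because X has rank at most n, it has a nonzero null space, so ordinary least squares has infinitely many coefficient vectors that fit the training data exactly. (It does not show the asymptotic “p grows with n” regime, only why the classical OLS solution stops being unique.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 50                      # more variables than observations
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

# rank(X) <= min(n, p) = n < p, so X has a nontrivial null space.
rank = np.linalg.matrix_rank(X)

# Minimum-norm least-squares solution; with p > n it fits y exactly.
beta_a, *_ = np.linalg.lstsq(X, y, rcond=None)

# Rows of Vt beyond the rank span the null space of X. Adding such a
# direction changes the coefficients but not the fitted values.
_, _, Vt = np.linalg.svd(X)
null_dir = Vt[-1]
beta_b = beta_a + 5.0 * null_dir

print("rank(X) =", rank)
print("max |X beta_a - y| =", np.abs(X @ beta_a - y).max())
print("max |X beta_b - y| =", np.abs(X @ beta_b - y).max())
print("coefficients differ by up to", np.abs(beta_a - beta_b).max())
```

Both coefficient vectors reproduce y (up to floating-point error), so the data alone cannot tell them apart; that is one reason high-dimensional methods lean on extra structure such as sparsity.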

This course focused on some of the theoretical tools (like concentration inequalities) and results (like minimax bounds) that are especially useful for studying properties of high-dimensional methods. Ale did a great job covering useful techniques and connecting the material from lecture to lecture.

In the final part of the course, students presented recent minimax-theory papers. It was useful to see my fellow students work through how these techniques are used in practice, as well as to get practice giving “chalk talks” without projected slides. I gave a talk too, preparing jointly with my classmate Lingxue Zhu (who is very knowledgeable, sharp, and always great to work with!) Ale’s feedback on my talk was that it was “very linear”—I hope that was a good thing? Easy to follow?

Also, as in every other stats class I’ve had here, we brought up the curse of dimensionality—meaning that, in high-dimensional data, very few points are likely to be near the joint mean. I saw a great practical example of this in a story about the US Air Force’s troubles designing fighter planes for the “average” pilot.
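
The “average pilot” problem can be mimicked with a quick simulation. This is a hypothetical setup, not the Air Force data: each simulated person has d independent Uniform(0, 1) traits, and “near average” (an assumed cutoff) means landing in the middle 30% of every trait. Under independence, the chance of being near average on all d traits is 0.3^d, which falls off quickly.

```python
import numpy as np

def fraction_near_average(d, n_people=100_000, low=0.35, high=0.65, seed=0):
    """Share of simulated people who are 'near average' on all d traits.

    Hypothetical assumptions: d independent Uniform(0, 1) traits per person,
    and 'near average' means falling in the middle 30% (0.35 to 0.65) of each.
    """
    rng = np.random.default_rng(seed)
    traits = rng.uniform(size=(n_people, d))
    near_avg = (traits > low) & (traits < high)   # per-trait check
    return near_avg.all(axis=1).mean()            # must hold on every trait

for d in (1, 2, 3, 5, 10):
    print(f"d={d:2d}  simulated={fraction_near_average(d):.5f}  exact={0.3 ** d:.5f}")
```

Real body measurements are correlated, so the exact fractions would differ, but the qualitative pattern is the point: requiring closeness to the mean on many dimensions at once leaves almost nobody.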


I taught a data visualization course! Check out my course materials here. There’ll be a separate post reflecting on the whole experience. But the summer before, it was fun (and helpful) to binge-read all those dataviz books I’ve always meant to read.

I’ve been able to repurpose my lecture materials for a few short talks too. I was invited to present a one-lecture intro to data viz for Seth Wiener’s linguistics students here at CMU, as well as for a seminar on Data Dashboard Design run by Matthew Ritter at my alma mater (Olin College). I also gave an intro to the Grammar of Graphics (the broader concept behind ggplot2) for our Pittsburgh useR Group.


I’m officially working with Jing Lei, still looking at sparse PCA but also some other possible thesis topics. Jing is a great instructor, researcher, and collaborator working on many fascinating problems. (I also appreciate that he, too, has a young child and is understanding about the challenges of parenting.)

But I’m afraid I made very slow research progress this fall. A lot of my time went towards teaching the dataviz course, and plenty went to parenthood (see below); both should take up less of my time in the spring semester. I also wish I had some grad-student collaborators. I’m not part of a larger research group right now, so meetings are just between my advisor and me. Meetings with Jing are very productive, but in between it’d also be nice to hash out tough ideas together with a fellow student, without taking up an advisor’s time or stumbling around on my own.

Though it’s not quite the same, I started attending the Statistical Machine Learning Reading Group regularly. Following these talks is another good way to stretch my math muscles and keep up with recent literature.


As a nice break from statistics, we got to see our friends Bryan Wright and Yuko Eguchi both defend their PhD dissertations in musicology. A defense in the humanities seems to be much more of a conversation involving the whole committee, compared with the lecture that Statistics PhD candidates give at their defenses.

Besides home and school, I’ve been a well-intentioned but ineffective volunteer, trying to manage a few pro bono statistical projects. It turns out that virtual collaboration, managing a far-flung team of people who’ve never met face-to-face, is a serious challenge. I’ve tried reading up on advice but haven’t found any great tips—so please leave a comment if you know any good resources.

So far, I’ve learned that choosing the right volunteer team is important. Apparent enthusiasm (I’m eager to have a new project! or even eager for this particular project!) doesn’t seem to predict commitment or followup as well as apparent professionalism (whether or not I’m eager, I will stay organized and get s**t done).

Meanwhile, this fall the baby was no longer in the “potted-plant stage” (when you can put him down and expect he’ll still be there a second later), but not yet in day care, while my wife was returning to part-time work. We finally got off the wait-lists and into day care after the semester ended, but until then it was much harder to juggle home and school commitments.

However, he’s an amazing little guy, and it’s fun finally taking him to outings and playdates at the park and zoo and museums (where he stares at the floor instead of the exhibits… except for the model railroad, which he really loved!) We also finally made it out to Kennywood, a gorgeous local amusement park, for their holiday light show.

Here’s to more exploration of Pittsburgh as the little guy keeps growing!

Lunch with ASA president Jessica Utts

The president of the American Statistical Association, Jessica Utts, is speaking tonight at the Pittsburgh ASA Chapter meeting. She stopped by CMU first and had lunch with us grad students here.


First of all, I recommend reading Utts’ Comment on statistical computing, published 30 years ago. She mentioned a science-fiction story idea about a distant future (3 decades later, i.e. today!) in which statisticians are forgotten because everyone blindly trusts the black-box algorithm into which we feed our data. Of course, at some point in the story, it fails dramatically and a retired statistician has to save the day.
Utts gave good advice on avoiding that dystopian future, although some folks are having fun trying to implement it today—see for example The Automatic Statistician.
In some ways, I think that this worry (of being replaced by a computer) should be bigger in Machine Learning than in Statistics. Or, perhaps, ML has turned this threat into a goal. ML has a bigger culture of Kaggle-like contests: someone else provides data, splits it into training & test sets, asks a specific question (prediction or classification), and chooses a specific evaluation metric (percent correctly classified, MSE, etc.). David Donoho’s “50 years of Data Science” paper calls this the Common Task Framework (CTF). Optimizing predictions within this framework is exactly the thing that an Automatic Statistician could, indeed, automate. But the most interesting parts are the setup and interpretation of a CTF: understanding context, refining questions, designing data-collection processes, selecting evaluation metrics, interpreting results… All those fall outside the narrow task that Kaggle/CTF contestants are given. To me, such setup and interpretation are closer to the real heart of statistics and of using data to learn about the world. It’s usually nonsensical to even imagine automating them.
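
To make the division of labor in a CTF concrete, here is a toy Python sketch with NumPy, in which every name and number is hypothetical. The “organizer” function `make_common_task` fixes the data, the train/test split, and the evaluation metric (MSE); the “contestant” loop just tries candidate predictors and keeps the best test score.

```python
import numpy as np

def make_common_task(seed=0, n=200):
    """Organizer side (hypothetical): fixed data, fixed split, fixed metric."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3, 3, size=n)
    y = np.sin(x) + 0.3 * rng.standard_normal(n)
    train, test = slice(0, 150), slice(150, n)

    def mse(y_true, y_pred):
        return float(np.mean((y_true - y_pred) ** 2))

    return (x[train], y[train]), (x[test], y[test]), mse

def fit_constant(x, y):
    """Baseline: always predict the training mean."""
    mean = y.mean()
    return lambda x_new: np.full_like(x_new, mean)

def fit_polynomial(degree):
    """Returns a fitting function for a least-squares polynomial of given degree."""
    def fit(x, y):
        coefs = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

(train_x, train_y), (test_x, test_y), metric = make_common_task()

# Contestant side: try candidate methods, keep whichever scores best on the test set.
candidates = {"constant": fit_constant,
              "poly3": fit_polynomial(3),
              "poly5": fit_polynomial(5)}
scores = {name: metric(test_y, fit(train_x, train_y)(test_x))
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "best:", best)
```

Only the contestant loop is the kind of search an Automatic-Statistician-style tool could take over; every choice inside `make_common_task` (what to measure, how to split, which metric) was made by a person before the contest began.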

Besides statistical computing, Utts has worked on revamping statistics education more broadly. You should read her rejoinder to George Cobb’s article on rethinking the undergrad stats curriculum.

Utts is also the Chief Reader for grading the AP Statistics exams. AP Stats may need to change too, just as the undergraduate stats curriculum is changing… but it’s a much slower process, partly because many high school AP Stats teachers aren’t trained in statistics the way that college and university professors are. There are also issues with computer access: even as colleges keep moving towards computer-intensive methods, in practice it remains difficult for AP Stats to fairly assess anything that can’t be done on a calculator.

Next, Utts told us that the recent ASA statement on p-values was inspired by the psychology journal BASP (Basic and Applied Social Psychology) banning them. I think it’s interesting that the statement is only on p-values, even though BASP actually banned all statistical inference. Apparently it was difficult enough to get consensus on what to say about p-values alone, without agreeing on what to say about alternatives (e.g. publishing intervals, Bayesian inference, etc.) and other related statistical concepts (especially power).

Finally, we had a nice discussion about the benefits of joining the ASA: networking, organizational involvement (it’s good professional experience and looks good on your CV), attending conferences, joining chapters and sections, getting the journals… I learned that the ASA website also has lesson plans and teaching ideas, which seems quite useful. National membership is only $18 a year for students, and most local chapters or subject-matter sections are cheap or free.

The ASA has also started a website for helping journalists understand, interpret, and report on statistical issues or analyses. If you know a journalist, tell them about this resource. If you’re a statistician willing to write some materials for the site, or to chat with journalists who have questions, go sign up.

Tapestry 2016 materials: LOs and Rubrics for teaching Statistical Graphics and Visualization

Here are the poster and handout I’ll be presenting tomorrow at the 2016 Tapestry Conference.

Poster "Statistical Graphics and Visualization: Course Learning Objectives and Rubrics"

My poster covers the Learning Objectives that I used to design my dataviz course last fall, along with the grading approach and rubric categories that I used for assessment. The Learning Objectives were a bit unusual for a Statistics department course, emphasizing some topics we teach too rarely (like graphic design). The “specs grading” approach[1] seemed to be a success, both for student motivation and for the quality of their final projects.

The handout is a two-sided, single-page summary of my detailed rubrics for each assignment. By keeping the rubrics broad (and software-agnostic), it should be straightforward to (1) reuse the same basic assignments in future years with different prompts and (2) port these rubrics to dataviz courses in other departments.

I had no luck finding rubrics for these learning objectives when I was designing the course, so I had to write them myself.[2] I’m sharing them here in the hopes that other instructors will be able to reuse them (and improve on them!).

Any feedback is highly appreciated.


PolicyViz episode on teaching data visualization

When I was still in DC, I knew Jon Schwabish’s work designing information and data graphics for the Congressional Budget Office. Now I’ve run across his podcast and blog, PolicyViz. There’s a lot of good material there.

I particularly liked a recent podcast episode that was a panel discussion about teaching dataviz. Schwabish and four other experienced instructors talked about course design, assignments and assessment, how to teach implementation tools, etc.

I recommend listening to the whole thing. Below are just notes-to-self on the episode, for my own future reference.
