Think-aloud interviews can help you write better assessments

I’m delighted to share with you a pilot study that we ran at CMU this fall, through our Teaching Statistics group. Long story short: It’s hard to write good conceptual-level questions to test student understanding, but think-aloud interviews are a very promising tool. By asking real students to talk out loud as they solve problems, you gain insight into whether students answer correctly or incorrectly because they genuinely do or don’t understand the material, or because the question itself is badly written. If students answer wrong because the question is ambiguous, or if they get it right using generic test-taking strategies rather than knowledge from the course, think-alouds give you a chance to detect this and revise your test questions.

Some context:
CMU’s stats department—renamed the Department of Statistics & Data Science just this fall—is revising the traditional Introductory Statistics classes we offer. Of course, as good statisticians, we’d like to gather evidence and measure whether students are learning any better in the new curriculum. We found several pre-made standardized tests of student learning in college-level Intro Stats, but none of them quite fit what we wanted to measure: have students learned the core concepts, even if they haven’t memorized traditional formulas and jargon?

We tried writing a few such multiple-choice questions ourselves, but it was quite a challenge to see past our own expert blind spots. So, we decided to get hands-on practice with assessment in the Fall 2017 offering of our graduate course on Teaching Statistics. We read Adams and Wieman (2011), “Development and Validation of Instruments to Measure Learning of Expert-Like Thinking,” which recommends using think-aloud interviews as a core part of the test-question design and validation process. This method isn’t commonly known in Statistics, although I have related experience from a decade ago, when I studied design at Olin College and then worked in consumer insights research for Ziba Design. It’s been such a delight to revisit those skills and mindsets in a new context here.

We decided to run a pilot study where everyone could get practice running think-aloud interviews. With a handful of anonymous student volunteers, we ran through the full session: welcome the volunteer, explain the process, give them a few warm-up questions to practice thinking aloud while solving problems, then work through a handful of “real” Intro Stats test questions and watch how they tackle them. During the first pass, the interviewer should stay silent, apart from reminders like “Please remember to think out loud” if the student stops speaking. It’s not perfect, but it gets us closer to how students would really approach these questions on an in-class test (not at office hours or in a study session). At the end, we would do a second pass to follow up on anything interesting or unclear, though even then it’s best to let the student do most of the talking: the interviewer might say “I see you answered B here. Can you explain in more detail?” rather than “This is wrong; it should be C because…”

After this pilot, we feel quite confident that a formal think-aloud study will help us write questions that really measure the concepts and (mis)understandings we want to detect. (Our think-aloud script was drafted from materials in Chris Nodder’s Lynda.com course and from the advice in Bowen (1994), “Think-aloud methods in chemistry education.”) But quite a few open questions remain about how best to implement the study. We list these on the poster above, which we presented last week at CMU’s Teaching & Learning Summit.

The current plan is to revise our protocol for the rest of Fall 2017 and design a complete formal study. Next, we will run think-alouds and revise individual questions throughout Spring 2018, then pilot and validate at the test level (which set of questions works well as a whole?) in Fall 2018, with pre- and post-tests across several sections and variations of Intro Stats.
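To make that last step concrete, here is a rough sketch of the kind of test-level comparison we have in mind, written in Python. Everything in it is fabricated for illustration; the 25-point scale, the section labels, and the simulated scores are all hypothetical, not data or code from the actual study:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: one row per student, with total scores (out of 25)
# on the same test taken before and after the course, plus which
# variant of Intro Stats the student took.
max_score = 25
rng = np.random.default_rng(0)
n = 120
section = rng.choice(["traditional", "revised"], size=n)
pre = rng.normal(12, 3, size=n).round().clip(0, max_score - 1)
post = (pre + np.where(section == "revised", 3, 2)
        + rng.normal(0, 2, size=n)).round().clip(0, max_score)
df = pd.DataFrame({"section": section, "pre": pre, "post": post})

# Normalized gain (Hake-style): what fraction of the available
# improvement did each student actually achieve?
df["gain"] = (df["post"] - df["pre"]) / (max_score - df["pre"])

# Compare mean gains between the two curricula with Welch's t-test.
print(df.groupby("section")["gain"].mean().round(2))
trad = df.loc[df["section"] == "traditional", "gain"]
rev = df.loc[df["section"] == "revised", "gain"]
t, p = stats.ttest_ind(rev, trad, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.3f}")
```

In the real study, of course, the interesting work is in the questions themselves; an analysis like this only means something once think-alouds have convinced us each item measures what we intend.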

PS — I mean no disrespect towards existing Intro Stats assessments such as CAOS, ARTIST, GOALS, SCI, or LOCUS. These have all been reviewed thoroughly by expert statisticians and educators. However, in the documentation of how these tests were developed, I cannot find any mention of think-alouds or similar testing with real students: student testing seems limited to psychometric validation (for reliability etc.) after all the questions were written. I think there is considerable value in testing question prototypes with students early in the development process.
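For concreteness: by psychometric validation I mean checks like Cronbach’s alpha, an internal-consistency statistic computed from an item-response matrix after students take the finished test. Here is a minimal sketch on fabricated 0/1 responses (nothing here comes from those instruments’ actual development):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (students x items) matrix of item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical right/wrong responses for 50 students on a 10-item test,
# correlated through a single simulated "ability" score per student.
rng = np.random.default_rng(1)
ability = rng.normal(0, 1, size=(50, 1))
responses = (rng.normal(0, 1, size=(50, 10)) < ability).astype(float)
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

A check like this can tell you whether the items hang together, but not whether students got them right for the right reasons; that is the gap think-alouds fill.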

PPS — Apologies for the long gap between updates. It’s been a busy year of research and paper-writing, capped by a fall full of job applications and interviews. But I’ll have a few more projects ready to share here over the next month or two.