Colby is hiring a tenure-track statistician

Once again, we are hiring a tenure-track statistician here at Colby College, in the Department of Mathematics and Statistics! I should have posted this sooner; but if you are looking for an opportunity to teach Statistics and Data Science at a strong liberal arts college with great students and colleagues, please consider applying:

https://mathjobs.org/jobs/jobs/14358

We will give full consideration to applications received by this Monday, October 14, but review will continue until the position is filled. If you have concerns about making this deadline, or if you have any other questions, please do get in touch: jawieczo@colby.edu … I would also be happy to answer any questions about the position, Colby College, or our city of Waterville, Maine.

Some things that I am particularly excited about this year:

a recent multi-million-dollar gift that is funding Data Science courses, curricular development, speakers, and other activities — and strong encouragement to spend that gift on interdisciplinary projects with colleagues across campus (for example, I have co-organized a data visualization workshop with faculty in Biology and proposed a first-year course cluster to be co-taught with faculty in English);
our Office of Civic Engagement‘s support for my survey sampling course, in which we are working with the Waterville Fire Department to design and carry out a large-scale survey of city residents;
our department colloquium, full of engaging talks on mathematics and statistics, as well as our annual Runnals Dinner for Women in Mathematics — this year’s speaker is Dr Mahlet Tadesse.

“Concise Statistical Year-Book of Poland, 1939”

Eighty years ago this week, my grandmother and grandfather were each enthusiastic seven-year-olds, excited for September 1st — their first day of school! At the time, they lived hundreds of kilometers apart and had yet to meet. She had spent her childhood in France but was thrilled to be back in ancestral Poland, in the north-eastern city of Wilejka, where she would finally be able to study in Polish. He was a voracious reader in Poznań, the westernmost large city in Poland at the time. Still, both had laid out their best clothes and filled a satchel with notebooks and pens.

Of course, it was not to be. My grandfather’s mother woke him in the middle of the night and brought him quietly down to the cellar, in the dark, past windows blacked out with curtains and blankets, as German forces began shelling the city. In the morning his apartment still stood, but he saw the broken walls and ruined rooms of the building next door. Meanwhile, my grandmother’s long-awaited Polish school was cancelled as well, eventually replaced by a Russian school as Soviet forces occupied her city.

Somehow, they survived World War II and eventually met as teachers, committed to the critical importance of education in rebuilding their broken country. My grandfather went on to become a professor of history and a leading figure at the University of Zielona Góra, in the city where they finally settled (and where I was born). A few years ago, when he passed away, I found some of the old statistical yearbooks he must have used as research resources.

Worn cover of my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland

The yearbook from 1939 is particularly touching. As a physical artifact, it has clearly been through a lot: worn from use, spine broken, pages torn, stamped and underlined and scribbled all over.

Title page of my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland, with stamps and inked-out scribbles

But it’s the “Foreword to the 10th Edition,” written in April 1939, that really moves me with its premature optimism:

The current edition of the Year-Book closes the first ten years of its existence. Today I can emphatically assert the great utility of this publication … It remains only necessary to express a hope that the Concise Year-Book, completing currently the first decade of its existence and beginning in the near future its second decade… will continually and increasingly fulfill its mission as set out in 1930…

Once again, it was not to be. The statistical service could not continue its planned work, once the war began in September. The Polish government-in-exile in London did manage to publish a Concise Statistical Year-Book for 1939-1941, summarizing what was known about conditions in the German- and Soviet-occupied territories. But the regular annual compilation and publication of Polish statistical yearbooks did not resume until after the war, in 1947 — and even then it was interrupted again during 1951-1955 as the Soviets in charge did not want to risk revealing any state secrets.

First page of foreword to my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland

Second page of foreword to my grandfather's copy of the 1939 Concise Statistical Year-Book of Poland

The Polish Wikipedia has a good article on these statistical yearbooks, but unfortunately it’s not yet translated into English. However, you can skim through a scanned PDF of the whole 1939 yearbook. For instance, the lovingly hand-drawn population density map reminds us that there were precursors to the (also beautiful) census dot maps based on 2010 US Census data.

Population density dot map from the 1939 Concise Statistical Year-Book of Poland

Now, on this 80th anniversary of the war, my own son is eager to start school, while I am preparing to bring the 1939 yearbook to my fall course on surveys and censuses. I am grateful that our life today is so much better than my grandparents’ was, even if it’s hard to be optimistic about the state of the world when you hear the news lately. All we can do is roll up our sleeves and get back to work, trying to leave the place better than we found it.

Another Pole, the poet Wisława Szymborska, said it well:

The End and the Beginning
After every war
someone has to clean up.
Things won’t
straighten themselves up, after all.

Someone has to push the rubble
to the side of the road,
so the corpse-filled wagons
can pass.

Someone has to get mired
in scum and ashes,
sofa springs,
splintered glass,
and bloody rags.

Someone has to drag in a girder
to prop up a wall,
Someone has to glaze a window,
rehang a door.

Photogenic it’s not,
and takes years.
All the cameras have left
for another war.

We’ll need the bridges back,
and new railway stations.
Sleeves will go ragged
from rolling them up.

Someone, broom in hand,
still recalls the way it was.
Someone else listens
and nods with unsevered head.
But already there are those nearby
starting to mill about
who will find it dull.

From out of the bushes
sometimes someone still unearths
rusted-out arguments
and carries them to the garbage pile.

Those who knew
what was going on here
must make way for
those who know little.
And less than little.
And finally as little as nothing.

In the grass that has overgrown
causes and effects,
someone must be stretched out
blade of grass in his mouth
gazing at the clouds.

Call for papers: NeurIPS 2019 Workshop on ML for the Developing World

I’m pleased to share a call for papers for the NeurIPS 2019 workshop on Machine Learning for the Developing World (ML4D).

This will be the 3rd year for this workshop, which brings together researchers who employ ML methods in developing-world settings, study the societal impacts of new technology, or develop algorithms to handle with common constraints in the developing world (such as limited data storage or computational power).

I was honored to take place in the 2017 workshop with a paper on “Household poverty classification in data-scarce environments: a machine learning approach”, coauthored by Varun Kshirsagar and his colleagues working on the Poverty Probability Index.
Last year’s 2018 workshop focused on achieving sustainable impact. How do you go beyond a pilot or prototype into something long-term and meaningful?
This year’s theme is around challenges and risks, particularly ethical issues such as unintended harms of deploying ML systems in developing region.

If you have relevant projects to present at the workshop, I encourage you to submit a 2-4 page short paper by September 13. The workshop will take place on Dec 13 or 14, near the end of NeurIPS 2019 in Vancouver, BC, Canada. Travel award funding is also available.

After 10th semester of statistics PhD program

FINAL BOSS BATTLE! (Warning: severe nerdiness ahead.)

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, and 9th semesters of my Statistics PhD program.

Research

It’s a strange thing to finish a thesis. You came in to grad school hoping, maybe even expecting, to make a grand contribution—a deep insight into the nature of the world. But Borges puts it best:

…the tall, haughty volumes that made a golden dimness in the corner of his room were not (as his vanity had dreamed them) a mirror of the world, but just another thing added to the world’s contents.

—Jorge Luis Borges, “The Yellow Rose,” Andrew Hurley’s translation

For that matter, when do you really “finish”? The dissertation is pretty much written — then a few weeks later the slides are ready — then the defense itself takes place — then you make revisions — then there’s a graduation ceremony — and even then, there remain a few minor “suggested” revisions I haven’t done yet but should do before getting it printed & bound & archived. [That finally happened half a year later, over January break, long after the PhD was “over.”]

Not to mention, my status felt weirdly ambiguous. The faculty at Colby have already hired me and seem to treat me as basically a colleague, pending the minor formality of defending. And at CMU I’m also teaching a course and students are calling me “Professor.” But of course, everyone else in the department is used to treating me as a student, so I feel very much like one. Admittedly, during the few weeks when I was no longer a “PhD Candidate” but not yet officially “Dr Wieczorek,” I enjoyed the brief chance to introduce myself as a “Graduand” 🙂

But it’s all very anticlimactic! There’s no trumpet fanfare saying: “NOW you are done.” If anything, it felt like the first Zelda game’s end music (starting about 20 sec into the clip below):

As you’ve played the game through, you’ve died many many times along the way, and Link’s death lullaby has been playing so long in all your lowest moments that you don’t even really hear it anymore. Then here it returns, after you thought you were done. Not this again?!? Yet suddenly your ears catch a faint counterpoint that wasn’t there before. This new bass line changes the character of the music completely, getting stronger and stronger, until somehow it has become a jazzy, celebratory tune. There’s no one moment of enlightenment, but a weary transition as the world slowly starts looking better and better.

Teaching

While I didn’t finish as much of the thesis last fall as I expected, it was supposed to be my focus this spring (along with the new baby!), with only minimal grading responsibilities as a TA.

HOWEVER… a week before spring term started, a last-minute shift in staffing became necessary and I was asked to teach 36-315, Statistical Graphics and Visualization—a hundred-student undergraduate class. I really should have said No (the chair told me it’s okay to say No!), but still, the department seemed to be at a loss, and I didn’t want to let them down after all they had done for me throughout my time at CMU. So instead, I found myself focused on teaching the largest class I’ve had yet, while just scraping by on both research and parenting.

Teaching this class was a valuable experience in and of itself, and you know I LOVE teaching data visualization, but—argh!

This spring I also finished the last requirement for the Eberly Center’s Future Faculty Program. It was a bit funny to have to revise my Statement of Teaching Philosophy several times, given that I’d already written one that got me a job, but it does feel good to wrap this up formally. The program made a big impact on my teaching over the past few years, and I’m glad I can contribute to their completion rate.

Other projects

We continued our Teaching Statistics research, running many more think-aloud interviews, and presented our work at eCOTS 2018 (the Electronic Conference on Teaching Statistics).

I also traveled back to DC, to the Census Bureau, to give a talk on my paper with Tommy Wright and Martin Klein. The paper is about visualizations designed to help readers make many comparisons in a statistically-appropriate way. The talk was recorded, and now I see the video has been posted as part of a new “Census Academy” resource.

Life

The baby went through a rough patch where he wouldn’t fall sleep unless someone was holding him and moving around—sitting wasn’t good enough. Somehow I discovered that I could hold him while playing Crypt of the NecroDancer, which is basically a mashup of Dance Dance Revolution and Legend of Zelda: you use the arrow keys to move your dungeon-explorer in time with the beat of this amazing techo-ish soundtrack. I would play with one hand, standing and rocking back and forth to help myself stick to the beat, and the little guy would just fall asleep contentedly while I repeatedly ~~destroyed~~ got killed by skeletons and slime monsters.

Despite everyone’s sensible advice, we went house-hunting in Maine in mid-March—when all the houses were covered in 3 feet of snow—because spring break was the only time I could get away. Surprisingly, nobody else was house-hunting there that week 🙂 and as it turned out, we found a wonderful home at a very reasonable price. Our landlord in Pittsburgh has been wonderful, but we’re excited to become homeowners.

Comics

So, at this point you might be wondering: Is a PhD degree in Statistics worthwhile?

Someday I’ll follow up with a post on that, akin to my MS degree post. Meanwhile, Matt Might has the best post on the value of a PhD in general.

But for today, I will leave you with some comics that kept me ~~crying~~ smiling during the tough times of a PhD. Of course, PHD Comics (aka Piled Higher and Deeper) and XKCD both go without saying, but here are a few others that felt relevant.

When you’re starting down this precarious road:

When you forget why you’re even doing this:

When you have spent years with almost ~~no human company besides other academics~~ no human company:

When you are suddenly expected to interview for jobs with bona fide humans, after spending years around aforementioned academics:

When you are neeeeeearly done, and if you could just focus on your existing responsibilities you’d be able to wrap them up:

When you somehow manage to buckle down and knock out a ton of research/writing like a boss:

When you finally submit the damned dissertation:

When you are as ready as you’ll ever be for the thesis defense:

When you look back on your time as a PhD student with a note of defiant optimism:

After 9th semester of statistics PhD program

The final year! Our 2nd baby was on the way, and the job search was imminent. Oh, and by the way there’s this little thing called “finishing your dissertation”…

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, 7th, and 8th semesters of my Statistics PhD program.

Research

This year my advisor was able to place me on half-TA’ing grant support. So I asked to have all my TA’ing pushed to the spring, with the intent of finishing the bulk of my thesis work this fall. I figured that in the spring I’ll be OK to grade 10 hrs/wk on my own time while helping with the new baby (due in Dec), as long as the thesis is mostly done.

However, job hunting took a HUGE amount of time. That’ll be its own post. (Also the spring TA duties turned into something larger, but that’s a story for the next post.)

In other words, thesis research progress was… minimal, to put it mildly.

Other projects

Well, OK, so there’s another reason my thesis work was slow: I jumped onto two new short-term projects that (1) were very much in my wheelhouse, and (2) seemed like great fodder for job applications and interviews. Both of them arose thanks to Ryan Tibshirani. Unsurprisingly, it turns out that it’s GREAT for a grad student to be on good terms with a well-known professor who gets too many requests for interesting projects and occasionally passes them on to students. In both cases, it was fantastic of Ryan to think of me, and although it’s been tough to be doing it just right now, this is legitimately the kind of research I want to do later on (and probably should have done for my thesis in the first place! argh…).

First, knowing that I’m interested in education, Ryan asked if I’d like to help with some consulting for Duolingo, the language-learning app company (also founded by CMU folks and still Pittsburgh-based). The folks there had some interesting questions about experimental design and suitable metrics for A/B testing their app. One of our contacts there was Burr Settles, who wrote a helpful book on Active Learning (the ML technique, not the pedagogy). We had some good discussions after digging into the details of their problem: challenges of running permutation tests on massive datasets, whether old-school tests like sign-rank tests would be better for their goals, how the data-over-time structure affects their analyses, etc. These chats even led me to an old Portland State classmate’s work regarding how to get confidence intervals from a permutation test, and I have some ideas for extensions on that particular problem.

Second, knowing that I had experience with survey data, Ryan passed on to me an email request to review some new methodology. Varun Kshirsagar, working on the Poverty Probability Index (PPI), had made some revisions to this poverty-tracking method and wanted to get feedback from someone who understood both modern statistics & machine learning tools as well as the kind of complex-survey-design datasets used to estimate the PPI models. It was a real pleasure working with Varun, and thinking about how to combine these two worlds (ML and survey data) had been a huge part of my motivation to go back to grad school in the first place and attend CMU in particular. We wrote up a short paper on the method for the NIPS 2017 workshop on ML for the Developing World, and we won the best paper award 🙂 I plan to revisit some of these ideas in future research: How do we run logistic-regression lasso or elastic net with survey-weighted data? How should we cross-validate when the survey design is not iid?

Teaching

Alex and I also continued running the Teach Stats mini. This semester we decided it was a bit embarrassing that Statistics, of all fields, doesn’t have a better standardized assessment of what students are learning in our intro courses. Without such a tool, it’s hard to do good pedagogy research and give strong evidence about whether your interventions / new methods have any impact.

There are already a couple of assessment instruments out there, but most were written by psychologists or engineers rather than by statisticians. Also, many of their questions are procedural, rather than about conceptual understanding. Even though these assessments have passed all the standard psychometric tests, there’s no guarantee that whatever they measure is actually the thing we’d *like* to measure.

So we started discussing what we’d like to measure instead, drafting questions, and planning out how we would validate these questions. Inspired by my own experiences in user-experience research at Olin and Ziba, and with the help of an article on writing good assessments for physics education by Adams and Wieman, we started planning think-aloud studies. The idea is to watch students work through our problems, talking out loud as they do so, so that we can see where they trip up. Do they get it wrong just because the question is poorly worded, even if they understand the concept? Do they get it right just because you can eliminate some choices immediately, even if they don’t understand the concept at all?

We ran a pilot study this fall, and at the end of the semester, I had my first chance to help draft an IRB application—hurrah for statisticians actually working on research design!

This summer and fall, I also kept up with an informal History of Stats reading group organized by fellow PhD student Lee Richardson. Again, these journal-clubs and reading-groups have been my favorite part of the PhD curriculum, and I wish more of our “real” classes had been run this way.

Life

?!? “Life”? Not much that I can recall this semester.

Next up

The 10th and final semester of my Statistics PhD program.

After 8th semester of statistics PhD program

I realize this is over 2 years late, but I found these drafts sitting around and perhaps they are still worth posting.

Apologies for the negativity! I have to admit this was one of the toughest semesters for me, psychologically. I thought about toning it down, especially since I’m delighted to be where I am now—tenure track faculty at a small liberal arts college—but I don’t want to pretend it’s been easy to get here.

I believe this was the semester I was going to CMU’s Counseling and Psychological Services (CaPS). If you’re a grad student, I recommend that you get to know such resources on your campus. Just about everyone faces the all-pervasive impostor syndrome, which compounds any other specific challenges you might have personally, and it’s helpful to be able to talk through it all with someone experienced.

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, and 7th semesters of my Statistics PhD program.

Life

The more my son grows up, the more the PhD starts to feel like an overwhelming distraction from real life. One day I got home late and my wife told me what he’d been singing:

“Old MacDonald had a daddy, E-I-E-I-O, with a work-work here and a work-work there…” 🙁

If you have a family while in grad school, you will often feel that either the PhD doesn’t come first, or your family doesn’t come first. The academic folks around you will lean towards the PhD coming first, which of course partly makes sense—if you’re not making the fullest use out of your limited time in grad school, why bother going at all?—but it’s also hugely selection bias. Many (though not all) of the faculty who land and stay at a top research university are people who did decide that work comes first.

Thankfully, nobody has ever sneered to my face that “Ah well, not everyone’s cut out for academia” (though I’ve heard of it happening in other departments). But I feel it in my head all the time. (And I overhear younger students dismissing 30+ year olds like myself as too old to do good research… And I see the offices full every evening and weekend…) At another stage in life, my reaction might have been, “Oh yeah? I’ll show you, I’ll work my butt off and prove I’m good enough to cut it!” But now, my reaction is, “I’ve got better things to do than play this petty game.” Yes, I’ll plod along and finish the PhD I’ve started—perseverance counts for a lot—but right now I am not eager to stretch out this stage of life any longer than necessary.

Research

Reading Thinking, Fast and Slow, I am noting the constant mentions of the author’s collaboration with Amos Tversky. I think I’ve been way too focused on self-sufficiency here. It’s true that to get the thesis, I need to demonstrate I can do the work on my own… BUT in order to actually succeed afterwards (whether in academia or not), I’ll need to collaborate with others, not just myself and advisor. Plus, it’s simply more fun, spending those hours tackling a tough problem with an equally interested collaborator! So, my plan starting this summer and into next year: Add a few collaborative projects—finish FPS with Daren, finish DL with Nick and Jordan, revisit CIs/AIPE with Alex, start something (visual inference? CV with confidence?) with Justin… [Looking back, I regret that I did not follow up and make most of these planned collaborations happen while I was still a student!]
Jing still amazes me with his quick insight and intuition about how to tackle a proof. When I get stuck after spending hours on something, it takes him almost no time to see: if we back up a few steps to this other point, and tackle that instead, it’ll be much cleaner. This trait is taking me a long time to learn.
Daren argues that technical math prowess is definitely not sufficient (good ideas of *what* to study are more important), but also not necessary (even theory-wizards like Yenchi and Ale have to constantly look up things they’ve forgotten). I disagree a bit: I really think fluency in the big-picture math concepts is important—if I have memorized the definition of an eigenvalue, but not internalized its *meaning*, then I will never see big-picture ideas quickly enough, nor know how to start proving technical details, nor recall where to find those details in work I’ve seen before. It’s like when I play clarinet: I don’t need to memorize the piece I’m playing—sheet music is fine—but I *do* need to memorize the basics. I simply cannot play in real-time if I have to refer back to a fingering chart for every single note! In Joel Spolsky’s words (although see Lang’s actual article too):

Serge Lang, a math professor at Yale, used to give his Calculus students a fairly simple algebra problem on the first day of classes, one which almost everyone could solve, but some of them solved it as quickly as they could write while others took a while, and Professor Lang claimed that all of the students who solved the problem as quickly as they could write would get an A in the Calculus course, and all the others wouldn’t. The speed with which they solved a simple algebra problem was as good a predictor of the final grade in Calculus as a whole semester of homework, tests, midterms, and a final.

You see, if you can’t whiz through the easy stuff at 100 m.p.h., you’re never gonna get the advanced stuff.
It’s also hitting me how stupidly selfish I’ve been here. As much as I’d like to think so, I didn’t come here to solve practical problems and make the world a better place. If I had, I’d have started right off the bat, using (and honing) skills I *do* have, working in the Census research group to make serious progress on applied problems. Instead, I wanted to bone up on my math-theory side, just thinking about the glory of proving theorems, but without putting in the prep work. It’s true that I’ve learned a lot by working on theory problems with Jing—but I would have been so much more productive if I’d taken a few hardcore math classes first, brushing up on my weak spots *before* starting such research. (I took linear algebra over a decade before starting the PhD, and it really shows. Yes, I can multiply matrices, but the advanced stuff has been a real slog.) I’ve spent a couple of years now on work that other students could have done much faster and enjoyed more, while neglecting to make real contributions using the skills I *do* have. In other words, I wish I could go back and tell myself: *either* just take some math classes (or even get a MS in Math?), until you can do the theory on your own (or with a mentoring colleague at work), and skip the Stats PhD… *or* do the PhD in a way that builds on your strengths (and makes real contributions!), not merely papers over your weaknesses. Sadly, I probably wouldn’t have believed me. My teaching experiences and the Eberly center seminars have been wonderful, but otherwise, right now I feel I have not made good use out of my time here. (Even in my neuroscience ADA project flop, only a few logistical challenges were out of my hands—and I could have overcome most of them by gritting my teeth and learning Python well, and by sitting alongside the scientists in the lab.) Hindsight is 20/20, and everyone goes through impostor syndrome, but still…

Teaching

I was a TA for Ann Lee’s section of 36-402: Undergraduate Advanced Data Analysis, using materials developed by Cosma Shalizi.

The course largely followed Cosma’s (draft) textbook Advanced Data Analysis from an Elementary Point of View. It was good for me to be “forced” to read up a little on causal inference and related topics. I’m still no expert, but at least not clueless. I also liked his perspective of statistical modeling as “data compression,” and his view of regression as a linear smoother with *weird* weights.
Some students mentioned that having to code up cross-validation from scratch 5 times or more was a *good* part of the class. They really feel they understand it now, more so than other things which they never or rarely had to code directly—such as backfitting in GAMs. I worried that repeatedly writing CV from scratch would start to feel like busywork, but luckily not (at least for these few students). And I felt the same about taking the Convex Optimization class myself: it’s great to have repeated practice *coding up the algorithms directly* and understanding what they’re trying to do, even if it’s only practice and in reality you’d actually use a pre-canned routine that deals with subtleties such as numerical convergence issues. So, in future years, we should give more opportunities to practice coding up algorithms, not just deriving theory about them and using them for data analysis. (Not to mention the omitted issues of data collection and power calculations…)
By the end of the semester, so many students still didn’t understand the idea of additive vs interaction models. They assumed “additive model” specifically means “GAM with spline terms” and “interaction model” means “linear regression with interaction terms.” We should hit these points harder earlier: “additive” means *any* model that is additive in the terms; and you can certainly do interactions within a GAM by having a multiple-predictor spline term; and so on.
If I’m going to be strict about not accepting late HWs, I should do so from the very beginning. It’ll (hopefully) save me ages of back-and-forth emails from students with excuses over the course of the semester. Also, if the promise of no-credit-for-late-HWs only kicks in at the end of semester, all of a sudden, then some students may have already used up their free dropped-HW opportunities, so they get a much lower grade than expected even if they *do* the work (but just submit it late). That’s not *technically* unfair (the syllabus did say we’d reject late HWs)… but it *feels* unfair. Best to set up consistent and clear expectations, right?
Likewise, if going to be serious about saying that “on time” means “at the start of class,” then have a TA pick up the HWs right at that time. We saw a trickle of late students (or all showing up at end of class) dumping in HWs after the fact. (Maybe electronic submission, with the deadline enforced by your course-management software, is not so bad.)
I’m pleased that we had decent turnaround time for grading most weeks—but it was sad that so many students never bothered to pick up graded HWs. We need better incentives to figure out your mistakes and learn from them, not merely be graded on them. (Alternately, it’s tempting to say that if you *don’t* pick up X of your early HWs, then you accept “stochastic grading” for the rest—we’ll give you a random grade and save time by not grading them manually!)
The Blackboard discussion forums were painful to set up and navigate. We should have used Piazza instead.
How would I grade such a class with specs-based grading? There are so many details to demonstrate understanding of, and so many ways to lose points on current assignments. How to get around point-grubbing here?

Other projects

I made no progress on the FPS paper with Daren, nor on the DL paper with Nick 🙁 At least the FPS paper was submitted to a conference… and rejected by reviewers who didn’t understand the purpose of the paper. I should have quickly revised the introduction to reframe our goals clearly and sent it somewhere else, but instead it’s been sitting on my desk.
This semester (or maybe near the end of last term?) I volunteered to join the GenEd committee. This is a mostly-faculty committee, revising the general education requirements in the CMU college (Dietrich College of Humanities and Social Sciences) to which our Statistics department belongs. It’s been eye-opening to see how faculty meetings go behind the scenes. (In particular, it’s fascinating that even top humanities scholars at a top department have trouble concisely defending the humanities as a GenEd requirement. There are also long digressions to quibble over a single word—“competence” is like a loaded gun, and even clearly-temporary placeholders like “Off-campus experiences” hold up the meeting interminably once someone points out that undergrad research also falls under that heading and it must be renamed right now…) But seriously, it’s been great to hear some truly remarkable educators discuss the direction of our programs, from broad goals to particular pedagogical methods. As a statistician, naturally I volunteered for the Assessment subgroup, and it has been so wonderful to work with experts like Marsha Lovett.
I did well in the “Three Minute Thesis” (3MT) competition! Grad students have 3 minutes and 1 slide to summarize their work for a wide audience. I was surprised and thrilled to win my preliminary round. Unfortunately the finals were the same date as an out-of-town trip I could not miss, so I did not get to compete further, but they still kindly gave me a prelim-round prize of research funds which I used for travel to USCOTS.
I presented my dataviz course poster at the US Conference on Teaching Statistics (USCOTS), in State College, PA. During the poster session I was pleased to meet several folks from stats departments looking to hire new faculty, and they seemed pleased to meet a grad student close to defending who is interested in both pedagogy and research. I believe this was my first contact with Chris Malone at Winona State University and KB Boomer at Bucknell University.
The CMU student-run Data Science Club also invited me to give a high-level talk on principles of good data visualization. Here are my slides, a checklist of best practices to follow, R code, and the nhanes and obesity_education CSV files.
Alex Reinhart and I proposed and ran a mini (half-semester course) on Teaching Statistics, with Rebecca Nugent advising. It was run mostly as a journal club—a good chance to read and discuss interesting papers on pedagogy in general as well as (introductory) statistics education in particular.
Finally, I had my first R package RankingProject accepted on CRAN! Together with Tommy Wright and Martin Klein, my former colleagues back at the Census Bureau, I have been working on a paper about ways to visualize data that lead to correct inferences about comparing many different estimates. This package contains the code and data to implement techniques we recommend in the paper, “A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals” [which went to print in May 2019 in The American Statistician]. The package also was listed as one of RStudio’s top 40 new packages in March 2017, and it has had a respectable number of downloads so far: total , with a monthly average of .

Next up

The 9th and 10th semesters of my Statistics PhD program.

Meng’s “Big Data Paradox” and an extreme example

Update (December 2021): Welcome, new readers. I’m seeing an uptick in visits to this post, probably due to the Nature paper that was just published: “Unrepresentative big surveys significantly overestimated US vaccine uptake” (Bradley et al., 2021). I’ve added a short post with a bit more about the Bradley paper. But the main point is that I strongly encourage you to also read Frauke Kreuter’s brief “What surveys really say,” which describes the context for all of this and points to some of the research challenges needed in order to move ahead; and Reinhart & Tibshirani’s “Big data, big problems,” the Delphi-Facebook survey team’s response to (an earlier draft of) Bradley et al. That said, I hope my 2018 post below is still a useful glimpse at the Big Data Paradox idea.

Original post (October 2018): Xiao-Li Meng’s recent article on “Statistical Paradises and Paradoxes in Big Data (I)” has a fascinating approach to quantifying uncertainty in the Big Data setting. Here’s a very loose/oversimplified summary, along with an extreme example of applying some of these ideas. But I certainly encourage you to read the original paper itself.

(Hat tip: Lee Richardson, and CMU’s History of Statistics reading group.)

We exhort our Intro Stats students: Don’t trust data from a biased sample! Yet in the real world, we rarely deal with datasets collected under a formal random sampling scheme. Sometimes we have e.g. administrative data collected from many people but only those who self-selected into the sample. Even when we do take a “proper” random survey, there is usually nonignorable non-response.

Meng tackles the question: When does a small random survey really give better estimates than a large non-random dataset? He frames the problem gracefully in a way which highlights the importance of two things: (1) the correlation between *whether* a person responds and *what* their response is, and (2) the overall population size N (not just the observed sample size n). Under simple random sampling, the correlation should be essentially 0, and the usual standard error estimates are appropriate. But when we have non-random sample selection, this correlation leads to bias in our estimates, in a way such that the bias gets *substantially worse* when the population is large and we have a large sampling fraction n/N.

Meng reframes the problem several ways, but my favorite is in terms of “effective sample size” or n_eff. For a particular n, N, and correlation, quantify the uncertainty in our estimate—then figure out what sample size from a simple random sample would give the same level of uncertainty. In an extended example around the 2016 US presidential election, Meng suggests that there was a small but non-zero correlation (around 0.005) between *whether* you responded to polling surveys and *which* candidate you supported, Trump or Clinton. This tiny correlation was enough to affect the polling results substantially. All the late election poll samples added up to about 2 million respondents, which at first seems like a massive sample size—naive confidence intervals would have tiny margins of error, less than 1/10th of a percentage point. Yet due to that small correlation of 0.005 (Trump voters were a tiny bit less likely to respond to polls than Clinton voters were), the effective sample size n_eff was around 400 instead, with a margin of error around 5 percentage points.

Let’s say that again: Final estimates for the percent of popular vote going to Trump vs Clinton were based on 2 million respondents. Yet due to self-selection bias (who agreed to reply to the polls), the estimates’ quality was no better than a simple random survey of only 400 people, if you could get a 100% response rate from those 400 folks.

On the other hand, if you want to try correcting your own dataset’s estimates and margins of error, estimating this correlation is difficult in practice. You certainly can’t do it from the sample at hand, though you can try to approximate it from several samples. Still, one rough estimate of n_eff gives some insight into how bad the problem can get.

Here at Colby College, students have a choice of three dining halls: Foss, Roberts (affectionately called Bobs), and Dana. Around the start of the year, I saw this informal “survey” near the entrance to Foss:

Students walking past the bulletin board were encouraged to make a tally mark near their favorite season, favorite article of clothing, etc… and their favorite dining hall. Here, Foss is far in the lead, with 32 votes compared to 5 and 6 for Dana and Roberts respectively. But of course, since this bulletin board is at Foss, you only get passers-by who are willing to go to Foss. We talked about this in class as an example of (highly!) biased sampling.

Now, I want to go further, using one of Meng’s effective sample size estimates. Let’s say that we want to estimate the proportion of Colby students whose favorite dining hall is Foss. Meng’s equation (4.4) estimates the effective sample size as

n_eff = 4 * (n/N)^2 * (1/delta)^2

where delta is the expected difference in [probability of responding for people who prefer Foss] versus [probability of responding for people who prefer another dining hall]. The bulletin board showed about n = 43 respondents, which is roughly 1/40th of the 2000-ish student body, so let’s assume n/N = 1/40. In that case, if delta is any larger than 0.05, n_eff is estimated to be 1.

It certainly seems reasonable to believe that Prob(respond to a survey in Foss | you prefer Foss) is at least 5 percentage points higher than Prob(respond to a survey in Foss | you prefer some other dining hall). Foss is near the edge of campus, close to dorms but not classrooms. If you *don’t* prefer the food at Foss, there’s little reason to pass by the bulletin board, much less to leave a tally mark. In other words, delta = 0.05 seems reasonable… which implies that n_eff <= 1 is reasonable… which means this “survey” is about as good as asking ONE random respondent about their favorite dining hall!

[Edit: for a sanity check on delta’s order of magnitude, imagine that in fact about 1/3 of Colby’s students prefer each of the 3 dining halls. So we got around 30/600ish Foss-preferrers responding here, for around a 0.05 response rate. And we got around 12/1200ish of the other students responding here, for around a 0.01 response rate. So the difference in response rates is 0.04. That’s a bit lower than our extreme hypothesized delta = 0.05, but it is in a reasonable ballpark.]

In general, if delta is around twice n/N, we estimate n_eff = 1. Even if we waited to get a much larger sample, say 200 or 250 students leaving tally marks on the board, the survey is still useless if we believe delta >= 0.01 is plausible (which I do).

There’s plenty more of interest in Meng’s article, but this one takeaway is already enough to be fascinating for me—and a nifty in-class example for just *how* bad convenience sampling can be.

Tenure-track faculty position in Statistics at Colby College

It’s been a very busy year since my last post! Job search, new baby, new course to teach, thesis defense, moving… and finally—starting a new position as Assistant Professor, in the Department of Mathematics and Statistics at Colby College. I am delighted and honored to be here. More on all that later, but for now, just a quick announcement:

We are hiring another tenure track faculty member in Statistics here at Colby:
https://www.mathjobs.org/jobs/jobs/12378
Please share the posting with anyone who may be interested. This would be the fourth statistician in our Department of Mathematics and Statistics. I am excited that we are expanding our statistics offerings, including our collaboration with the Computer Science department on a minor in Data Science.

Colby College is one of the top small liberal arts colleges in New England. I have found it to be a very collegial, supportive environment, with wonderful students and top-notch colleagues. While Colby emphasizes excellence in teaching, I am already pleased to see that there’s also ample time and resources for research and scholarship as well as professional development.

There is a lot of great energy around Colby’s growing civic engagement with our town of Waterville, as well as increasing size and diversity of both the student body and the faculty. We’re about an hour’s drive from Portland, Maine and about three hours from Boston. The Waterville area offers plenty of local nature and outdoors activities (including a beautiful set of city-maintained cross-country skiing trails).

Full consideration will be given to applications received by October 9, but review will continue until the position is filled. If you have concerns about making this deadline, or if you have any other questions, please do get in touch.

Finally, if you know of interested colleagues on the CS side, our Computer Science department is also hiring for an open-rank tenure track position:
http://www.cs.colby.edu/doc/ColbyCS-2019-TenureTrack.pdf

Continue reading “Tenure-track faculty position in Statistics at Colby College” →

Think-aloud interviews can help you write better assessments

I’m delighted to share with you a pilot study that we ran at CMU this fall, through our Teaching Statistics group. Long story short: It’s hard to write good conceptual-level questions to test student understanding, but think-aloud interviews are a very promising tool. By asking real students to talk out loud as they solve problems, you get insight into whether students give the right/wrong answers because they really do/don’t understand the problem—or whether the question should be better written. If students answer wrong because the question is ambiguous, or if they get it right using generic test-taking strategies rather than knowledge from the course, think-alouds give you a chance to detect this and revise your test questions.

Some context:
CMU’s stats department—renamed the Department of Statistics & Data Science just this fall—is revising the traditional Introductory Statistics classes we offer. Of course, as good statisticians, we’d like to gather evidence and measure whether students are learning any better in the new curriculum. We found several pre-made standardized tests of student learning in college-level Intro Stats, but none of them quite fit what we wanted to measure: have students learned the core concepts, even if they haven’t memorized traditional formulas and jargon?

We tried writing a few such multiple-choice questions ourselves, but it was quite a challenge to see past our own expert blind spots. So, we decided to get hands-on practice with assessment in the Fall 2017 offering of our graduate course on Teaching Statistics. We read Adams and Wieman (2011), “Development and Validation of Instruments to Measure Learning of Expert-Like Thinking”—who recommended using think-aloud interviews as a core part of the test-question design and validation process. This method isn’t commonly known in Statistics, although I have related experience from a decade ago when I studied design at Olin College and then worked in consumer insights research for Ziba Design. It’s been such a delight to revisit those skills and mindsets in a new context here.

We decided to run a pilot study where everyone could get practice running think-aloud interviews. With a handful of anonymous student volunteers, we ran through the process: welcome the volunteer, describe the process, give them some warm-up questions to practice thinking aloud as you solve problems, then run through a handful of “real” Intro Stats test questions and see how they tackle them. During the first pass, the interviewer should stay silent, apart from reminders like “Please remember to think out loud” if the student stops speaking. It’s not perfect, but it gets us closer to how students would really approach this question on an in-class test (not at office hours or in a study session). At the end, we would do a second pass to follow up on anything interesting or unclear, though it’s still best to let them do most of the talking: interviewers might say “I see you answered B here. Can you explain in more detail?” rather than “This is wrong; it should be C because…”

After this pilot, we feel quite confident that a formal think-aloud study will help us write questions that really measure the concepts and (mis)understandings we want to detect. The think-aloud script was drafted based on materials from Chris Nodder’s Lynda.com course and advice from Bowen (1994), “Think-aloud methods in chemistry education”. But there are quite a few open questions remaining about how best to implement the study. We list these on the poster above, which we presented last week at CMU’s Teaching & Learning Summit.

The current plan is to revise our protocol for the rest of Fall 2017 and design a complete formal study. Next, we will run think-alouds and revise individual questions throughout Spring 2018, then pilot and validate at the test level (which set of questions works well as a whole?) in Fall 2018, with pre- and post-tests across several sections and variations of Intro Stats.

PS — I mean no disrespect towards existing Intro Stats assessments such as CAOS, ARTIST, GOALS, SCI, or LOCUS. These have all been reviewed thoroughly by expert statisticians and educators. However, in the documentation for how these tests were developed, I cannot find any mention of think-alouds or similar testing with real students. Student testing seems limited to psychometric validation (for reliability etc.) after all the questions were written. I think there is considerable value in testing question-prototypes with students early in the development process.

PPS — Apologies for the long lack of updates. It’s been a busy year of research and paper-writing, with a particularly busy fall of job applications and interviews. But I’ll have a few more projects ready for sharing here over the next month or two.

ASA watch on integrity of federal statistical data

If you follow the topics I blog about, then you may also wish to read the American Statistical Association’s recent post about monitoring possible threats to the US federal statistical system.

In short, there are concerns that the new administration may defund or eliminate valuable statistical programs. They may also insist on asking for citizenship / immigration status on the next decennial Census, despite the detrimental effect this would have on response rates and data quality. The ASA’s post has further details and links to relevant news stories.

Whatever your political views, it’s difficult to manage a country effectively and efficiently without high-quality statistical information. Public data is an important good for members of the public as well, as I argued a few years back during worries about eliminating a federal statistical survey. I’m grateful those concerns did not come to pass then. I hope for the best this time too.