After 8th semester of statistics PhD program

I realize this is over 2 years late, but I found these drafts sitting around and perhaps they are still worth posting.

Apologies for the negativity! I have to admit this was one of the toughest semesters for me, psychologically. I thought about toning it down, especially since I’m delighted to be where I am now—tenure track faculty at a small liberal arts college—but I don’t want to pretend it’s been easy to get here.

I believe this was the semester I was going to CMU’s Counseling and Psychological Services (CaPS). If you’re a grad student, I recommend that you get to know such resources on your campus. Just about everyone faces the all-pervasive impostor syndrome, which compounds any other specific challenges you might have personally, and it’s helpful to be able to talk through it all with someone experienced.

Previous posts: the 1st, 2nd, 3rd, 4th, 5th, 6th, and 7th semesters of my Statistics PhD program.

Life

The more my son grows up, the more the PhD starts to feel like an overwhelming distraction from real life. One day I got home late and my wife told me what he’d been singing:

“Old MacDonald had a daddy, E-I-E-I-O, with a work-work here and a work-work there…” 🙁

If you have a family while in grad school, you will often feel that either the PhD doesn’t come first, or your family doesn’t come first. The academic folks around you will lean towards the PhD coming first, which of course partly makes sense (if you’re not making the fullest use of your limited time in grad school, why bother going at all?), but it’s also largely selection bias. Many (though not all) of the faculty who land and stay at a top research university are people who did decide that work comes first.

Thankfully, nobody has ever sneered to my face that “Ah well, not everyone’s cut out for academia” (though I’ve heard of it happening in other departments). But I feel it in my head all the time. (And I overhear younger students dismissing 30+ year olds like myself as too old to do good research… And I see the offices full every evening and weekend…) At another stage in life, my reaction might have been, “Oh yeah? I’ll show you, I’ll work my butt off and prove I’m good enough to cut it!” But now, my reaction is, “I’ve got better things to do than play this petty game.” Yes, I’ll plod along and finish the PhD I’ve started—perseverance counts for a lot—but right now I am not eager to stretch out this stage of life any longer than necessary.

Research

  • Reading Thinking, Fast and Slow, I am noting the constant mentions of the author’s collaboration with Amos Tversky. I think I’ve been way too focused on self-sufficiency here. It’s true that to get the thesis, I need to demonstrate I can do the work on my own… BUT in order to actually succeed afterwards (whether in academia or not), I’ll need to collaborate with others, not just work alone or with my advisor. Plus, it’s simply more fun, spending those hours tackling a tough problem with an equally interested collaborator! So, my plan starting this summer and into next year: add a few collaborative projects. Finish FPS with Daren, finish DL with Nick and Jordan, revisit CIs/AIPE with Alex, start something (visual inference? CV with confidence?) with Justin… [Looking back, I regret that I did not follow up and make most of these planned collaborations happen while I was still a student!]
  • Jing still amazes me with his quick insight and intuition about how to tackle a proof. When I get stuck after spending hours on something, it takes him almost no time to see: if we back up a few steps to this other point, and tackle that instead, it’ll be much cleaner. This trait is taking me a long time to learn.
  • Daren argues that technical math prowess is definitely not sufficient (good ideas of *what* to study are more important), but also not necessary (even theory-wizards like Yenchi and Ale have to constantly look up things they’ve forgotten). I disagree a bit: I really think fluency in the big-picture math concepts is important—if I have memorized the definition of an eigenvalue, but not internalized its *meaning*, then I will never see big-picture ideas quickly enough, nor know how to start proving technical details, nor recall where to find those details in work I’ve seen before. It’s like when I play clarinet: I don’t need to memorize the piece I’m playing—sheet music is fine—but I *do* need to memorize the basics. I simply cannot play in real-time if I have to refer back to a fingering chart for every single note! In Joel Spolsky’s words (although see Lang’s actual article too):

    Serge Lang, a math professor at Yale, used to give his Calculus students a fairly simple algebra problem on the first day of classes, one which almost everyone could solve, but some of them solved it as quickly as they could write while others took a while, and Professor Lang claimed that all of the students who solved the problem as quickly as they could write would get an A in the Calculus course, and all the others wouldn’t. The speed with which they solved a simple algebra problem was as good a predictor of the final grade in Calculus as a whole semester of homework, tests, midterms, and a final.

    You see, if you can’t whiz through the easy stuff at 100 m.p.h., you’re never gonna get the advanced stuff.

  • It’s also hitting me how stupidly selfish I’ve been here. As much as I’d like to think so, I didn’t come here to solve practical problems and make the world a better place. If I had, I’d have started right off the bat, using (and honing) skills I *do* have, working in the Census research group to make serious progress on applied problems. Instead, I wanted to bone up on my math-theory side, just thinking about the glory of proving theorems, but without putting in the prep work. It’s true that I’ve learned a lot by working on theory problems with Jing, but I would have been so much more productive if I’d taken a few hardcore math classes first, brushing up on my weak spots *before* starting such research. (I took linear algebra over a decade before starting the PhD, and it really shows. Yes, I can multiply matrices, but the advanced stuff has been a real slog.) I’ve spent a couple of years now on work that other students could have done much faster and enjoyed more, while neglecting to make real contributions using the skills I *do* have. In other words, I wish I could go back and tell myself: *either* just take some math classes (or even get an MS in Math?), until you can do the theory on your own (or with a mentoring colleague at work), and skip the Stats PhD… *or* do the PhD in a way that builds on your strengths (and makes real contributions!), not merely papers over your weaknesses. Sadly, I probably wouldn’t have believed me. My teaching experiences and the Eberly center seminars have been wonderful, but otherwise, right now I feel I have not made good use of my time here. (Even in my neuroscience ADA project flop, only a few logistical challenges were out of my hands, and I could have overcome most of them by gritting my teeth and learning Python well, and by sitting alongside the scientists in the lab.) Hindsight is 20/20, and everyone goes through impostor syndrome, but still…

Teaching

I was a TA for Ann Lee’s section of 36-402: Undergraduate Advanced Data Analysis, using materials developed by Cosma Shalizi.

  • The course largely followed Cosma’s (draft) textbook Advanced Data Analysis from an Elementary Point of View. It was good for me to be “forced” to read up a little on causal inference and related topics. I’m still no expert, but at least not clueless. I also liked his perspective of statistical modeling as “data compression,” and his view of regression as a linear smoother with *weird* weights.
  • Some students mentioned that having to code up cross-validation from scratch 5 times or more was a *good* part of the class. They really feel they understand it now, more so than other things which they never or rarely had to code directly—such as backfitting in GAMs. I worried that repeatedly writing CV from scratch would start to feel like busywork, but luckily not (at least for these few students). And I felt the same about taking the Convex Optimization class myself: it’s great to have repeated practice *coding up the algorithms directly* and understanding what they’re trying to do, even if it’s only practice and in reality you’d actually use a pre-canned routine that deals with subtleties such as numerical convergence issues. So, in future years, we should give more opportunities to practice coding up algorithms, not just deriving theory about them and using them for data analysis. (Not to mention the omitted issues of data collection and power calculations…)
  • By the end of the semester, so many students still didn’t understand the idea of additive vs interaction models. They assumed “additive model” specifically means “GAM with spline terms” and “interaction model” means “linear regression with interaction terms.” We should hit these points harder earlier: “additive” means *any* model that is additive in the terms; and you can certainly do interactions within a GAM by having a multiple-predictor spline term; and so on.
  • If I’m going to be strict about not accepting late HWs, I should do so from the very beginning. It’ll (hopefully) save me ages of back-and-forth emails from students with excuses over the course of the semester. Also, if the promise of no-credit-for-late-HWs suddenly kicks in only at the end of the semester, then some students may have already used up their free dropped-HW opportunities, so they get a much lower grade than expected even if they *do* the work (but just submit it late). That’s not *technically* unfair (the syllabus did say we’d reject late HWs)… but it *feels* unfair. Best to set up consistent and clear expectations, right?
  • Likewise, if we’re going to be serious about saying that “on time” means “at the start of class,” then we should have a TA pick up the HWs right at that time. We saw a trickle of late students (or students all showing up at the end of class) dumping in HWs after the fact. (Maybe electronic submission, with the deadline enforced by your course-management software, is not so bad.)
  • I’m pleased that we had decent turnaround time for grading most weeks, but it was sad that so many students never bothered to pick up graded HWs. We need better incentives for students to figure out their mistakes and learn from them, not merely be graded on them. (Alternately, it’s tempting to say that if you *don’t* pick up X of your early HWs, then you accept “stochastic grading” for the rest: we’ll give you a random grade and save time by not grading them manually!)
  • The Blackboard discussion forums were painful to set up and navigate. We should have used Piazza instead.
  • How would I grade such a class with specs-based grading? There are so many details to demonstrate understanding of, and so many ways to lose points on current assignments. How to get around point-grubbing here?
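
For readers who haven’t tried the exercise mentioned above, here is roughly what coding up cross-validation from scratch can look like. This is only a minimal sketch, not the course’s actual assignment code: the choice of Python, the toy data, and every function name (k_fold_indices, fit_mean, fit_line, cv_mse) are my own assumptions for illustration. It compares an intercept-only model against a simple least-squares line by 5-fold CV.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_mean(xs, ys):
    """Intercept-only model: always predict the mean of the training responses."""
    m = sum(ys) / len(ys)
    return lambda x: m


def fit_line(xs, ys):
    """Simple least-squares line y = a + b * x, fit on the training data."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    return lambda x: a + b * x


def cv_mse(fit, xs, ys, k=5, seed=0):
    """Estimate out-of-sample mean squared error by k-fold cross-validation."""
    n = len(xs)
    total_sq_err = 0.0
    for held_out in k_fold_indices(n, k, seed):
        held = set(held_out)
        # Fit only on the points NOT in the held-out fold...
        train_x = [xs[i] for i in range(n) if i not in held]
        train_y = [ys[i] for i in range(n) if i not in held]
        predict = fit(train_x, train_y)
        # ...then score the fitted model on the held-out points.
        total_sq_err += sum((ys[i] - predict(xs[i])) ** 2 for i in held_out)
    return total_sq_err / n


# Toy data (made up): a noisy straight line.
rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 1) for x in xs]

mse_mean = cv_mse(fit_mean, xs, ys)
mse_line = cv_mse(fit_line, xs, ys)
print("CV MSE, intercept-only:", round(mse_mean, 2))
print("CV MSE, linear:", round(mse_line, 2))
```

Both models are scored on the same folds (same seed), so the comparison is paired. In real work you’d likely reach for a pre-canned routine instead, but writing the loop by hand makes it clear that the model never sees the held-out fold when it is fit.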

Other projects

  • I made no progress on the FPS paper with Daren, nor on the DL paper with Nick 🙁 At least the FPS paper was submitted to a conference… and rejected by reviewers who didn’t understand the purpose of the paper. I should have quickly revised the introduction to reframe our goals clearly and sent it somewhere else, but instead it’s been sitting on my desk.
  • This semester (or maybe near the end of last term?) I volunteered to join the GenEd committee. This is a mostly-faculty committee, revising the general education requirements in the CMU college (Dietrich College of Humanities and Social Sciences) to which our Statistics department belongs. It’s been eye-opening to see how faculty meetings go behind the scenes. (In particular, it’s fascinating that even top humanities scholars at a top department have trouble concisely defending the humanities as a GenEd requirement. There are also long digressions to quibble over a single word—“competence” is like a loaded gun, and even clearly-temporary placeholders like “Off-campus experiences” hold up the meeting interminably once someone points out that undergrad research also falls under that heading and it must be renamed right now…) But seriously, it’s been great to hear some truly remarkable educators discuss the direction of our programs, from broad goals to particular pedagogical methods. As a statistician, naturally I volunteered for the Assessment subgroup, and it has been so wonderful to work with experts like Marsha Lovett.
  • I did well in the “Three Minute Thesis” (3MT) competition! Grad students have 3 minutes and 1 slide to summarize their work for a wide audience. I was surprised and thrilled to win my preliminary round. Unfortunately the finals were the same date as an out-of-town trip I could not miss, so I did not get to compete further, but they still kindly gave me a prelim-round prize of research funds which I used for travel to USCOTS.
  • I presented my dataviz course poster at the US Conference on Teaching Statistics (USCOTS), in State College, PA. During the poster session I was pleased to meet several folks from stats departments looking to hire new faculty, and they seemed pleased to meet a grad student close to defending who is interested in both pedagogy and research. I believe this was my first contact with Chris Malone at Winona State University and KB Boomer at Bucknell University.
  • The CMU student-run Data Science Club also invited me to give a high-level talk on principles of good data visualization. Here are my slides, a checklist of best practices to follow, R code, and the nhanes and obesity_education CSV files.
  • Alex Reinhart and I proposed and ran a mini (half-semester course) on Teaching Statistics, with Rebecca Nugent advising. It was run mostly as a journal club—a good chance to read and discuss interesting papers on pedagogy in general as well as (introductory) statistics education in particular.
  • Finally, I had my first R package RankingProject accepted on CRAN! Together with Tommy Wright and Martin Klein, my former colleagues back at the Census Bureau, I have been working on a paper about ways to visualize data that lead to correct inferences about comparing many different estimates. This package contains the code and data to implement techniques we recommend in the paper, “A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals” [which went to print in May 2019 in The American Statistician]. The package was also listed as one of RStudio’s top 40 new packages in March 2017, and it has had a respectable number of downloads so far. [Badges: total CRAN downloads for the RankingProject package; monthly CRAN downloads for the RankingProject package.]

Next up

The 9th and 10th semesters of my Statistics PhD program.