After 6th semester of statistics PhD program

Posting far too late again, but here’s what I remember from last Spring…

This was my first semester with no teaching, TAing, or classes (besides one I audited for fun). As much as I enjoy these things, research has finally gone much faster and more smoothly with no other school obligations. The fact that our baby started daycare also helped, although it’s a bittersweet transition. At the end of the summer I passed my proposal presentation, which means I am now ABD (All But Dissertation)!

Previous posts: the 1st, 2nd, 3rd, 4th, and 5th semesters of my Statistics PhD program.

Thesis research and proposal

During 2015, most of my research with my advisor, Jing Lei, was a slow churn through understanding and extending his sparse PCA work with Vince Vu. At the end of the year I hadn’t gotten far and we decided to switch to a new project… which eventually became my proposal, in a roundabout way.

We’d heard about the concept of submodularity, which seems better known in CS, and wondered whether it could be useful in Statistics as well. Das & Kempe (2011) used submodularity to understand when greedy variable-selection algorithms like Forward Selection (FS, aka Forward Stepwise regression) can’t do too much worse than Best Subsets regression. We thought this approach might give a new proof of model-selection consistency for FS. It turned out that submodularity didn’t give us a fruitful proof approach after all… but also that (high-dimensional) conditions for model-selection consistency of FS hadn’t been derived yet. Hence, this became our goal: find sufficient conditions for FS to choose the “right” linear regression model (when such a thing exists), with probability going to 1 as the numbers of observations and variables go to infinity. Then, compare these conditions to those known for other methods, such as Orthogonal Matching Pursuit (OMP) or the Lasso. Finally, analyze data-driven stopping rules for FS—so far we have focused on variants of cross-validation (CV), which, surprisingly, is not as well understood as I’d thought.
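
For concreteness, here is a toy sketch of plain Forward Selection: a minimal greedy loop on simulated data, not our actual research code.

```r
## Toy Forward Selection (greedy forward stepwise regression).
## Simulated data, purely for illustration -- not our research code.
set.seed(1)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
beta <- c(3, -2, 1.5, rep(0, p - 3))   # only the first 3 variables matter
y <- as.vector(X %*% beta + rnorm(n))

rss <- function(vars) {                # residual sum of squares of a submodel
  if (length(vars) == 0) return(sum((y - mean(y))^2))
  sum(resid(lm(y ~ X[, vars, drop = FALSE]))^2)
}

active <- integer(0)                   # selected variables, in order of entry
for (k in seq_len(p)) {
  candidates <- setdiff(seq_len(p), active)
  gains <- sapply(candidates, function(j) rss(active) - rss(c(active, j)))
  active <- c(active, candidates[which.max(gains)])  # add the best variable
}
active[1:5]  # hopefully begins with 1, 2, 3 in some order
```

In the real project, the interesting parts are the stopping rule (e.g. CV error along this greedy path) and the conditions under which the first few entries of active recover the true model.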

One thing I hadn’t realized before: when writing the actual proposal, the intent is to demonstrate your abilities and preparedness for research, not necessarily to plan out your next research questions. As it turns out, it’s more important to prove that you can ask interesting questions and follow through on them. Proposing concrete “future work” is less critical, since we all know it’ll likely change by the time you finish the current task. Also, rewriting everything for the paper and talk was itself helpful in getting me to see the “big picture” ideas in my proofs.

Anyhow, it did feel pretty great to actually complete a proof or two for the proposal. Even if the core ideas really came from my advisor or other papers I’ve read, I did do real work to pull it all together and prepare the paper & talk.

Many thanks to everyone who attended my proposal talk. I appreciated the helpful questions and discussion; it didn’t feel like a grilling for its own sake (as every grad student fears). Now it’s time to follow through, complete the research, practice the paper-submission process, and write a thesis!

The research process

When we shifted gears to something my advisor did not already know much about, it helped me feel much more in charge and productive. Of course, he caught up and passed me quickly, but that’s only to be expected of someone who also just won a prestigious NSF CAREER award.

Other things that have helped: Getting the baby into daycare. Having no TAing duties to divide my energy this semester. Writing up the week’s research notes for my advisor before each meeting, so that (1) the meetings are more focused & productive and (2) I build up a record of notes that we can copy-paste into papers later. Reading Cal Newport’s Deep Work book and following its common-sense suggestions about keeping a better schedule and tracking better metrics. (I used to tally all my daily/weekly stats-related work hours; now I just tally thesis hours and try to hit a good target each week on those alone, undiluted by side stuff.)

I’m no smarter, but my work is much more productive, I feel much better, and I’m learning much more. Every month I look back and realize that, just a month ago, I’d have been unable to understand the work I’m doing today. So it is possible to learn and progress quite quickly, which makes me feel much better about this whole theory-research world. I just need to immerse myself, spend enough time, revisit it regularly enough, have a concrete research question that I’m asking—and then I’ll learn it and retain it far better than I did the HWs from classes I took.

Indeed, a friend asked what I’d do differently if I were starting the PhD again. I’d spend far less energy on classes, especially on homework. It feels good and productive to do HW, and being good at HW is how I got here… but it’s not really the same as making research progress. Besides, as interesting and valuable as the coursework has been, very little of it has been directly relevant to my thesis (and the few parts that were, I’ve had to relearn anyway). So I’d aim explicitly for “B equals PhD” and instead spend more time doing real research projects and wrapping them up into publications (at least conference papers). As it is, I have a pile of half-arsed, never-finished class and side projects, which could have been nice CV entries if I’d polished them instead of spending hours trying to get from a B to an A.

My advisor also pointed out that he didn’t pick up his immense store of knowledge in a class, but by reading many, many papers and talking with senior colleagues. I’ve also noticed a pattern from reading a ton of papers on each of several specialized sub-topics. The first new paper I encounter in an area: whoa, how’d they come up with this from scratch, what does it all mean? The next 2-3 papers: whoa, these all look so different, how will I ever follow the big ideas? Another 10-15 papers: aha, they’re actually rehashing similar ideas and reusing similar proof techniques with small variations, and I can do this too. Reassuring, but it does all take time to digest.

All that said, I still feel like a slowly-plodding turtle compared to the superstar researchers here at CMU. Sometimes it helps to follow Wondermark’s advice on how he avoided discouragement in webcomics: ignore the more-successful people already out there and make one thing at a time, for a long time, until you’ve made many things and some are even good.

(Two years in!) I had just learned the word “webcomics” from a panel at Comic-Con. I was just starting to meet other people who were doing what I was doing.

Let me repeat that: Two years and over a hundred strips in is when I learned there was a word for what I was doing.

I had a precious, lucky gift that I don’t know actually exists anymore: a lack of expectations for my own success. I didn’t know any (or very few) comic creators personally; I didn’t know their audience metrics or see how many Twitter followers they had or how much they made on Patreon. My comics weren’t being liked or retweeted (or not liked, or not retweeted) within minutes of being posted.

I had been able to just sit down and write a bunch of comics without anyone really paying attention, and I didn’t have much of a sense of impatience about it. That was a precious gift that allowed me to start finding my footing as a creator by the time anyone did notice me – when people did start to come to my site, there was already a lot of content there and some of it was pretty decent.

Such blissful ignorance is hard to achieve in a department full of high-achievers. I’ve found that stressing about the competition doesn’t help me work harder or faster. But when I cultivate patience, at least I’m able to continue (at my own pace) instead of stopping entirely.

[Update:] another take on this issue, from Jeff Leek:

Don’t compare myself to other scientists. It is very hard to get good evaluation in science and I’m extra bad at self-evaluation. Scientists are good in many different dimensions and so whenever I pick a one dimensional summary and compare myself to others there are always people who are “better” than me. I find I’m happier when I set internal, short term goals for myself and only compare myself to them.

Classes

I audited Christopher Phillips’ course Moneyball Nation. This was a gen-ed course in the best possible sense, getting students to think both like historians and like statisticians. We explored how statistical/quantitative thinking entered three broad fields: medicine, law, and sports.

Reading articles by doctors and clinical researchers, I got a context for how statistical evidence fits in with other kinds of evidence. Doctors (and patients!) find it much more satisfying to get a chemist’s explanation of how a drug “really works” than a statistician’s indirect analysis showing that a drug outperforms placebo on average. Another paper confirmed for me that (traditional) Statistics’ biggest impact on science was better experimental design, not better data analysis. Most researchers don’t need to collaborate with a statistical theoretician to derive new estimators; they need an applied statistician who’ll ensure that their expensive experiments are money well spent, avoiding confounding, low power, and all the other usual pitfalls.

[Update:] I’ve added a whole post on these medical articles.

In the law module, we saw how difficult it is to use statistical evidence appropriately in trials, and how lawyers don’t always find it useful. Of course we want our trial system to get the right answers as often as possible (free the innocent and catch the guilty), so from a purely stats view it’s a decision-theory question: what legal procedures will optimize your sensitivity and specificity? But the courts, especially trial by jury, also serve a distinct social purpose: ensuring that the legal decision reflects and represents community agreement, not just the views of isolated experts who can’t be held accountable. When you admit complicated statistical arguments that juries cannot understand, the legal process becomes hard to distinguish from quack experts bamboozling the public, which undermines trust in the whole system. That is, you have the right to a fair trial by a jury of your peers, and that right can’t be trampled in order to “objectively” make fewer mistakes. (Of course, this is also an argument for better statistical education for the public, so that statistical evidence becomes less abstruse.)

[Update:] In a bit more detail, “juries should convict only when guilt is beyond reasonable doubt. …one function of the presumption of innocence is to encourage the community to treat a defendant’s acquittal as banishing all lingering suspicion that he might have been guilty.” So reasonable doubt is meant to be a fuzzy social construct that depends on your local community. If trials devolve into computing a fungible “probability of guilt,” you lose that specificity / dependence on local community, and no numerical threshold can truly serve this purpose of being “beyond a reasonable doubt.” For more details on this ritual/pageant view of jury trials, along with many other arguments against statistics in the courtroom, see (very long but worthwhile) Tribe (1971), “Trial by Mathematics: Precision and Ritual in the Legal Process” [journal, pdf].

[Note to self: link to some of the readings described above.]

Next time I teach I’ll also use Prof. Phillips’ trick for getting to know students: require everyone to sign up for a time slot to meet in his office, in small groups (2-4 people). This helps put names to faces and discover students’ interests.

Other projects

I almost had a Tweet cited in a paper 😛 Rob Kass sent along to the department an early draft of “Ten Simple Rules for Effective Statistical Practice” which cited one of my tweets. Alas, the tweet didn’t make it into the final version, but the paper is still worth a read.

I also attended the Tapestry conference in Colorado, presenting course materials from the Fall 2015 dataviz class that I taught. See my conference notes here and here.

Even beyond that, it’s been a semester full of thought on statistical education, starting with a special issue in The American Statistician (plus supplementary material). I also attended a few faculty meetings in our college of humanities and social sciences, to which our statistics department belongs. They are considering future curricular revisions to the general-education requirements. What should it mean to have a well-rounded education, in general and specifically at this college? These chats also touch on the role of our introductory statistics course: where should statistical thinking and statistical evidence fit into the training of humanities students? This summer we started an Intro Stats working group for revising our department’s first course; I hope to have more to report there eventually.

Finally, I TA’ed for our department’s summer undergraduate research experience program. More on that in a separate post.

Life

My son is coordinated enough to play with a shape-sorter, which is funny to watch. He gets so frustrated that the square peg won’t go in the triangular hole, and so incredibly pleased when I gently nudge him to try a different hole and it works. (Then I go to meet my advisor and it feels like the same scene, with me in my son’s role…)

He’s had many firsts this spring: start of day care, first road trip, first time attending a wedding, first ER visit… Scary, joyful, bittersweet, all mixed up. It’s also becoming easier to communicate, as he can understand us and express himself better; but he also now has preferences and insists on them, which is a new challenge!

I’ve also joined some classmates in a new book club. A few picks have become new favorites; others really put me outside my comfort zone in reading things I’d never choose otherwise.

Next up

The 7th, 8th, 9th, and 10th semesters of my Statistics PhD program.

After 5th semester of statistics PhD program

Better late than never—here are my hazy memories of last semester. It was one of the tougher ones: an intense teaching experience, attempts to ratchet up research, and parenting a baby that’s still too young to entertain itself but old enough to get into trouble.

Previous posts: the 1st, 2nd, 3rd, and 4th semesters of my Statistics PhD program.

Classes

I’m past all the required coursework, so I only audited Topics in High Dimensional Statistics, taught by Alessandro Rinaldo as a pair of half-semester courses (36-788 and 36-789). “High-dimensional” here loosely means problems where you have more variables (p) than observations (n). For instance, in genetic or neuroscience datasets, you might have thousands of measurements each from only tens of patients. The theory here differs from traditional statistics because you usually assume that p grows with n, so that getting more observations won’t reduce the problem to a traditional one.

This course focused on some of the theoretical tools (like concentration inequalities) and results (like minimax bounds) that are especially useful for studying properties of high-dimensional methods. Ale did a great job covering useful techniques and connecting the material from lecture to lecture.
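
To give a flavor of the toolbox: the classic Hoeffding inequality (a standard result, not anything specific to this course) says that for independent X_1, \dots, X_n taking values in [a, b] with mean \mu,

P\left( \left| \frac{1}{n} \sum_{i=1}^n X_i - \mu \right| \ge t \right) \le 2 \exp\left( -\frac{2 n t^2}{(b-a)^2} \right),

so the sample mean concentrates exponentially fast around \mu. High-dimensional arguments lean on bounds like this (and fancier variants), often combined with union bounds, to control many estimates simultaneously.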

In the final part of the course, students presented recent minimax-theory papers. It was useful to see my fellow students work through how these techniques are used in practice, as well as to get practice giving “chalk talks” without projected slides. I gave a talk too, preparing jointly with my classmate Lingxue Zhu (who is very knowledgeable, sharp, and always great to work with!). Ale’s feedback on my talk was that it was “very linear”—I hope that was a good thing? Easy to follow?

Also, as in every other stats class I’ve had here, we brought up the curse of dimensionality—meaning that, in high-dimensional data, very few points are likely to be near the joint mean. I saw a great practical example of this in a story about the US Air Force’s troubles designing fighter planes for the “average” pilot.
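
A quick simulation makes the point (my own hypothetical illustration, not course material): standard-normal draws in p dimensions typically sit at distance about \sqrt{p} from the mean, so essentially none of them land near it.

```r
## Curse of dimensionality: random points concentrate far from the mean.
set.seed(1)
for (p in c(1, 10, 100, 1000)) {
  X <- matrix(rnorm(2000 * p), ncol = p)  # 2,000 draws from N(0, I_p)
  d <- sqrt(rowSums(X^2))                 # Euclidean distance from the mean
  cat(sprintf("p = %4d: median distance = %7.2f; share within distance 1 = %.3f\n",
              p, median(d), mean(d < 1)))
}
```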

Teaching

I taught a data visualization course! Check out my course materials here. There’ll be a separate post reflecting on the whole experience. But the summer before, it was fun (and helpful) to binge-read all those dataviz books I’d always meant to read.

I’ve been able to repurpose my lecture materials for a few short talks too. I was invited to present a one-lecture intro to data viz for Seth Wiener’s linguistics students here at CMU, as well as for a seminar on Data Dashboard Design run by Matthew Ritter at my alma mater (Olin College). I also gave an intro to the Grammar of Graphics (the broader concept behind ggplot2) for our Pittsburgh useR Group.
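
Since the Grammar of Graphics keeps coming up here, this is the idea in miniature: a generic ggplot2 example on R’s built-in mtcars data (not the actual demo from those talks). You assemble a plot from data, aesthetic mappings, geometric layers, and facets, rather than choosing from a menu of chart types.

```r
library(ggplot2)

## Grammar of Graphics in miniature: data + mappings + geoms + facets.
ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +  # map columns to aesthetics
  geom_point() +                                             # pick a geometric layer
  facet_wrap(~ am) +                                         # small multiples by transmission
  labs(x = "Weight (1000 lbs)", y = "MPG", color = "Cylinders")
```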

Research

I’m officially working with Jing Lei, still looking at sparse PCA but also some other possible thesis topics. Jing is a great instructor, researcher, and collaborator working on many fascinating problems. (I also appreciate that he, too, has a young child and is understanding about the challenges of parenting.)

But I’m afraid I made very slow research progress this fall. A lot of my time went towards teaching the dataviz course, and plenty went to parenthood (see below), both of which will be reduced in the spring semester. I also wish I had some grad-student collaborators. I’m not part of a larger research group right now, so meetings are just between my advisor and me. Meetings with Jing are very productive, but in between it’d also be nice to hash out tough ideas together with a fellow student, without taking up an advisor’s time or stumbling around on my own.

Though it’s not quite the same, I started attending the Statistical Machine Learning Reading Group regularly. Following these talks is another good way to stretch my math muscles and keep up with recent literature.

Life

As a nice break from statistics, we got to see our friends Bryan Wright and Yuko Eguchi both defend their PhD dissertations in musicology. A defense in the humanities seems to be much more of a conversation involving the whole committee, versus the lecture that Statistics folks give when defending a PhD.

Besides home and school, I’ve been a well-intentioned but ineffective volunteer, trying to manage a few pro bono statistical projects. It turns out that virtual collaboration, managing a far-flung team of people who’ve never met face-to-face, is a serious challenge. I’ve tried reading up on advice but haven’t found any great tips—so please leave a comment if you know any good resources.

So far, I’ve learned that choosing the right volunteer team is important. Apparent enthusiasm (I’m eager to have a new project! or even eager for this particular project!) doesn’t seem to predict commitment or follow-up as well as apparent professionalism (whether or not I’m eager, I will stay organized and get s**t done).

Meanwhile, the baby is no longer in the “potted-plant stage” (when you can put him down and expect he’ll still be there a second later) but was not yet in day care, just as my wife was returning to part-time work. After this semester we finally got off the wait-lists and into day care, but in the meantime it was much harder to juggle home and school commitments.

However, he’s an amazing little guy, and it’s fun finally taking him to outings and playdates at the park and zoo and museums (where he stares at the floor instead of exhibits… except for the model railroad, which he really loved!) We also finally made it out to Kennywood, a gorgeous local amusement park, for their holiday light show.

Here’s to more exploration of Pittsburgh as the little guy keeps growing!

Next up

The 6th, 7th, 8th, 9th, and 10th semesters of my Statistics PhD program.

After 4th semester of statistics PhD program

This was my first PhD semester without any required courses (more or less). That means I had time to focus on research, right?

It was also my first semester as a dad. Exhilarating, joyful, and exhausting 🙂 So, time was freed up by having less coursework, but it was reallocated largely towards diapering and sleep. Still, I did start on a new research project, about which I’m pretty excited.

Our department was also recognized as one of the nation’s fastest-growing statistics departments. I got to see some of the challenges with this first-hand as a TA for a huge 200-student class.

See also my previous posts on the 1st, the 2nd, and the 3rd semesters of my Statistics PhD program.

Classes:

  • Statistical Computing:
    This was a revamped, semi-required, half-semester course, and we were the guinea pigs. I found it quite useful. The revamp was spearheaded by our department chair Chris Genovese, who wanted to pass on his software engineering knowledge/mindset to the rest of us statisticians. This course was not just “how to use R” (though we did cover some advanced topics from Hadley Wickham’s new books Advanced R and R Packages; and it got me to try writing homework assignment analyses as R package vignettes).
    Rather, it was a mix of pragmatic coding practices (using version control such as Git; writing and running unit tests, as in the small sketch after this list; etc.) and good-to-know algorithms (hashing; sorting and searching; dynamic programming; etc.). It’s the kind of stuff you’d pick up on the job as a programmer, or in class as a CS student, but not necessarily as a statistician even if you write code often.
    The homework scheme was nice in that we could choose from a large set of assignments. We had to do two per week, but could do them in any order—so you could do several on a hard topic you really wanted to learn, or pick an easy one if you were having a rough week. The only downside is that I could avoid practicing certain topics entirely if I wanted to. I’d like to try this scheme as an instructor sometime, but I’d want to control my students’ coverage a bit more tightly.
    This fall, Stat Computing becomes an actually-required, full-semester course and will be co-taught by my classmate Alex Reinhart.
  • Convex Optimization:
    Another great course with Ryan Tibshirani. Tons of work, with fairly long homeworks, but I also learned a huge amount of very practical stuff, both theory (how to prove a certain problem is convex? how to prove a certain optimization method works well?) and practice (which methods are likely to work on which problems?).
    My favorite assignments were the ones in which we replicated analyses from recent papers. A great way to practice your coding, improve your optimization, and catch up with the literature all at once. One of these homeworks actually inspired a new methodological idea, which I’ve pursued as a research project.
    Ryan’s teaching was great as usual. He’d start each class with a review from last time and how it connects to today. There were also daily online quizzes, posted after class and due at midnight, that asked simple comprehension questions—not difficult and not a huge chunk of your grade, but enough to encourage you to keep up with the class regularly instead of leaving your studying to the last minute.
  • TAing for Intro to Stat Inference:
    This was the 200-student class. I’m really glad statistics is popular enough to draw such crowds, but it’s the first time the department has had so many folks in the course, and we are still working out how to manage it. We had an army of undergrad- and Masters-level graders for the weekly homeworks, but just three of us PhD-level TAs to grade midterms and exams, which made for several loooong weekends.
    I also regret that I often wasn’t at my best during my office hours this semester. I’ll blame it largely on baby-induced sleep deprivation, but I could have spent more time preparing too. I hope the students who came to my sessions still found them helpful.
  • Next semester, I’ll be teaching the grad-level data visualization course! It will be heavily inspired by Alberto Cairo’s book and his MOOC. I’m still trying to find the right balance between the theory I think is important (how does the Grammar of Graphics work, and why does it underpin ggplot2, Tableau, D3, etc.? how does human visual perception work? what makes for a well-designed graphic?) vs. the tool-using practice that would certainly help many students too (teach me D3 and Shiny so I can make something impressive for portfolios and job interviews!)
    I was glad to hear Scott Murray’s reflections on his recent online dataviz course co-taught with Alberto.
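
As mentioned in the Statistical Computing bullet above, here is the flavor of unit testing we practiced: a minimal testthat sketch around a made-up function, not an actual course assignment.

```r
library(testthat)

## A tiny hand-rolled function worth testing against R's built-in sd().
my_sd <- function(x) sqrt(sum((x - mean(x))^2) / (length(x) - 1))

test_that("my_sd agrees with sd() on simple cases", {
  x <- c(2, 4, 4, 4, 5, 5, 7, 9)
  expect_equal(my_sd(x), sd(x))       # matches the reference implementation
  expect_equal(my_sd(c(3, 3, 3)), 0)  # constant input has zero spread
})
```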

Research:

  • Sparse PCA: I’ve been working with Jing Lei on several aspects of sparse PCA, extending some methodology that he’s developed with collaborators including his wife Kehui Chen (also a statistics professor, just down the street at UPitt). It’s a great opportunity to practice what I’ve learned in Convex Optimization and earlier courses. I admired Jing’s teaching when I took his courses last year, and I’m enjoying research work with him: I have plenty of independence, but he is also happy to provide direction and advice when needed.
    We have some nice simulation results illustrating that our method can work in an ideal setting, so now it’s time to start looking at proofs of why it should work 🙂 as well as a real dataset to showcase its use. More on this soon, I hope.
    Unfortunately, one research direction that I thought could become a thesis topic turned out to be a dead end as soon as we formulated the problem more precisely. Too bad, though at least it’s better to find out now than after spending months on it.
  • I still need to finish writing up a few projects from last fall: my ADA report and a Small Area Estimation paper with Rebecca Steorts (now moving from CMU to Duke). I really wish I had pushed myself to finish them before the baby came—now they’ve been on the backburner for months. I hope to wrap them up this summer. Apologies to my collaborators!

Life:

  • Being a sDADistician: Finally, my penchant for terrible puns becomes socially acceptable, maybe even expected—they’re “dad jokes,” after all.
    Grad school seems to be a good time to start a family. (If you don’t believe me, I heard it as well from Rob Tibshirani last semester.) I have a pretty flexible schedule, so I can easily make time to see the baby and help out, working from home or going back and forth, instead of staying all day on campus or at the office until late o’clock after he’s gone to bed. Still, it helps to make a concrete schedule with my wife, about who’s watching the baby when. Before he arrived, I had imagined we could just pop him in the crib to sleep or entertain himself when we needed to work—ah, foolish optimism…
    It certainly doesn’t work for us both to work from home and be half-working, half-watching him. Neither the work nor the child care is particularly good that way. But when we set a schedule, it’s great for organization & motivation—I only have a chunk of X hours now, so let me get this task DONE, not fritter the day away.
    I’ve spent less time this semester attending talks and department events (special apologies to all the students whose defenses I missed!), but I’ve also forced myself to get much better about ignoring distractions like computer games and Facebook, and I spend more of my free time on things that really do make me feel better such as exercise and reading.
  • Stoicism: This semester I decided to really finish the Seneca book I’d started years ago. It is part of a set of philosophy books I received as a gift from my grandparents. Long story short, once I got in the zone I was hooked, and I’ve really enjoyed Seneca’s Letters to Lucilius as well as Practical Philosophy, a Great Courses lecture series on his contemporaries.
    It turns out several of my fellow students (including Lee Richardson) have been reading the Stoics lately too. The name “Stoic” comes from “Stoa,” i.e. porch, after the place where they used to gather… so clearly we need to meet for beers at The Porch by campus to discuss this stuff.
  • Podcasts: This semester I also discovered the joy of listening to good podcasts.
    (1) Planet Money is the perfect length for my walk to/from campus, covers quirky stories loosely related to economics and finance, and includes a great episode with a shoutout to CMU’s Computer Science school.
    (2) Talking Machines is a more academic podcast about Machine Learning. The hosts cover interesting recent ideas and hit a good balance—the material is presented deeply enough to interest me, but not so deeply I can’t follow it while out on a walk. The episodes usually explain a novel paper and link to it online, then answer a listener question, and end with an interview with a ML researcher or practitioner. They cover not only technical details, but other important perspectives as well: how do you write a ML textbook and get it published? how do you organize a conference to encourage women in ML? how do you run a successful research lab? Most of all, I love that they respect statisticians too 🙂 and in fact, when they interview the creator of The Automatic Statistician, they probe him on whether this isn’t just going to make the data-fishing problem worse.
    (3) PolicyViz is a new podcast on data visualization, with somewhat of a focus on data and analyses for the public: government statistics, data journalism, etc. It’s run by Jon Schwabish, whom I (think I) got to meet when I still worked in DC, and whose visualization workshop materials are a great resource.
  • It’s a chore to update R with all the zillion packages I have installed. I found that Tal Galili’s installr manages updates cleanly and helpfully (a quick sketch follows this list).
  • Next time I bake brownies, I’ll add some spices and call them “Chai squares.” But we must ask, of course: what size to cut them for optimal goodness of fit in the mouth?
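
To make the installr bullet above concrete, the workflow is roughly the following (installr is Windows-oriented, and this is a sketch from memory rather than a guarantee):

```r
## Updating R itself, and migrating installed packages, with installr.
install.packages("installr")
library(installr)
updateR()  # downloads the newest R, then offers to copy/update your packages
```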

Next up: the 5th, 6th, 7th, 8th, 9th, and 10th semesters of my Statistics PhD program.

After 3rd semester of Statistics PhD program

It’s time for another braindump of reflections on statistics grad school.
See also the previous two posts: After 1st semester of Statistics PhD program and After 2nd semester of Statistics PhD program.

This was my last semester of required coursework. Having passed the Data Analysis Exam in May, and with all the courses under my belt, I am pretty much ready to focus on the thesis topic search and proposal. Exciting!


After 2nd semester of Statistics PhD program

Here’s another post on life as a statistics PhD student (in the Department of Statistics, at Carnegie Mellon University, in Pittsburgh, PA).
The previous such post was After 1st semester of Statistics PhD program.

Classes:

  • I feared that Advanced Probability Overview would be just dry esoteric theory, but Jing Lei ensured all the topics were really well-motivated. Although it was tough, I did better than I’d hoped (especially given that I’ve never taken a proper Real Analysis course). In Statistical Machine Learning, Larry Wasserman and Ryan Tibshirani did a great job of balancing “old” core theory with new cutting-edge research topics, including helpful homework assignments that gave us practice both in theory and in applications.
  • My highlight of the semester was being able to read and digest a research paper that was way too abstract when I tried reading it a few years ago. It really hit me that I must be learning something in grad school 🙂
    (The paper was Building Consistent Regression Trees from Complex Sample Data, by Toth and Eltinge. While working at Census, I wanted to try running a complex-survey-weighted regression tree, but I couldn’t get much out of this paper. Now, after a good dose of probability theory and machine learning, it’s far clearer. In fact, I have some ideas about extending this work!)
  • The Statistical Machine Learning class referenced a ton of crazy math terms I wasn’t familiar with: Banach and Hilbert spaces, Lp norms, conjugate functions, etc. It terrified me at first—I’ve never even heard of this stuff, should I have taken grad-level functional analysis before I started this PhD, am I about to fail?!?—but it turns out a lot of it is just names for specific versions of general concepts that I already knew. Whew. Also, most of it got used repeatedly from topic to topic, so we did gain familiarity even without explicitly taking a functional analysis course etc. So, don’t get disheartened too easily by unfamiliar terminology!
  • It was great to finally learn more about Lp norms and about splines. Also, almost everything in SML can be written as a penalized regression 😛
  • Smoothing splines and Reproducing Kernel Hilbert Space (RKHS) regression are nifty because the setup is that you want to optimize over all possible functions. So you start out with an infinite-dimensional space, for which in general there might be no simple way to search/optimize! … But in these specific setups, we can prove that the optimal solution happens to lie in a finite-dimensional subspace, where your usual optimization/search tools will work after all. Nice. (The spline version is written out after this list.)
  • Larry had a nice “foundations” day in SML, with examples where Bayes and Frequentist analysis differ greatly. However, I didn’t find most of his examples too convincing, since the Bayesian “loses” only due to a stupid choice of priors; or the Bayesian “loses” for finite n but in a case where n in practice would have to be ridiculously large. Still, this helped stretch my thinking about how these inference philosophies differ.
  • Larry points out: you often hear that “We might as well go Bayes because if you give people a Frequentist interval, they’ll interpret it as a Bayes interval.” But the reverse is also true: Give someone a sequence of 95% Bayes intervals, and they’ll expect 95% of them to contain the true value. That is NOT necessarily going to happen with Bayes CIs (unlike Frequentist CIs).
  • In addition to Subjective, Objective, Empirical, or Calibrated Bayes, let me propose “Cynical Bayes”: Don’t choose a prior because you believe it. Instead, choose one to optimize your estimator’s Frequentist properties. That way you can keep your expert Freq’ist colleagues happy, yet still call it a Bayes estimator, so you can give the usual Bayes interpretation to keep nonexperts happy 🙂
  • A background in Statistics will keep you thinking about distributions and probabilities and convergences. But a background in Applied Math may be better at giving you tools and ideas for feature engineering. It’s worth having both toolsets.
  • The Advanced Probability Overview course covered some measure-theoretic probability. I’m finally understanding the subtleties of how the different convergences \xrightarrow{p}, \xrightarrow{as}, \xrightarrow{D}, and \xrightarrow{L^p} all differ, and why it matters. We saw these concepts last semester in Intermediate Statistics, but the distinctions are far clearer to me now.
  • AdvProb’s measure theory section also really helped me understand why textbooks say a random variable is a “function”: intuitively it seems like just a variable or a number or something… but it really is a function from “the state of the world” (an element \omega of the set \Omega of all possible outcomes or states of the world) to the measurement you will collect (often a number on the real line). Finally, this measure-theoretic view of probability, as the size of a subset of \Omega, is helpful. Even though statisticians’ goal is to develop tools that let them work with the range of the random variable and ignore the domain \Omega, it’s good to remember that this domain exists.
  • However, measure theory and probability theory suffer from some really poor terminology! For example, it took me far too long to realize that “integrable” means “the integral is finite”, NOT “the integral exists.”
  • When we teach students R, we really should use practical examples, not the arbitrary generic examples that you see so often. Instead of just showing me list(1,"a"), it helps to give a realistic example of why you may actually need to collect together numeric and character elements in a single object; see the sketch below.
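
On that note, here is the kind of realistic example I have in mind (made up, but representative): bundling mixed types into one object is exactly what you do when returning a model summary.

```r
## A realistic reason to mix types in one object: packaging a model summary.
fit_summary <- list(
  model     = "height ~ age",                  # character
  n         = 150L,                            # integer
  coefs     = c(intercept = 75.2, age = 6.4),  # named numeric vector
  converged = TRUE                             # logical
)
fit_summary$coefs["age"]  # retrieve one piece by name
```

A data frame would force everything into equal-length columns; a list happily holds one string, one integer, one vector, and one flag.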
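
And to spell out the splines point from a few bullets up (a standard textbook statement, not my own result): the smoothing spline minimizes, over all twice-differentiable functions f,

\sum_{i=1}^n \big( y_i - f(x_i) \big)^2 + \lambda \int \big( f''(t) \big)^2 \, dt,

and one can prove that the minimizer is a natural cubic spline with knots at the observed x_i. So the solution has the finite-dimensional form f(x) = \sum_{j=1}^n \theta_j N_j(x) for a spline basis \{N_j\}, and the scary infinite-dimensional search collapses into an ordinary ridge-like optimization over \theta \in \mathbb{R}^n.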

Research:

  • I started a new research project, the Advanced Data Analysis project, which will run until the end of this upcoming Fall semester (so about a year total). I am working with Rob Kass and Avniel Ghuman on using magnetoencephalography (MEG) data to study epilepsy.
  • At Rob’s research group meetings, I learn a ton from the helpful questions he asks. When presenting someone else’s work (e.g. for a journal club), ask yourself, “What would you do if *your* research was based on the data from this paper?” Still, I’ve found I really do need to keep scheduling weekly 1-on-1 meetings—the group meetings are not enough to stay optimally on track.
  • Neuroscience is hard! Pre-processing massive neuroscience datasets using not-fully-documented open source software is particularly hard. When I chose this project, I did not realize how much time I would have to spend on learning the subject matter, the relevant specialized software tools, and the data pre-processing workflow. Four months in and I’ve still barely gotten to the point of doing any “real” statistics. It’s a good project and I’m learning a lot, but it’s disheartening to see how much of that learning has been tied to debugging open-source software installations that I’ll only ever use again if I stay in this sub-field.
    I would advise the next PhD cohort to choose projects that’ll primarily teach you more general-purpose, transferable skills. Maybe take an existing theoretical method that’s not implemented in software yet, and make it into an R package?

Life:

  • This was a tougher semester in many ways, with harder classes and more research-related setbacks. The Cake song Tougher than it is got a lot of play time on my headphones 😛
  • I’m glad that despite my slow posting rate, the blog still kept getting regular traffic—particularly Is a Master’s degree in Statistics worthwhile? I guess it’s a burning question these days.
  • A big help to my sanity this semester came from joining the All University Orchestra. After a long week of tough classes and research setbacks, it’s great to switch brain modes and play my clarinet. I’ve really missed playing for the past few years in DC, and I’m glad to get back into it.
  • Pittsburgh highlights: Bayernhof museum, Pittsburgh Symphony Orchestra concerts (The Legend of Zelda, “Behind the Notes” talks), Jozsa Corner, Point Brugge Cafe, sampling all the Squirrel Hill pizzerias, MCMC Bar Crawl on the Southside Flats, riding the ridiculously steep inclines, Pittsburgh Area Theater Organ Society concerts and tours of their beautiful theater organ
  • Things still on our list to do in Pittsburgh: see a CMU theater performance, Pittsburgh aviary and zoo, Kennywood amusement park, Steelers game, Penguins game
  • I look forward to getting a chance to teach a whole course this summer. It’ll be 36-309, Experimental Design. I also took some Eberly Center seminars, and the department organized helpful planning meetings for those of us students who’ll teach in the summer, so I feel reasonably prepared.
    I plan to have my students design a series of experiments to bake the ultimate chocolate chip cookie. It will be delicious. I baked Meg Hourihan’s mean chocolate chip cookies for a department event earlier this spring, which seems like an appropriate start.
    However, ironically, as the local knitr / reproducible research fanboy… I’m supposed to teach the course using SPSS, which seems to be largely point-and-click, without much support for reproducible reports 🙁
  • It was a nice difference to be on the other side of the department’s open house for admitted students this year 🙂 I’m also happy to be reading Grad Cafe forums from a much more relaxed point of view this year!
  • I’m surprised there’s not much crossover between the CMU and UPitt statistics departments. And the stats community outside each department doesn’t seem as vibrant as it was in DC. I attended the American Statistical Association’s Pittsburgh chapter banquet. Besides CMU and Pitt folks, most attendees seemed to be RAND employees or independent consultants. There are also some Meetup groups: the Pittsburgh Data Visualization Group and the Pittsburgh useR Group.
  • I’ve updated and expanded my CMU blogroll in the sidebar. Please let me know if I missed your CMU/Pittsburgh statistics-related blog!

Other people’s helpful posts on the PhD experience:

Next up: the 3rd, 4th, 5th, 6th, 7th, 8th, 9th, and 10th semesters of my Statistics PhD program.

After 1st semester of Statistics PhD program

Have you ever wondered whether the first semester of a PhD is really all that busy? My complete lack of posts last fall should prove it 🙂

Some thoughts on the Fall term, now that Spring is well under way [edit: added a few more points]:

  • RMarkdown and knitr are amazing. When I next teach a course using R, my students will be turning in homeworks using these tools: The output immediately shows whether the code runs and what its results are. This is much better than students copying and pasting possibly-broken code and unconnected output into a text file or (gasp) Word document.
  • I’m glad my cohort socializes outside the office, taking each other out for birthday lunches or going to see a Pirates game. Some of the older PhD students are so focused on their thesis work that they don’t take time for a social break, and I’d like to avoid getting stuck in that rut.
    However! Our lunches always lead us back to the age-old question: How many statisticians does it take to split a bill? Answer: too long. I threw together a Shiny app, DinneR, to help us answer this question 🙂 (a minimal sketch of such an app appears below the screenshot).

[Screenshot: the DinneR app]
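
This is not the actual DinneR code, just a minimal sketch of how such a bill-splitting Shiny app might look:

```r
library(shiny)

## Minimal bill-splitting app in the spirit of DinneR (not the real code).
ui <- fluidPage(
  titlePanel("Split the bill"),
  numericInput("total", "Bill total ($):", value = 100, min = 0),
  numericInput("tip",   "Tip (%):",        value = 20,  min = 0),
  numericInput("n",     "Number of diners:", value = 4, min = 1),
  textOutput("share")
)

server <- function(input, output) {
  output$share <- renderText({
    per_person <- input$total * (1 + input$tip / 100) / input$n
    sprintf("Each person owes $%.2f", per_person)
  })
}

shinyApp(ui = ui, server = server)
```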

  • The first-year PhD courses in Statistics and in Machine Learning have rather different approaches.
    • Statistics professor: Just assume we can compute this estimator. In class we’ll prove that the estimates are reasonably good (e.g. we’ll bound the probability that an estimate is far from the true value).
    • Machine Learning professor: Just trust me that this algorithm gets useful estimates. In class we’ll prove that we can compute it in a reasonable amount of time (e.g. we’ll bound the number of steps until the algorithm converges).
    • Somewhere between these ideas, I ran into the sensible concept of optimizing only until your solution is within statistical error. For example, say you only have enough data to publish an estimate with a confidence interval of ±0.1 units. If your optimization algorithm is computer-intensive, then running it until it converges to ±0.00001 units is just a waste of time. For instance, see Bottou & Bousquet’s “The Tradeoffs of Large Scale Learning.”
  • My ML professor, midway through a classification-focused semester, finally discussing regression for 10 minutes: “…And that’s all you need to know about regression.”
    My Regression professor, at end of semester, finally discussing classification for 20 minutes: “…And that’s all you need to know about classification.” 🙂
  • In any class that covers proofs or other long detailed arguments, handouts+chalkboards are seriously better than slideshows. With a chalkboard, you can show the whole proof at once—so if students get lost halfway through, they can still see the claim we’re proving and all the steps we’ve made so far. But when you cram a proof onto slides, either you oversimplify to get it onto one slide; or you split it across slides, so that we lose the continuity (and may even forget what we’re trying to prove).
  • Good homeworks and quick feedback are critical. One of my classes had weekly homeworks, each directly tied to the material we just covered, each problem expanding on a good question or illustrating an interesting principle from class. Homeworks were graded within a week, every single time.
    In another class, we had just a few homeworks, very loosely tied to the lecture contents and usually at a very different level (way too easy or too hard relative to what the lecture covered). Although this class had the same number of students and TAs as the other one, we never got our homeworks back in less than 2 weeks—and one of them took a full 2 months to return!
  • TAing is a mixed bag. I enjoy holding office hours and being there during lab sessions to help students understand something they were missing. I do not so much enjoy grading homeworks and labs by those students who don’t ask questions, don’t come to office hours, and clearly don’t read the comments I leave on their assignments since I see them make the same mistakes over and over. I especially don’t like finding instances of cheating. Urgh.
  • I was a bit worried about coming back to grad school as an “older” student (the youngest guy in our 1st-year PhD cohort is almost a decade younger than me!). But it’s been great, actually:
    • My schedule seems much saner than some of my classmates’. Quite a few seem to stay in the office until late most nights, then may sleep through a morning class. For me, after years of waking at 6:30 to spend an hour on the crowded metro to work… it’s been luxurious to sleep in until 7:30 or 8, walk to school in half an hour in the fresh air, have a focused workday of reasonable length, and come home for dinner with my wife, actually relaxing in the evening instead of studying until 3am. Yes, there’s the occasional late night, but occasional is the key word there.
    • The income’s lower than my old job, of course, but Pittsburgh is much cheaper than DC, especially for housing. Besides: my previous school loans are all paid off, I have a fair chunk of retirement savings already earning interest, and my wife and I are used to budgeting. (YNAB is an excellent tool for this—I will blog about it at some point. If you’re interested, here’s a slight discount referral code, or you can wait for the big sale they seem to have every 3-4 months.)
      [My point is: despite the drop in income, we’re still more financially secure (thanks to savings and paid-off loans) than if I’d gone straight into the PhD from my MSc.]
    • As Cosma Shalizi points out: “Note to graduate students: It is important that you internalize that you are, in fact, a badass…” With age and experience, I’m far more able to speak confidently when it’s called for (e.g. giving a talk), and far less intimidated about tackling new topics, talking to professors, writing papers, speaking at conferences, etc.
  • On the other hand, despite longer experience as a statistician than my classmates, I appreciate and admire that they are much better at many things. I’m really impressed by my various classmates’ command of topics like real analysis and measure theory, scientific computing, or practical knowledge about fields like physics or economics.
  • Pittsburgh is a great town. Affordable housing, decent bus system, beautiful scenic views from the inclines, friendly people, livable walkable neighborhoods, tons of good food, extensive and well-run library system… It has a lot of what I liked about Portland, without as much of the “Portlandia” over-the-top hipsters. There are also beautiful old buildings, like the Carnegie Natural History Museum (with its sweet dinosaur exhibit) and UPitt’s Cathedral of Learning. The weather right now is pretty snowy/icy, but I don’t mind—I’m honestly impressed by how well Pittsburgh just goes ahead and deals with winter weather, in comparison to DC’s city-wide shutdown every time a snowflake is sighted.

Edit: Here’s another good post on the first semester of a PhD program, from several mathematics students. I agree with most of the responses, especially the ones that conflict each other 🙂

Next up: the 2nd, 3rd, 4th, 5th, 6th, 7th, 8th, 9th, and 10th semesters of my Statistics PhD program.

Transitions

Apologies for the lack of posts recently. I’m very excited about upcoming changes that are keeping me busy:

Let me suggest a few other blogs to follow while this one is momentarily on the back burner.

By my Census Bureau colleagues:

By members of the Carnegie Mellon statistics department:

The tuba effect

The Jingle All The Way 8k results are up, and naturally I was curious how I stacked up against the other runners. I know I’m no sprinter, so I’ve just plotted the median times within each age-by-gender category. Apparently carrying a tuba gave me a race time comparable to the median among 70-74 year old women.
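
The gist of that plot, as an R sketch with made-up data standing in for the real results file:

```r
library(ggplot2)

## Hypothetical sketch of the figure: median finish time per age-by-gender group.
set.seed(1)
results <- data.frame(
  minutes   = rnorm(400, mean = 45, sd = 8),
  age_group = sample(c("20-29", "30-39", "40-49", "50-59"), 400, replace = TRUE),
  gender    = sample(c("F", "M"), 400, replace = TRUE)
)
med <- aggregate(minutes ~ age_group + gender, data = results, FUN = median)
ggplot(med, aes(x = age_group, y = minutes, color = gender, group = gender)) +
  geom_point() +
  geom_line() +
  labs(x = "Age group", y = "Median finish time (min)")
```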

Of course I already knew I’d lose a race against my grandmother, a strong Polish woman who taught PE for many years. But when I’m carrying a tuba, your grandmother could likely beat me too.

Too close for bells, I’m switching to tubas

So when I’m not visualizing data or crunching small area estimates, I’ve been training to run DC’s Jingle All The Way 8k.

Most people wear little jingle bells as they run this race.
I decided to carry a tuba instead.

[Photo: running the 8k, tuba in hand.] More photos here. The one above is thanks to a blog I found by googling the race name + tuba. Our team t-shirts said Tuba Awareness, and apparently people were indeed aware! 🙂

My time was super slow (although I placed 1st in the carrying-a-tuba category), but I did run the whole thing, and I had a blast playing carols along the way. I really need to find somewhere in DC to play regularly, though perhaps a bit more sedentary…

Synaesthesia (or, This is Your Brain on Physics)

John Cook posted a fascinating Richard Feynman quote that made me wonder whether the physicist may have had synaesthesia:

I see some kind of vague showy, wiggling lines — here and there an E and a B written on them somehow, and perhaps some of the lines have arrows on them — an arrow here or there which disappears when I look too closely at it. When I talk about the fields swishing through space, I have a terrible confusion between the symbols I use to describe the objects and the objects themselves. I cannot really make a picture that is even nearly like the true waves.

As it turns out, he probably did:

As I’m talking, I see vague pictures of Bessel functions from Jahnke and Emde’s book, with light-tan j’s, slightly violet-bluish n’s, and dark brown x’s flying around. And I wonder what the hell it must look like to the students.

The letter-color associations in this second quote are a fairly common type of synaesthesia. The first quote above sounds quite different, but still plausibly like synaesthesia: “I have a terrible confusion between the symbols I use to describe the objects and the objects themselves”…

I wonder whether many of the semi-mystical genius-heroes of math & physics lore (also, for example, Ramanujan) have had such neurological conditions underpinning their unusually intuitive views of their fields of study.

I love the idea of synaesthesia and am a bit jealous of people who have it. I’m not interested in drug-induced versions but I would love to experiment with other ways of experiencing synthetic synaesthesia myself. Wired Magazine has an article on such attempts, and I think I remember another approach discussed in Oliver Sacks’ book Musicophilia.

I have a friend who sees colors in letters, which helps her to remember names — I’ve heard her think out loud along these lines: “Hmm, so-and-so’s name is kind of reddish-orange, so it must start with P.” I wonder what would happen if she learned a new alphabet, say the Cyrillic alphabet (used in Russian etc.): would she associate the same colors with similar-sounding letters, even if they look different? Or similar-looking ones, even if they sound different? Or, since her current associations were formed long ago, would she never have any color associations at all with the new alphabet?

Also, my sister sees colors when she hears music; next time I see her I ought to ask for more details. (Is the color related to the mood of the song? The key? The instrument? The time she first heard it? etc. Does she see colors when practicing scales too, or just “real” songs?)

Finally, this isn’t quite synaesthesia but another natural superpower in a similar vein, suggesting that language can influence thought:

…unlike English, many languages do not use words like “left” and “right” and instead put everything in terms of cardinal directions, requiring their speakers to say things like “there’s an ant on your south-west leg”.  As a result, speakers of such languages are remarkably good at staying oriented (even in unfamiliar places or inside buildings) and perform feats of navigation that seem superhuman to English speakers. In this case, just a few words in a language make a big difference in what cognitive abilities their speakers develop. Certainly next time you plan to get lost in the woods, I recommend bringing along a speaker of Kuuk Thaayorre or Guugu Yimithirr rather than, say, Dutch or English.

The human brain, ladies and gentlemen!