## Other students’ views on CMU’s Statistics department

(1) There are a couple of nice posts on Quora answering “What is it like to be a graduate student in Statistics at CMU?”
(If you don’t want to sign in to Quora, you might be able to read the replies through these direct links: Jack, Alex, Sangwon.)

(2) When I was applying to schools, a fellow PhD student here shared his thoughts about CMU’s Statistics department. He kindly allowed me to share his comments here as a guest post, though he warns it may be a year or two out of date.

In probably all graduate programs, but at least at CMU, graduate study consists of a coursework component and a research component. (You can see the curriculum here, and while they keep tweaking it, this looks like it’s more or less up to date.) As you can see, the balance starts out tilted heavily toward coursework and gradually starts to shift toward research, so that by your fourth semester you are mostly doing research. This makes sense – it would be tough to do much research-wise without at least some foundational methodological and theoretical training.

A key component of the easing-in process is the well-designed but not particularly well-named Advanced Data Analysis (“ADA”) course, which is a yearlong project spanning your second and third semesters. In this, you choose a professor to work with (they all give presentations about their work first semester to give you a sense of whom to choose), and this professor arranges a relationship with an outside investigator — a “real scientist”, not a statistician, usually in some other department at CMU or Pitt — who has data for you to analyze. Then the three (or more) of you work on the problem of analyzing that data for a year, meeting relatively frequently to discuss progress and whatever issues may arise. You also produce reports and presentations on the project as milestones.

So I’m now at the beginning of my second year, in the midst of my ADA project as well as the Advanced Stat Theory class. To give you a sense of an ADA project, I am working with two professors from Stats and one from CMU Astrophysics on a data set of galaxy images, trying to develop predictive models for galaxy redshift purely by analyzing these images. Other ADA projects right now include applications to educational testing, the genetic basis of autism, and medical studies of dementia.

So with that said, while I’m not in the full-blown research part of the PhD, I’ve still had the opportunity to work closely with professors and it has been very fruitful. They tend to be accessible and willing to meet as often as I want to, which tends to be once a week or every other week. My experience with research is that we’ll meet and talk about stuff, then I’ll go home and try whatever new stuff is suggested, and when I have something to show or have hit a wall, we meet again to talk about it. I’ve also started going to the meetings of the Astrostatistics group, which is a collaborative research effort between CMU Stats, CMU Astrophysics, and Pitt Astronomy, and hearing about all the research that’s being done in that setting.

I think the way CMU structures the research experience speaks to how much emphasis it places on acclimating you to that environment, which is really quite different from the classroom. Regarding the coursework component, most of the classes I’ve had here have been well-taught, and the professors hold office hours and generally welcome student inquiry. I think the professors, for the most part, do an admirable job of juggling their research and teaching without short-changing one piece or the other. I’ve definitely learned a ton from classes, which is important because my background in statistics was rather weak coming in. (I had a solid foundation in Math and CS, but not a ton of exposure to Stats.)

Regarding the distinguishing qualities of the program, there are a few. Among the spectrum of theoretical vs. applied programs, it tends to skew applied — there are a few people doing theory but many more working on applications to various fields. (This could be a good thing or a bad thing depending on your taste.) But if you ask people here, they might say the distinction between theoretical and applied work is kind of silly, since advances in theory can yield new methodology and novel applications can motivate development of theory. But anyhow, given that professors do a lot of applied work, there are fertile collaborations here with quite a few disciplines — astrostatistics as I mentioned, neuroscience, CS/machine learning, genetics, even some people working on finance/economics problems. So it’s not limiting at all in terms of what you can work on.

Another good thing about the program is that it’s pretty current and (you might say) somewhat pragmatic. For instance, they just revamped our Advanced Stat Theory core course to be taught with a huge focus on nonparametric inference instead of the canonical/classical inference theory, because it turns out that most people in real-world research settings are using nonparametric methods much more. In general, it’s great when a department recognizes that a field is evolving (rapidly!) and they are willing to adapt to cover what will be useful for students rather than what they became famous for writing books about in the ’70s.

That’s all I can think of now. Best of luck with the application process!

## More on graduate study for careers in Statistics

First, the Science career magazine has a good article, “Careers in Statistics Evolve and Expand,” with job growth projections and a few interviews. However, there’s not much direct advice on how to land one of these jobs.

Meanwhile, I’ve received a couple more emails asking how best to prepare for Statistics careers. If you’re an employer, or a recent graduate, do you have any advice to share?

First email:

I am working in the area of cancer research. After spending some time doing clinical data analysis and working on genomics, I realized data analysis is something I really enjoy. I have already started learning Python and R. But considering my background and no proper academic training in math/stat, how do I go about getting a job in industry related to big data? Do you think getting a Master’s would help? Even for Master’s I would need undergrad courses in Math/Lin Algebra.

My response:

Python and R are great tools to work with, so it’s good that you’ve been learning those.

What kind of big data jobs are you interested in? Unfortunately, I don’t have too much advice about industry jobs, since my time has been mostly in government and academia.*

In general, I think there’s great value in having a rigorous statistics background when doing data analysis, so that you know the limitations of your data and your conclusions. However, some employers might prefer you to have expertise in fast algorithms or big-data tools like Hadoop (which most statistics Masters programs don’t really cover). If you’d like to work in such positions, you may prefer to focus on learning programming or computer science.

If you do go for a Masters in statistics, you will definitely want to brush up on calculus and linear algebra. These mathematical foundations are needed for stating (and proving) core concepts in statistics.

Second email:

My undergraduate degree is in a humanities field, but I have been taking computer science, stats, and math courses so that I could apply to either a Masters in Comp Sci or in Statistics. I really enjoy stats, and I have done well in all of my stats classes, including a graduate level course. It also fits in well with my interest in information and how to understand and manipulate it in order to make it understandable.

I feel like my interests and background make the Applied Statistics degree more what I am looking for. I am also thinking that the online option might be a good idea because it would allow me to build my contacts through part-time work or internships.

Anyways, since you have such a varied background, I am wondering what your thoughts are on the professional Masters in Applied Statistics program and whether it would be a way for me to get into the stats field, or if it is looked down on by employers? Also, what do you think of the residency vs. online options?

And my response:

Do you know what you’d like to do in statistics, once you have the degree? Are you interested in academia, industry, consulting, government, healthcare, etc.?

My work experience was in government, where the hiring standards are pretty explicit. For example, most Masters programs would prepare you well for work at the Census Bureau at the GS-09 grade level, as I did with my Masters. Here’s the job posting and other related opportunities. As long as the online program is properly accredited, the fact that it’s online shouldn’t matter for government jobs.

But I don’t have much experience with industry jobs, and hiring there has changed a lot in the 5 years since I last applied for jobs.* Google searches for “data scientist” only picked up around 2012.

Finally, just in case you haven’t taken online courses before, I’d recommend trying some before you sign up for an all-online degree. Personally I’ve found I learn much better when I come to class regularly and talk to professors in person. But if that’s not an issue for you, then it sounds great to have the flexibility to do part-time work or internships. That kind of practical experience should help a lot on the job hunt too.

*Clearly, I should really ask one of our recent graduates from CMU’s 1-year Masters in Statistical Practice to write a post about their experience on the job hunt this semester. They’d have a much better idea of how prospective employers today look at stats Masters degrees.


## Barkov Chain Monte Crawlo

Finally, another semester over. I’ll post my 2nd-semester reflections soon… but meanwhile, who wants to grab a drink?

If you’re in Pittsburgh this Saturday, come join me and my classmates for a pub crawl, at the South Side bars on E Carson St (around 11th to 22nd St). We plan to start at The Library (2302 East Carson St) around 5:30 or 6pm, and go from there. If you come later, I’ll try to update our location here or on Twitter with hashtag #statbeer.

The plan:

Although I can’t claim originality (a web search turns up this), I believe I came up with this independently: I propose using Markov Chain Monte Carlo (MCMC) to stage a bar crawl and/or using the bar crawl metaphor to explain MCMC.

The MCMC Bar Crawl* (a.k.a. Barkov Chain Monte Crawlo) is simple:

1. We randomly propose a nearby bar to visit
2. We vote: how many people like that bar better than where we are now?
3. If it’s not unanimous, roll a die to see whether we stay here or move there
4. Have a drink and repeat

* (Basically a Metropolis sampler from the multinomial distribution on our bar preferences.)
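
For the curious, here is a minimal Python sketch of the crawl as a Metropolis sampler. Everything concrete in it is an assumption for illustration: the bar names and preference weights are hypothetical, "nearby" is simplified to "any other bar, chosen uniformly", and the vote-plus-die-roll is modeled as accepting a move with probability min(1, proposed weight / current weight). The steps above do not pin down the exact die-rolling rule, so treat this as one reasonable reading rather than the official procedure.

```python
import random
from collections import Counter

def barkov_chain(weights, n_steps, start, rng):
    """Metropolis sampler over bars; weights are unnormalized group preferences."""
    bars = list(weights)
    current = start
    visits = Counter()
    for _ in range(n_steps):
        # Step 1: propose another bar uniformly at random (a symmetric proposal).
        proposal = rng.choice([b for b in bars if b != current])
        # Steps 2-3: the "vote" is the preference ratio; always move if the
        # proposal is preferred, otherwise move with probability equal to the ratio.
        accept_prob = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept_prob:
            current = proposal
        # Step 4: have a drink wherever we ended up, and repeat.
        visits[current] += 1
    return visits

# Hypothetical preference weights (not real bars or real votes).
weights = {"Bar A": 1.0, "Bar B": 2.0, "Bar C": 3.0}
n_steps = 60000
visits = barkov_chain(weights, n_steps, "Bar A", random.Random(0))
shares = {bar: visits[bar] / n_steps for bar in weights}
print(shares)
```

Over a long enough crawl, the share of drinks had at each bar should approach that bar's share of the total preference weight, which is the whole point of the sampler (and a good excuse to keep going).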

Update: this was a success and we’ll do it again. See also SMBC’s Bayesian Drinking Game.

## Belief-Sustaining Inference

TL;DR: If you’re in Pittsburgh today, come to SIGBOVIK 2014 at CMU at 5pm for free food and incredible math!

In a recent chat with my classmate Alex Reinhart, author of Statistics Done Wrong, we noticed a major gap in statistical inference philosophies. Roughly speaking, Bayesian statisticians begin with a prior and a likelihood, while Frequentist statisticians use the likelihood alone. Obviously, there is scope for a philosophy based on the prior alone.

We began to develop this idea, calling it Belief-Sustaining Inference, or BS for short. We discovered that BS inference is extremely efficient, for instance getting by with smaller sample sizes and producing tighter confidence intervals than other inference philosophies.

Today I am ~~proud~~ ~~dismayed~~ complacent to report that our resulting publication has been accepted to the ~~prestigious~~ adequate SIGBOVIK 2014 conference (for topics such as Inept Expert Systems, Artificial Stupidity, and Perplexity Theory):

Reinhart, A. and Wieczorek, J. “Belief-Sustaining Inference.” SIGBOVIK Proceedings, Pittsburgh, PA: Association for Computational Heresy, pp. 77-81, 2014. (pdf)

Our abstract:

Two major paradigms dominate modern statistics: frequentist inference, which uses a likelihood function to objectively draw inferences about the data; and Bayesian methods, which combine the likelihood function with a prior distribution representing the user’s personal beliefs. Besides myriad philosophical disputes, neither method accurately describes how ordinary humans make inferences about data. Personal beliefs clearly color decision-making, contrary to the prescription of frequentism, but many closely-held beliefs do not meet the strict coherence requirements of Bayesian inference. To remedy this problem, we propose belief-sustaining (BS) inference, which makes no use of the data whatsoever, in order to satisfy what we call “the principle of least embarrassment.” This is a much more accurate description of human behavior. We believe this method should replace Bayesian and frequentist inference for economic and public health reasons.

If you’re around CMU today (April 1st), please do stop by SIGBOVIK at 5pm, in Rashid Auditorium in the Gates-Hillman Center. There will be free food, and that’s no joke.

## Bayesian statistics and applied computing at CMU?

I wanted to pick your brain about stats and machine learning at CMU … I’m considering a Ph.D. in a finance or a related discipline.

Here’s the thing, I’m very much attracted to schools with established inter-disciplinary programs, like CMU’s additional masters in machine learning, and Duke’s supplemental masters in statistics. Duke bills itself as the best Bayesian shop under the sun, which is also attractive. I’m not dogmatically fixed on Bayesian methods, but I do find it a much more natural way of thinking, and more naturally applied to practical problems.

One person I spoke to suggested that Duke was the better program based on my interests, but I read on your blog that Bayesian methods and applied computing are pretty well represented at CMU, so I figured I’d get your thoughts. Leaving aside the reality that one can’t choose where they’re admitted, and that one should focus their choice on the strength of their primary department, I’d like to know which would be the better option.

My response:

I admit I don’t know much about the finance program here, nor about the supplemental masters in ML. And of course I know even less about Duke’s equivalent programs.

That said, CMU is absolutely a strong place for machine learning and statistics, including applied computing and Bayesian statistics.

Bayes:

• The core courses for the ML masters (10-701, 10-702, and 10-705) do cover Bayesian methods and inference. We study the basic theory and plenty of applications, including less-often-taught methods like Bayesian nonparametrics. Parts of 702 and 705 are especially helpful for clarifying how Frequentist and Bayesian inferences differ. Although I’m a fan of the Bayesian approach, I really appreciate how Larry Wasserman challenges us to understand its weaknesses thoroughly, using plenty of examples where Classical methods have an advantage over Bayesian ones (such as Sec 12.6 here).
• Beka Steorts also offers a pair of courses that go into more depth on Bayesian theory and applications.
• There’s also a close link between the Statistics and Philosophy departments: particularly Kadane and Schervish here in Stats, and Seidenfeld in Phil, work together regularly on the foundations of statistical inference, incl. Bayesian.

ML and applied computing:

I’m sure that you’d find CMU worthwhile if you end up coming here.

## With loss of generality

Public service announcement: Dear math and statistics students, “WLOG” means you’re about to prove something “without loss of generality.”

So please don’t copy your friend’s homework and write it as “with log” or “using log”. It’s just too easy for your grader to catch you.

♪ The more you know! ♫

## Reproducible research, training wheels, and knitr

Last week I gave a short talk at CMU’s statistical computing seminar, Stat Bytes. I summarized why reproducible research (RR) and literate programming are worthwhile, not just for serious research but also for homework reports or statistical blog posts. I demonstrated how to get started with a range of RR document formats in R: from the “training wheels” R Notebook in RStudio, through the more flexible but still simple R Markdown format, to R Sweave for $\LaTeX$ articles and Beamer slides.

If you’ve wanted to get on the RR bandwagon, but found Sweave too overwhelming, these other tools are a great way to start—and useful in their own right, not just for training.

My materials are here:

• Overview and links (html output, Rmd source)
• R Notebook example (html output, R source)
• R Markdown example (html output, Rmd source)
• R Sweave / Beamer example (pdf output, Rnw source)

Extra details below.

Reproducible research story time

First, story time! I was once asked to step in and take over the statistical analysis for an article, after the primary statistician became unavailable. It sounded like a pretty straightforward analysis of survey data, with clear scientific questions, and they told me they had the previous statistician’s R code, so I thought it sounded reasonable. Hah…

## After 1st semester of Statistics PhD program

Have you ever wondered whether the first semester of a PhD is really all that busy? My complete lack of posts last fall should prove it.

Some thoughts on the Fall term, now that Spring is well under way [edit: added a few more points]:

• RMarkdown and knitr are amazing. When I next teach a course using R, my students will be turning in homeworks using these tools: The output immediately shows whether the code runs and what its results are. This is much better than students copying and pasting possibly-broken code and unconnected output into a text file or (gasp) Word document.
• I’m glad my cohort socializes outside the office, taking each other out for birthday lunches or going to see a Pirates game. Some of the older PhD students are so focused on their thesis work that they don’t take time for a social break, and I’d like to avoid getting stuck in that rut.
However! Our lunches always lead us back to the age-old question: How many statisticians does it take to split a bill? Answer: too long. I threw together a Shiny app, DinneR, to help us answer this question.

• The first-year PhD courses in Statistics and in Machine Learning have rather different approaches.
    • Statistics professor: Just assume we can compute this estimator. In class we’ll prove that the estimates are reasonably good (e.g. we’ll bound the probability that an estimate is far from the true value).
    • Machine Learning professor: Just trust me that this algorithm gets useful estimates. In class we’ll prove that we can compute it in a reasonable amount of time (e.g. we’ll bound the number of steps until the algorithm converges).
    • Somewhere between these ideas, I ran into the sensible concept of optimizing only until your solution is within statistical error. For example, say you only have enough data to publish an estimate with a confidence interval of ±0.1 units. If your optimization algorithm is computer-intensive, then running it until it converges to ±0.00001 units is just a waste of time. For instance, see Bottou & Bousquet’s “The Tradeoffs of Large Scale Learning.”
• My ML professor, midway through a classification-focused semester, finally discussing regression for 10 minutes: “…And that’s all you need to know about regression.”
My Regression professor, at end of semester, finally discussing classification for 20 minutes: “…And that’s all you need to know about classification.”
• In any class that covers proofs or other long detailed arguments, handouts+chalkboards are seriously better than slideshows. With a chalkboard, you can show the whole proof at once—so if students get lost halfway through, they can still see the claim we’re proving and all the steps we’ve made so far. But when you cram a proof onto slides, either you oversimplify to get it onto one slide; or you split it across slides, so that we lose the continuity (and may even forget what we’re trying to prove).
• Good homeworks and quick feedback are critical. One of my classes had weekly homeworks, each directly tied to the material we just covered, each problem expanding on a good question or illustrating an interesting principle from class. Homeworks were graded within a week, every single time.
In another class, we had just a few homeworks, very loosely tied to the lecture contents and usually at a very different level (way too easy or too hard relative to what the lecture covered). Although this class had the same number of students and TAs as the other one, we never got our homeworks back in less than 2 weeks—and one of them took a full 2 months to return!
• TAing is a mixed bag. I enjoy holding office hours and being there during lab sessions to help students understand something they were missing. I do not so much enjoy grading homeworks and labs by those students who don’t ask questions, don’t come to office hours, and clearly don’t read the comments I leave on their assignments since I see them make the same mistakes over and over. I especially don’t like finding instances of cheating. Urgh.
• I was a bit worried about coming back to grad school as an “older” student (the youngest guy in our 1st-year PhD cohort is almost a decade younger than me!). But it’s been great, actually:
    • My schedule seems much saner than some of my classmates’. Quite a few seem to stay in the office until late most nights, then may sleep through a morning class. For me, after years of waking at 6:30 to spend an hour on the crowded metro to work… it’s been luxurious to sleep in until 7:30 or 8, walk to school in half an hour in the fresh air, have a focused workday of reasonable length, and come home for dinner with my wife, actually relaxing in the evening instead of studying until 3am. Yes, there’s the occasional late night, but occasional is the key word there.
    • The income’s lower than my old job, of course, but Pittsburgh is much cheaper than DC, especially for housing. Besides: my previous school loans are all paid off, I have a fair chunk of retirement savings already earning interest, and my wife and I are used to budgeting. (YNAB is an excellent tool for this; I will blog about it at some point. If you’re interested, here’s a slight discount referral code, or you can wait for the big sale they seem to have every 3-4 months.)
    [My point is: despite the drop in income, we’re still more financially secure (thanks to savings and paid-off loans) than if I’d gone straight into the PhD from my MSc.]
    • As Cosma Shalizi points out: “Note to graduate students: It is important that you internalize that you are, in fact, a badass…” With age and experience, I’m far more able to speak confidently when it’s called for (e.g. giving a talk), and far less intimidated about tackling new topics, talking to professors, writing papers, speaking at conferences, etc.
    • On the other hand, despite longer experience as a statistician than my classmates, I appreciate and admire that they are much better at many things. I’m really impressed by my various classmates’ command of topics like real analysis and measure theory, scientific computing, or practical knowledge about fields like physics or economics.
• Pittsburgh is a great town. Affordable housing, decent bus system, beautiful scenic views from the inclines, friendly people, livable walkable neighborhoods, tons of good food, extensive and well-run library system… It has a lot of what I liked about Portland, without as much of the “Portlandia” over-the-top hipsters. There are also beautiful old buildings, like the Carnegie Natural History Museum (with its sweet dinosaur exhibit) and UPitt’s Cathedral of Learning. The weather right now is pretty snowy/icy, but I don’t mind—I’m honestly impressed by how well Pittsburgh just goes ahead and deals with winter weather, in comparison to DC’s city-wide shutdown every time a snowflake is sighted.
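
To make the earlier point about optimizing only until you are within statistical error a bit more concrete, here is a toy Python sketch. It is entirely my own illustration, not taken from Bottou & Bousquet: the simulated data, the plain gradient descent on squared error, and the choice of stopping tolerance (one tenth of the standard error) are all assumptions picked to keep the example small.

```python
import math
import random
import statistics

def mean_by_gradient_descent(data, tol, step=0.1, max_iter=10_000):
    """Minimize the average squared error over theta by gradient descent.

    Stops as soon as a single update is smaller than tol.
    Returns the estimate and the number of iterations used.
    """
    theta = 0.0
    n = len(data)
    for iteration in range(1, max_iter + 1):
        # Gradient of mean((theta - x)^2) with respect to theta.
        grad = 2.0 * sum(theta - x for x in data) / n
        update = step * grad
        theta -= update
        if abs(update) < tol:
            return theta, iteration
    return theta, max_iter

# Simulated data (hypothetical): 100 noisy measurements around 5.
rng = random.Random(1)
data = [rng.gauss(5.0, 1.0) for _ in range(100)]
std_error = statistics.stdev(data) / math.sqrt(len(data))

# Loose stopping rule tied to the statistical error vs. a very tight one.
theta_loose, iters_loose = mean_by_gradient_descent(data, tol=std_error / 10)
theta_tight, iters_tight = mean_by_gradient_descent(data, tol=1e-7)

print(f"standard error: {std_error:.4f}")
print(f"loose: {theta_loose:.6f} after {iters_loose} iterations")
print(f"tight: {theta_tight:.6f} after {iters_tight} iterations")
```

The tight run takes several times as many iterations, yet the two estimates differ by much less than the standard error, so the extra digits would not change anything you could honestly report.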

Edit: Here’s another good post on the first semester of a PhD program, from several mathematics students. I agree with most of the responses, especially the ones that conflict with each other.

## Turing-complete inversion tables, presented reasonable on your part!

I’ve not been keeping up with blogging this semester, but I had to share this beautiful spam comment my filter let through this morning:

Appreciation for the excellent writeup. This in reality was previously your fun profile it. Glimpse complex to help way presented reasonable on your part! On the other hand, the way could possibly we be in contact?

I can’t tell if it’s written by a non-native English speaker or by a Markov chain—does that mean it passes the Turing test? Either way, there’s something lovely about its broken grammar.

The author’s name was given as “buy inversion tables.” For a moment I thought this might be a real comment, by someone offering to compute large matrix inversions cheaply and quickly. But no, apparently inversion tables are these things where you strap yourself in, flip over, and hang upside down for as long as you can. Kind of like the first semester of a PhD program.

PS—somehow the comment reminds me of when Cosma Shalizi’s students used Markov-chain generated text to fake a blog post for him, in a previous iteration of the Statistical Computing class (which I’m TA’ing this term).

## Data-Driven Journalism MOOC

TL;DR: The Knight Center’s free online journalism courses are great for anyone who works with data, storytelling, or both. See what’s being offered here.
My favorite links from a recent course on Data-Driven Journalism are here.
And a fellow student’s suggested reading list is here.

Last fall, a coworker and I led a study group for the Knight Center’s MOOC (massive open online course) on “Introduction to Infographics and Data Visualization”, taught by Alberto Cairo. The course and Alberto’s book were excellent, and we were actually able to bring Alberto in to the Census Bureau for a great lecture a few months later. This course is now in its 3rd offering (starting today!) and I cannot recommend it highly enough if you have any interest in data, journalism, visualization, design, storytelling, etc.!

So, this summer I was happy to see the Knight Center offering another MOOC, this time on “Data-Driven Journalism: The Basics”. What with moving cities and starting the semester, I hadn’t kept up with the class, but I’ve finally finished the last few readings & videos. Overall I found a ton of great material.

The course’s five lecturers gave an overview of data-driven journalism: from its historical roots in the 1800s and its relation to computer-aided reporting, to how to get data in the first place, through cleaning and checking the data, and finally to building news apps and journalistic data visualizations.

In week 3 there was a particularly useful exercise of going through a spreadsheet of hunting accidents. Of course it illustrated some of the difficulties in cleaning data, and it gave concrete practice in filtering and sorting. But it was also a great illustration of how a database can lead you to potential trends or stories that you might have missed if you’d only gone out to interview a few individual hunters.

I loved some of the language that came up, such as “backgrounding the data” — analogous to checking out your sources to see how much you can trust them — or “interrogating the data,” including coming prepared to the “data interview” to ask thorough, thoughtful questions. I’d love to see a Statistics 101 course taught from this perspective. Statisticians do these things all the time, but our terminology and approach seem alien and confusing the first few times you see them. “Thinking like a journalist” and “thinking like a statistician” are not all that different, and the former might be a much more approachable path to the latter.

For those who missed the course, consider skimming the Data Journalism Handbook (free online); Stanford’s Data Journalism lectures (hour-long video); the course readings I saved on Pinboard; and my notes below.