Statisticians have always done a myriad of different things related to data collection and analysis. Many of us are surprised (even frustrated) that Data Science is even a thing. “That’s just statistics under a new name!” we cry. Others are trying to bring Data Science, Machine Learning, Data Mining, etc. into our fold, hoping that Statistics will be the “big tent” for everyone learning from data.
But I do think there is one core thing that differentiates Statisticians from these others. Having an interest in this is why you might choose to major in statistics rather than applied math, machine learning, etc. And it’s the reason you might hire a trained statistician rather than someone else fluent with data:
Statisticians use the idea of variability due to sampling to design good data collection processes, to quantify uncertainty, and to understand the statistical properties of our methods.
When applied statisticians design an experiment or a survey, they account for the inherent randomness and try to control it. They plan your study in such a way that’ll make your estimates/predictions as accurate as possible for the sample size you can afford. And when they analyze the data, alongside each estimate they report its precision, so you can decide whether you have enough evidence or whether you still need further study. For more complex models, they also worry about overfitting: can this model generalize well to the population, or is too complicated to estimate with this sample and hence is it just fitting noise?
When theoretical statisticians invent a new estimator, they study how well it’ll perform over repeated sampling, under various assumptions. They study its statistical properties first and foremost. Loosely speaking: How variable will the estimates tend to be? Will they be biased (i.e. tend to always overestimate or always underestimate)? How robust will they be to outliers? Is the estimator consistent (as the sample size grows, does the estimate tend to approach the true value)?
These are not the only important things in working with data, and they’re not the only things statisticians are trained to do. But (as far as I can tell) they are a much deeper part of the curriculum in statistics training than in any other field. Statistics is their home. Without them, you can often still be a good data analyst but a poor statistician.