So I’ve complained before about the problems with Null Hypothesis Significance Testing (NHST) and how, in many cases, it’d be more informative and more useful to report confidence intervals instead of p-values.
Well, the journal Basic and Applied Social Psychology has recently decided to ban p-values… but they’ve also tossed out confidence intervals and all the rest of classical statistical inference. And they’re not sold on Bayesian inference either. (Nor does their description of Bayes convince me that they understand it, with weird wordings like “strong grounds for assuming that the numbers really are there.”)
Apparently, instead of choosing another, less common inference flavor (such as likelihood or fiducial inference), they are doing away with rigorous inference altogether and only publishing descriptive statistics. The only measure they explicitly mention to prevent publishing spurious findings is that “we encourage the use of larger sample sizes than is typical in much psychology research, because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem.” That sounds to me like they know sampling error and inference are important—they just refuse to quantify them, which strikes me as bizarre.
I’m all in favor of larger-than-typical sample sizes, but I’m really curious how they will decide whether they are large enough. Sample sizes need to be planned before the experiment happens, long before you get feedback from the journal editors. If a researcher plans an experiment, hoping to publish in this journal, what guidance do they have on what sample size they will need? Even just doubling the sample size is already often prohibitively expensive, yet it doesn’t even halve the standard error; will that be convincing enough? Or will they only publish Facebook-sized studies with millions of participants (which often have other experimental-design issues)?
Conceivably, they might work out these details and this might still turn out to be a productive change making for a better journal, if the editors are more knowledgeable than the editorial makes them sound, AND if they do actually impose a stricter standard than p<0.05, AND if good research work meeting this standard is ever submitted to the journal. But I worry that, instead, it'll just end up downgrading the journal's quality and reputation, making referees unsure how to review articles without statistical evidence, and making readers unsure how reliable the published results are.
Edit: John Kruschke is more hopeful, and Andrew Gelman links to a great paper citing cases of actual harm done by NHST. Again, I’m not trying to defend overuse of p-values—but there are useful and important parts of statistical inference (such as confidence intervals) that cannot be treated rigorously with descriptive statistics alone. And reliance on the interocular trauma test alone just frees up more ways to fiddle with the data to sneak it past reviewers.