Gerard van Belle’s *Statistical Rules of Thumb* has piqued my curiosity at conferences. It turns out my work library has a copy, which has been fun to skim, or should I say, to thumb through.

The book’s examples focus largely on medical and environmental studies, but most of its advice applies to statistics in general.

The book starts off with good “rules of thumb” in the sense of quick calculations, e.g. for the approximate sample size you’d need to get suitably precise estimates in several common situations. But van Belle also offers more general advice, such as which model to start with: when to use Normal vs. Exponential vs. Poisson, etc.
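
To give a flavor of what such a quick calculation looks like, here is a minimal sketch in Python. It is my own illustration, not a formula quoted from the book: it assumes you want a 95% confidence interval for a mean with a chosen half-width (`margin`), that you have a rough guess at the standard deviation (`sigma`), and that rounding the Normal quantile 1.96 up to 2 is acceptable. The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_quick(sigma, margin):
    """Back-of-envelope sample size: treat the 95% Normal quantile as 2,
    so n is roughly (2 * sigma / margin) squared."""
    return ceil((2 * sigma / margin) ** 2)


def n_normal(sigma, margin, confidence=0.95):
    """Same calculation with the actual Normal quantile, for comparison."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)


# Hypothetical inputs: standard deviation of about 10, desired half-width of 2.
print(n_quick(10, 2), n_normal(10, 2))
```

With these made-up inputs the shortcut gives 100 and the quantile-based version gives 97, which is about as close as a rule of thumb needs to be.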

Some of my favorite pithy or self-explanatory “rules”:

- 1.9: “Use p-values to determine sample size, confidence intervals to report results”
- 3.3: “Do not correlate rates or ratios indiscriminately”

  i.e. if X, Y, and Z are mutually independent, then X/Z and Y/Z will show spurious correlation.
- 5.8: “Distinguish between variability and uncertainty”

  i.e. “reduce uncertainty but account for variability”
- 5.13: “Distinguish between confidence, prediction, and tolerance intervals”
- 6.2: “Blocking is the key to reducing variability”
- 6.6: “Analysis follows design”

  i.e. the possible analyses will depend on how the randomization was done
- 6.11: “Plan for missing data”

  i.e. be explicit about how you intend to deal with it
- 6.12: “Address multiple comparisons before starting the study”
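
Rule 3.3 is easy to check by simulation. Below is a minimal sketch in Python, using only the standard library. The Uniform(0.5, 1.5) distributions, the sample size, the seed, and the function names are all my own choices for illustration, not anything taken from the book.

```python
import random
from math import sqrt


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def simulate(n=20_000, seed=0):
    """Draw independent X, Y, Z and compare corr(X, Y) with corr(X/Z, Y/Z)."""
    rng = random.Random(seed)
    # Keep all three variables away from zero so the ratios stay well behaved.
    x = [rng.uniform(0.5, 1.5) for _ in range(n)]
    y = [rng.uniform(0.5, 1.5) for _ in range(n)]
    z = [rng.uniform(0.5, 1.5) for _ in range(n)]
    r_raw = pearson(x, y)
    r_ratio = pearson(
        [xi / zi for xi, zi in zip(x, z)],
        [yi / zi for yi, zi in zip(y, z)],
    )
    return r_raw, r_ratio


r_raw, r_ratio = simulate()
print(f"corr(X, Y)     = {r_raw:.3f}")
print(f"corr(X/Z, Y/Z) = {r_ratio:.3f}")
```

Under these assumptions the raw correlation should land near zero, while the correlation between the two ratios comes out clearly positive (roughly 0.5 by my calculation), even though nothing links X and Y except the shared denominator.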
