Statistics is Applied Science Fiction

I’m enjoying the discussion coming out of Alberto Cairo’s online data visualization course.

Bryn Williams, in a comment on thinkers & creators who read comics & sci-fi for inspiration:

“…a familiarity with imagined alternative worlds makes philosophy an easier path to tread when posing counterfactuals and thought experiments…”

My response:

And not just philosophy or data visualization — I think statistics could be presented as a kind of “applied science fiction.” When you perform a hypothesis test of whether some parameter is 0, you

  1. assume it *is* 0,
  2. imagine what kinds of data you would probably have seen under that assumption, and then
  3. if the real data you *did* see is unlikely under that assumption, decide that the assumption is probably wrong.

It’s just like in SF where

  1. you imagine a possible alternate reality (say, Joe discovers a talent for dowsing),
  2. you explore the consequences if that possibility were true (Joe becomes rich from oil prospecting), and
  3. in the best cases, readers can draw lessons about our actual reality from this thought experiment (http://xkcd.com/808/).

(XKCD is, of course, a great comic for both SF and datavis. See also this recent SMBC for another amusing exploration of “If this claim were true…”)
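
To make the statistical half of the analogy concrete, here’s a minimal sketch in R of steps 1–3 above, testing whether a mean is 0 by brute-force simulation (a t-test would be the usual tool, but simulation makes the imagined-worlds logic explicit):

# Steps 1-3 above, as a brute-force simulation
set.seed(42)
observed <- rnorm(30, mean = 0.5)  # the data we actually saw

# 1. Assume the parameter (here, the population mean) *is* 0.
# 2. Imagine the kinds of data we'd probably see under that assumption.
null_means <- replicate(10000, mean(rnorm(30, mean = 0)))

# 3. If the mean we actually saw is unlikely in those imagined worlds,
#    decide the "mean is 0" assumption is probably wrong.
p_value <- mean(abs(null_means) >= abs(mean(observed)))
p_value  # a small value casts doubt on the assumed reality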

Loess and Clark

Apologies for the awful pun in the title, but it seemed to befit an exploration of the history of loess (local regression), particularly its name and codebase.

If you’re not familiar with loess, it’s basically a nonparametric algorithm that smooths the data to find the local mean of y at each x value. If you want to end up with a more traditional regression, loess can still be a useful starting point for visually finding trends in the data. Earl Glynn shows a worked example with R code that illustrates the loess fit for different values of the bandwidth.
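
If you’d rather experiment yourself, here’s a quick sketch along the same lines, using base R’s loess() and the built-in cars dataset (a stand-in for Glynn’s data), overlaying fits for several values of span, loess’s bandwidth argument:

# Overlay loess fits with several bandwidths (the span argument)
plot(dist ~ speed, data = cars)
xs <- seq(min(cars$speed), max(cars$speed), length.out = 100)
for (sp in c(0.3, 0.6, 0.9)) {
  fit <- loess(dist ~ speed, data = cars, span = sp)
  lines(xs, predict(fit, data.frame(speed = xs)))
}
# Smaller spans chase local wiggles; larger spans smooth them away.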

Today was the first session of a Machine Learning study group with my colleagues. (We’re following along with Andrew Ng’s course notes for Stanford’s CS 229, also available on Coursera.) In the first chapter, Ng mentions loess regression, and two colleagues had interesting historical comments about it. Continue reading “Loess and Clark”

Animated map of 2012 US election campaigning, with R and ffmpeg

[Embedded video: https://vimeo.com/52312754]

(Video link here, in case the embedded player doesn’t work for you.)

Idea: see if I can mimic the approach behind Ben Schmidt’s lovely video of ocean shipping routes, and apply it to another dataset. But which?
“Hmm… what’s another interesting dataset about some competitors traveling around a mostly-fixed area at the same time?… Hey friends, stop giving me election news, I need to think of an idea… Oh.” Continue reading “Animated map of 2012 US election campaigning, with R and ffmpeg”
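
The general recipe, for the impatient: draw each frame with R, then stitch the frames together with ffmpeg. A generic sketch of the approach, not the post’s actual code:

# Draw each frame as a numbered .png ...
dir.create('frames', showWarnings = FALSE)
for (i in 1:100) {
  png(sprintf('frames/frame%03d.png', i), width = 800, height = 600)
  plot(rnorm(50), rnorm(50), main = paste('Frame', i))  # placeholder plot
  dev.off()
}
# ... then stitch them at 10 frames/sec (requires ffmpeg on your PATH)
system('ffmpeg -r 10 -i frames/frame%03d.png -pix_fmt yuv420p animation.mp4')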

JavaScript and D3 for R users, part 2: running off the R server instead of Python

Thank you all for the positive responses to Basics of JavaScript and D3 for R Users! Quick update: last time, we had to dabble in a tiny bit of Python to start a local web server in order to actually run the JavaScript and D3 examples on our own computers… However, commenter Shankar had the great idea of using the R server instead. He provided some example code, but reported that it didn’t work with all the examples.

Here’s my alternative code, which works with all the D3 examples I’ve tried so far. Rather than Shankar’s approach of using lower-level functions, I found it simpler to build on Jeffrey Horner’s excellent Rook package.

# Load the Rook library
library(Rook)

# Where is your d3 directory located?
myD3dir <- 'C:/Downloads'

# Start the server
s <- Rhttpd$new()
s$start(quiet=TRUE)

# To view a different D3 example,
# change the directory and .html file names below
# and rerun s$add() and s$browse()
s$add(
  app = Builder$new(
    Static$new(
      # List all the subdirectories that contain
      # any files the page will need (.js, .css, .html, etc.)
      urls = c('/d3', '/d3/examples', '/d3/examples/choropleth'),
      root = myD3dir
    ),
    Redirect$new('/d3/examples/choropleth/choropleth.html')
  ),
  name = 'd3'
)
s$browse(2)
# browse(1) would load the default RookTest app instead

# When you're done,
# clean up by stopping and removing the server
s$stop()
s$remove(all=TRUE)
rm(s)

If I understand the Rook documentation correctly, you simply can’t browse directories with R’s local server. So you’ll have to type in the exact directory and HTML file name for each example separately. But otherwise, this should be a simple way to play with D3 for anyone who’d rather stay within R instead of installing Python.
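
Since retyping those paths gets tedious, here’s a small convenience wrapper of my own (a sketch using the same Rook calls as above, assuming the server s and myD3dir are already set up) that bundles the s$add() and s$browse() steps:

# Re-register the 'd3' app for a different example and open the browser
viewD3 <- function(exampleDir, htmlFile) {
  s$add(
    app = Builder$new(
      Static$new(
        # Add more subdirectories here if an example needs them
        urls = c('/d3', '/d3/examples', exampleDir),
        root = myD3dir
      ),
      Redirect$new(paste(exampleDir, htmlFile, sep = '/'))
    ),
    name = 'd3'  # same name as above, so rerunning swaps the example
  )
  s$browse(2)  # assuming the 'd3' app is still #2 in the server's list
}

# For example:
# viewD3('/d3/examples/choropleth', 'choropleth.html')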

Basics of JavaScript and D3 for R Users

Hadley Wickham, creator of the ggplot2 R package, has been learning JavaScript and its D3 library for the next iteration of ggplot2 (tentatively titled r2d3?)… so I suspect it’s only a matter of time before he pulls the rest of the R community along.

Below are a few things that weren’t obvious when I first tried reading JavaScript code and the D3 library in particular. (Please comment if you notice any errors.) Then there’s also a quick walkthrough for getting D3 examples running locally on your computer, and finally a list of other tutorials & resources. In a future post, we’ll explore one of the D3 examples and practice tweaking it.

Perhaps these short notes will help other R users get started more quickly than I did. Even if you’re a ways away from writing complex JavaScript from scratch, it can still be useful to take one of the plentiful D3 examples and modify it for your own purposes. Continue reading “Basics of JavaScript and D3 for R Users”

Carl Morris Symposium on Large-Scale Data Inference (2/3)

Continuing the summary of last week’s symposium on statistics and data visualization (see part 1 and part 3)… Here I describe Dianne Cook’s discussion of visual inference, and Rob Kass’ talk on statistics in cognitive neuroscience.

[Edit: I’ve added a few more related links throughout the post.]

Continue reading “Carl Morris Symposium on Large-Scale Data Inference (2/3)”

Carl Morris Symposium on Large-Scale Data Inference (1/3)

I enjoyed this week’s Symposium on Large-Scale Data Inference, which honored Harvard’s Carl Morris as the keynote speaker. This was the 2nd such symposium; last year’s honoree was Brad Efron (whose new book I also recommend after seeing it at this event).

This year’s focus was the intersection of statistics and data visualization around the question, “Can we believe what we see?” I was seriously impressed by the variety and quality of the speakers & panelists — many thanks to Social & Scientific Systems for organizing! Look for the lecture videos to be posted online in January.

See below for the first two speakers, Carl Morris and Mark Hansen. The next posts will summarize talks by Di Cook and Rob Kass (part 2), and Chris Volinsky and the final panel discussion (part 3).

Continue reading “Carl Morris Symposium on Large-Scale Data Inference (1/3)”

Superheroes? Dataheroes!

Jake Porway of DataKind gave an inspiring talk comparing statisticians and data scientists to superheroes. Hear the story of how “the data scientists, statisticians, analysts were able to bend data to their will” and how these powers are being used for good or for awesome:

(Hat Tip: FlowingData.com)

Jake’s comment that “you have extraordinary powers that ordinary people don’t have” reminds me of Andrew Gelman’s suggestion that “The next book to write, I guess, should be called, not Amazing Numbercrunchers or Fabulous Stat-economists, but rather something like Statistics as Your Very Own Iron Man Suit.”

Jake mentioned several statistics / data science volunteering opportunities in his talk, starting with DataKind itself.

I also recommend Statistics Without Borders, with more of an international health focus. And if you’re here in Washington DC, Data Community DC and the related meetups are a great resource too.

Edit: Current students could also see if there is a Statistics in the Community (StatCom) Network branch at their university.

Statistics contests

Are you familiar with Kaggle? It’s a website for hosting online data-analysis contests, like smaller-scale versions of the Netflix Prize contest.

The U.S. Census Bureau is now hosting a Kaggle contest, asking statisticians and data scientists to help predict mail return rates on surveys and census forms (more info at census.gov and kaggle.com). The ability to predict return rates will help the Census Bureau target its outreach efforts and interview follow-up (phone calls and door-to-door interviews) more efficiently. So you could win a prize and make the government more efficient, all at the same time! 🙂 The contest ends on Nov 1st, so you still have 40 days to compete.

If you prefer making videos to crunching numbers, there’s also a video contest to promote 2013 as the International Year of Statistics. Help people see how statistics makes the world better, impacts current events, or gives you a fun career, and you may win a prize and be featured on their website all next year. There are special prizes for non-English-language videos and for entrants under 18 years old. Submissions are open until Oct 31st, just a day before the Census Challenge deadline.