A Theory of Data, Clyde Coombs

Earlier I’ve quoted Leland Wilkinson in The Grammar of Graphics, where he recommends Clyde Coombs’ book A Theory of Data:

…in a landmark book, now out of print and seldom read by statisticians, Coombs (1964) … believed that the prevalent practice of modeling based on cases-by-variables data layouts often prevents researchers from considering more parsimonious structural theories and keeps them from noticing meaningful patterns in their data.

I checked out Coombs’ book through interlibrary loan and haven’t had time to read it thoroughly before the due date. But even from skimming it on the train for a few days, I can see why Wilkinson recommends it.

Chapter 1 lays out a whole conceptual framework for types of data, and the rest of the book explains how to collect and analyze it.

Conceptual Framework

As Wilkinson does (for graphics), Coombs builds a broad framework (for psychological data) that not only encompasses familiar structures but also suggests other types you may not have thought to collect or analyze before. In Coombs’ case, the ideas are inspired by the kind of data you get when you ask people about their choices or preferences. In other words, you need some stimuli as well as some deciders to compare those stimuli. (Let me know if you have good examples of using Coombs’ ideas in a different setting.)

Coombs’ approach models the stimuli as points in a common space. In two of the data types, he also models each respondent’s ideal choice as a point in that same space.

Let’s use desserts to illustrate Coombs’ four types of data:

Preferential choice: you show your respondents a few desserts and ask them to choose their favorite or rank their preferences. Each sweet is represented by a point in some multidimensional space, and each respondent is represented by the point that would correspond to their ideal dessert. If, say, Alice’s platonic ideal of dessert is warm and chocolaty, then she’d rank brownies higher than either apple crisp or chocolate ice cream.
Single stimulus: you show the respondent one item at a time and ask whether it’s a dessert. If Bob agrees that pumpkin pie is dessert but roasted squash isn’t, then the pumpkin pie point is within some neighborhood of Bob’s platonic ideal dessert, while roasted squash is outside that neighborhood.
Stimulus comparison: you’ve already defined your variable of interest — say, sweetness — although you’re interested in the typical perception of sweetness, not a chemical measure of sugar content. You ask respondents which sweet item is further along that axis. Maybe people will tend to rank Coke as sweeter than Dr Pepper, which is sweeter than Diet Coke.
Similarities: you simply ask whether a certain pair of desserts is more similar than another pair. In other words, given two pairs of points in dessert-space, which pair’s points are closer to one another? Are M&Ms more similar to chocolate bars than M&Ms are to Skittles?

These four data types come from a 2×2 matrix, whose dimensions are these questions: Do the points come from 2 classes (respondents and stimuli), or 1 (just stimuli)? And are the comparisons between individual points or dyads?

Preferential choice: 2 classes, dyads. Is the Alice’sIdeal-to-Brownies distance shorter or longer than the Alice’sIdeal-to-IceCream distance?
Single stimulus: 2 classes, points. Is the Bob’sIdeal-to-PumpkinPie distance short enough to fall within the dessert horizon?
Stimulus comparison: 1 class, points. Is the difference in sweetness between Coke and Diet Coke positive or negative?
Similarities: 1 class, dyads. Which pair has the shorter inter-point distance: M&Ms and Snickers, or M&Ms and Skittles?
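
To make the four cells concrete, here is a minimal Python sketch of each question as a distance comparison. Everything in it is an assumption for illustration: the two axes (warmth, chocolatiness), the coordinates, the radius, and the function names are all made up, and plain Euclidean distance is just one convenient choice, not necessarily what Coombs uses.

```python
import math

# Hypothetical 2D "dessert space": (warmth, chocolatiness), each on a 0-1 scale.
# All coordinates are invented for illustration.
stimuli = {
    "brownies": (0.9, 0.9),
    "apple crisp": (0.9, 0.1),
    "chocolate ice cream": (0.0, 0.9),
}
ideals = {"Alice": (1.0, 1.0)}  # Alice's ideal dessert: warm and chocolaty


def preferential_choice(person, a, b):
    """2 classes, dyads: which stimulus lies closer to the person's ideal?"""
    da = math.dist(ideals[person], stimuli[a])
    db = math.dist(ideals[person], stimuli[b])
    return a if da < db else b


def single_stimulus(person, a, radius):
    """2 classes, points: is the stimulus within `radius` of the ideal?"""
    return math.dist(ideals[person], stimuli[a]) <= radius


def stimulus_comparison(a, b, axis):
    """1 class, points: is stimulus a further along the given axis than b?"""
    return stimuli[a][axis] > stimuli[b][axis]


def similarities(pair1, pair2):
    """1 class, dyads: which pair of stimuli has the shorter distance?"""
    d1 = math.dist(stimuli[pair1[0]], stimuli[pair1[1]])
    d2 = math.dist(stimuli[pair2[0]], stimuli[pair2[1]])
    return pair1 if d1 < d2 else pair2


print(preferential_choice("Alice", "brownies", "chocolate ice cream"))
print(single_stimulus("Alice", "apple crisp", radius=0.5))
print(stimulus_comparison("brownies", "apple crisp", axis=1))
print(similarities(("brownies", "apple crisp"),
                   ("brownies", "chocolate ice cream")))
```

In practice the analysis runs the other way: you observe the answers and try to recover the coordinates. The sketch only shows which comparison each kind of question corresponds to.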

Finally, there’s yet another dimension within each cell of the 2×2 matrix: order relation (we can tell which distance is shorter) vs proximity relation (we can only tell whether a distance is below some threshold). In essence, this means we can create a 2×2×2 array from the original 2×2 matrix. The new sub-cells depend on whether we force respondents to make a choice or only ask whether they have a preference at all:
(Alice says she loves brownies more than ice cream) vs. (Alice says she does prefer one of brownies or ice cream, but hasn’t said which)
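
One way to read the order-vs-proximity distinction in the preferential choice cell, sketched in Python. This is my interpretation rather than Coombs’ formalism: the coordinates, the threshold value, and the idea that “having a preference” means the two distances differ by more than that threshold are all assumptions.

```python
import math

# Hypothetical coordinates in a made-up (warmth, chocolatiness) space.
alice_ideal = (1.0, 1.0)
brownies = (0.9, 0.9)
apple_crisp = (0.9, 0.1)
ice_cream = (0.0, 0.9)


def order_relation(ideal, a, b):
    """Order relation: report WHICH option is closer to the ideal."""
    return "a" if math.dist(ideal, a) < math.dist(ideal, b) else "b"


def proximity_relation(ideal, a, b, threshold):
    """Proximity relation: report only WHETHER the two distances differ
    by more than `threshold`, without saying which option wins."""
    gap = abs(math.dist(ideal, a) - math.dist(ideal, b))
    return gap > threshold


# Alice says she likes brownies more than ice cream.
print(order_relation(alice_ideal, brownies, ice_cream))
# Alice says she has a preference, but not which way it goes.
print(proximity_relation(alice_ideal, brownies, ice_cream, threshold=0.2))
# With these made-up numbers she is roughly indifferent here.
print(proximity_relation(alice_ideal, apple_crisp, ice_cream, threshold=0.2))
```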

NOW! The power of a framework like this is that you can come up with new potential data types to collect and analyze. For example, in the preferential choice cell, we’d usually ask Alice for an order relation (which does she like more: brownies or ice cream?). But we could ask for a proximity relation instead (does Alice even have a preference between brownies and ice cream?). This kind of data isn’t usually collected or analyzed, but maybe there’s a situation where it’d be useful.

Another example: in preferential choice and similarities data, we usually keep one item in each dyad constant across the comparison: “Are M&Ms more similar to chocolate bars than M&Ms are to Skittles?” But you could have all-different points within each dyad: “Are M&Ms more similar to Snickers than Skittles are to Starburst?” Again, this might be harder to collect and analyze, but there might be times when it’s useful.

Data Collection and Analysis

There’s a vivid passage where Coombs describes the importance of the scientist’s choice of data structure:

Although the data are an offspring of behavior, the scientist has a much more intimate and creative relation to the process than that of midwife. Behavior does not yield data by parthenogenesis. The role of the scientist in the process is to choose the genus; the behavior then chooses the species. Behavior never acts or speaks for itself in creating data; it only speaks when spoken to, when asked a question. The experimenter selects the repertoire, a particular alphabet of messages, and then the behavior chooses from these alternatives what to play, what the message is to be.

Hence, Coombs also discusses how best to collect this kind of data. Should you ask people to pick their favorite out of two items, or out of more? Pick 1 of n, or k out of n? Rank all n items, rank only the top k out of n, etc.? (The short answer, of course, is: It depends.)
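
To see how these response formats differ in the information they capture, here is a small sketch that derives each one from the same underlying preference. The names and coordinates are hypothetical, and a real respondent is of course noisier than a distance calculation.

```python
import math

# Hypothetical (warmth, chocolatiness) coordinates, invented for illustration.
stimuli = {
    "brownies": (0.9, 0.9),
    "apple crisp": (0.9, 0.1),
    "chocolate ice cream": (0.0, 0.9),
    "sorbet": (0.0, 0.0),
}
ideal = (1.0, 1.0)


def rank_all(ideal, stimuli):
    """Rank all n items, from closest to the ideal to furthest."""
    return sorted(stimuli, key=lambda name: math.dist(ideal, stimuli[name]))


def pick_k(ideal, stimuli, k):
    """Pick k out of n: an unordered set, so order within the set is lost."""
    return set(rank_all(ideal, stimuli)[:k])


def rank_top_k(ideal, stimuli, k):
    """Rank only the top k out of n: ordered, but silent about the rest."""
    return rank_all(ideal, stimuli)[:k]


print(rank_all(ideal, stimuli))
print(pick_k(ideal, stimuli, 1))      # "pick 1 of n" is the k = 1 case
print(pick_k(ideal, stimuli, 2))
print(rank_top_k(ideal, stimuli, 2))
```

Each format is a coarsening of the full ranking, which is one reason the choice of format matters: it limits what any later analysis can recover.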

Finally, Coombs goes into detail on how to analyze this data. Given several people’s rankings or choices of stimuli, how do you turn them into one coherent ranking on a 1D scale, or into a 2D data space you can plot? At first glance these techniques seem to be related to multidimensional scaling, now used in political science for voting similarity analyses (that example is based on Machine Learning for Hackers, which I’m just about to read too) or in market research for perceptual mapping.
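
For a feel of the “several rankings in, one 1D ordering out” problem, here is a deliberately simple stand-in: a Borda count. To be clear, this is not Coombs’ method and not multidimensional scaling; it is just the simplest aggregation rule I know, and the rankings below are made up.

```python
from collections import defaultdict


def borda_scale(rankings):
    """Combine several full rankings into one ordering via Borda count.

    Each item earns (n - 1 - position) points per ranking, so first place
    in a ranking of n items earns n - 1 points and last place earns 0.
    Ties in total score are broken alphabetically.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    ordering = sorted(scores, key=lambda item: (-scores[item], item))
    return ordering, dict(scores)


# Hypothetical rankings from three respondents, favorite first.
rankings = [
    ["brownies", "apple crisp", "ice cream"],
    ["brownies", "ice cream", "apple crisp"],
    ["apple crisp", "brownies", "ice cream"],
]

ordering, scores = borda_scale(rankings)
print(ordering)
print(scores)
```

A Borda count assumes everyone shares a single “better-to-worse” axis. The ideal-point picture is richer, since two people can disagree simply because their ideals sit at different places in the space.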

Other links

See also Amos Tversky’s obituary for Coombs, which does a much better job than I can of providing context for his work.

And Keith Poole at UC-SD teaches a political science class on these techniques. The course materials include R scripts for running these analyses, as well as a PDF scan of Chapter 1 of Coombs’ book.