Did Data Kill Theory?

Thanks to Geoff McGovern for pointing us toward a fascinating essay in Wired.  Chris Anderson posits that the accessibility of information has vaulted us into what he calls the Petabyte Age, in which

information is not a matter of simple three- and four-dimensional
taxonomy and order but of dimensionally agnostic statistics. It calls
for an entirely different approach, one that requires us to lose the
tether of data as something that can be visualized in its totality. It
forces us to view data mathematically first and establish a context for
it later.

Given how much data is readily available, Anderson continues, "[w]e can
analyze the data without hypotheses about what it might show."  The
scientific method encourages us to explain what we know about the world
and make greater generalizations about the rest of it that we have not
observed; but if we can observe everything, essentially, it seems that
generalizations are no longer necessary.  We don’t need to guess about
what the world might look like, because an hour in front of the
computer can tell us. 


The article argues that we don’t need a prior belief about which two variables will be correlated, because the cost of correlating every variable we have with every other, and seeing which pairs actually are correlated, has been greatly reduced.  The article points to certain successes of the "Petabyte Age," Google’s myriad programs chief among them.
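To make the "correlate everything with everything" idea concrete, here is a minimal sketch in Python. It is not taken from Anderson's essay: the function names, the sample sizes, and the 0.36 cutoff are all assumptions chosen for illustration, and the data are random numbers generated independently of one another, not real observations.

```python
import itertools
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    spread_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (spread_x * spread_y)


def make_unrelated_columns(n_vars, n_obs, seed=0):
    """Hypothetical data set: every variable is drawn independently."""
    rng = random.Random(seed)
    return {
        f"var{i}": [rng.gauss(0, 1) for _ in range(n_obs)]
        for i in range(n_vars)
    }


def correlate_all_pairs(columns):
    """Correlate every variable with every other, with no hypothesis."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in itertools.combinations(columns, 2)
    }


if __name__ == "__main__":
    columns = make_unrelated_columns(n_vars=40, n_obs=30)
    results = correlate_all_pairs(columns)
    # 0.36 is an illustrative cutoff, not a recommendation.
    strong = {pair: r for pair, r in results.items() if abs(r) > 0.36}
    print(f"{len(results)} pairs tested, {len(strong)} above the cutoff")
```

Because the variables here are unrelated by construction, any pair that clears the cutoff does so by chance. The sketch shows how cheap the exhaustive search has become; it says nothing about what a given correlation means.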

It would be silly to argue that AdSense needs more theory or better models driving it (its occasional anomalous goofs are sometimes funny, anyway).  Anderson’s more controversial thesis is that the availability of data has made the scientific method obsolete.  He goes on to argue that this is true even in the natural sciences, such as biology, where, he notes, new species have been discovered through data-driven statistical analysis.

Now, this article can lead us in a variety of directions, so feel free to take the discussion wherever you like.  I, for one, believe that the value of prediction is missing from this analysis.  Isn’t the meat of what we do in Political Science in our theories’ predictive abilities?  That’s certainly why many of us are here: to provide analysis that can inform those with the capacity to make decisions affecting governance and policy.  We could come up with correlation after correlation after correlation, but does it mean anything?

Even setting prediction aside, correlation alone does not supply an adequate explanation.  The method this essay proposes seems to be a rich cataloging of the world with no concern for its use.  Far be it from me to say that there is little value in science for the sake of science, and so I won’t make the argument that science without practical application is irrelevant.  But there is a larger purpose, aside from grant-writing.  What is the point of discovering new species unless it enhances our understanding of our world?  Data cannot interpret itself.  The discovery of new species, as in Anderson’s example, means nothing unless their role is explained by the theories of biologists.  Similarly, I would argue that knowing that democracies only very rarely fight each other means nothing unless we know why.  That understanding informs us about policies that reduce or prevent conflict.  Isn’t that why we care in the first place?

(By the way, have you thought of your political science fortune cookie slogan yet?  Stop by Julie’s post from Monday and let us know!)

2 thoughts on “Did Data Kill Theory?”

  1. I can’t wait until this new approach replaces hypotheses on whether nuclear proliferation is good or bad.
    That’s exactly the sort of issue for which I’d like to use a wait-and-see approach.

  2. Indeed. I’m surprised that there isn’t more argument about this, but it seems that pretty much everyone (everyone with access to the internet who has read and written about the article, that is) agrees: hypotheses aren’t so bad, and pure data-mining isn’t so good.
