Wednesday, July 30, 2008

The End of Theory?!

Wired has a thought-provoking article by Chris Anderson, titled "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete". The article has spurred a lot of discussion on the internet. I'm still thinking it over, but I'll put my two cents in. I wonder whether Anderson's take holds for all of science. If it does, everything is data and can be described by data. I know that lots of people think this is true (e.g. singularity theory). However, I don't think reality can be described by data alone. For instance, can someone's soul be described by data? And doesn't Anderson's article itself show that we always (or often?) need theory (hypotheses, beliefs, convictions) to say something about practice?
Anyway, large parts of reality can be described by data, and for those parts Anderson's theory is very interesting indeed. Just ponder the examples that he gives.

Some highlights from the article:

About the Petabyte Age: "It forces us to view data mathematically first and establish a context for it later."

"Google's founding philosophy is that we don't know why this page is better than that one: If the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required."

"This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
The big target here isn't advertising, though. It's science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete."
"There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot."
"This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities."
"Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
There's no reason to cling to our old ways. It's time to ask: What can science learn from Google?"

