Archive for July 4th, 2008

Chris Anderson is at it again… stirring the pot with big claims that are hard to falsify but seem to generate a huge amount of discussion among smart people. Check out some of the discussion. Or maybe read the article first.

Here’s an excerpt:

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Now biology is heading in the same direction. The models we were taught in school about “dominant” and “recessive” genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton’s laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility. In short, the more we learn about biology, the further we find ourselves from a model that can explain it.

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

And where does Anderson suppose all these statistical algorithms come from?  Think about it.  The statistical algorithms came from “old science”.  We came up with statistics as a way to model things, to compress our data.  If we simply use these algorithms without ever gaining understanding or testing models, how can we validate that our statistical models and algorithms are good at finding correlations?  We can’t!  This point alone is enough to dismiss Anderson’s “non theory” (or is it a theory?).  Read on if you want more commentary.

Certainly there are some tidbits of useful insight.  However, his call for the end of science as we know it hardly withstands scrutiny.

a) Google doesn’t know as much as everyone claims

b) Correlation is not enough for understanding.  If all we are going to do after the end of theory is act on correlations of intervening variables (i.e. variables/metaphors that aren’t at the root of a phenomenon but are associated with it), we will get further and further from understanding “the thing”.  That’s OK in business and some technical situations where you want to cut corners (understanding isn’t important), but it would be catastrophic in medical procedures, genetic work, rocketry, etc.

c) Models are useful.  In fact, Anderson employs Google as a model to communicate his ideas.  Models aren’t the thing, and most serious thinkers never claim they are.  Models help to organize thinking and direct research, but they do not substitute for the phenomenon.  Yes, in new investigations our models are somewhat off, but in an uncountable set of situations our models are highly accurate, useful and consistently employed.  I leave it as an exercise for the reader to think about the many models of the world we all use every day to great effect.

d) No doubt the computational ability we have at our fingertips will help to uncover things we never saw before.  That’s always been the case with new technology.  The better the technology, the further we can see, the smaller we can dissect, the more we can crunch…  How is the advance of the computer any different?  It’s not!  Think about it.

e) Exhaustive search efforts (massive data mining) like the ones he cites from Venter and others have been going on for decades.  There’s no big shift in the future.  The more we can compute, the bigger the datasets we’ll work on, and we’ll still see things just out of our computational reach.  This is a proven fact.  The universe has been computing and generating data for a very long time, and we are not going to catch it, seeing as how we’re PART OF IT.

f) I suspect Anderson woke up one morning to his own realizations about the usefulness of data mining.  Meanwhile the rest of us have been taking advantage of new technologies and ever-increasing data storage for a very long time (in fact, pretty much since the inception of science…)
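
To make point (b) a bit more concrete, here is a minimal sketch of what can happen when you trawl a big pile of variables for correlations with no model in hand.  It is my own toy illustration, not anything from Anderson's article: every series is pure random noise, so by construction nothing is related to anything else, and the function names are made up for this example.  The more candidate variables you search, the stronger the best "pattern" tends to look.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_variables, n_samples, seed=0):
    """Largest |correlation| between a noise target and n_variables noise series.

    Every series is drawn independently, so any correlation found is
    spurious by construction (an assumption of this toy setup).
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(candidate, target)))
    return best


if __name__ == "__main__":
    # Same 20 observations each time; only the number of variables searched grows.
    for n_variables in (10, 100, 1000):
        strongest = strongest_chance_correlation(n_variables, n_samples=20)
        print(n_variables, round(strongest, 3))
```

Nothing here says big data is useless.  It only shows that "the strongest correlation we found" needs a model, or at least a held-out test, before anyone should act on it.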

Anderson is a good writer and a bad scientist.  Oh well, life and science and journalism carry on….
