The AstroStat Slog » George Box
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

All models are wrong, but some are useful
http://hea-www.harvard.edu/AstroStat/slog/2008/useful-wrong-model/
By hlee, Tue, 01 Jul 2008 03:12:23 +0000

All models are wrong, but some are useful. –George Box


This frequently cited quote appeared in an article titled The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, which I liked very much because it cited an updated maxim by Peter Norvig, Google’s research director:

All models are wrong, and increasingly you can succeed without them.

The article offers a perspective on the new era of petabyte-scale data analysis, in which traditional modeling and testing are unlikely to be feasible.

I’d like to thank the person who forwarded this article. I have no intention, however, of advertising the company featured in the article by having you click through and read it. At the least, I’d like to urge that we need more innovative thinking than what we normally apply to small data sets, an approach the author, Chris Anderson, describes as follows:

The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

I cannot put it elegantly, but simply put, data analysis should be directed by listening to the data and letting the data talk to you, instead of forcing models onto the data. This holds particularly when the data set is large or humongous. Good a priori knowledge might be an exception, but we have never had enough of it, and that is where disputes about errors come in.

Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
