[ArXiv] Cross Validation

hlee — Wed, 12 Aug 2009 23:03:43 +0000

Statistical resampling methods are rather unfamiliar to astronomers. Bootstrapping may be an exception, but I feel it is still underrepresented. Having seen a recent review paper on cross validation on [arXiv] that describes basic notions in theoretical statistics, I couldn’t resist mentioning it here. Cross validation has been used in various statistical fields such as classification, density estimation, model selection, and regression, to name a few.

[arXiv:math.ST:0907.4728]
A survey of cross validation procedures for model selection by Sylvain Arlot

I won’t review the paper itself, but here are some quotes:

-CV is a popular strategy for model selection, and algorithm selection.
-Compared to the resubstitution error, CV avoids overfitting because the training sample is independent from the validation sample.
-As noticed in the early 30s by Larson (1931), training an algorithm and evaluating its statistical performance on the same data yields an overoptimistic result.
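The contrast between resubstitution error and CV error in these quotes can be illustrated with a small sketch. It is not taken from Arlot's paper: the toy data (a noisy sine curve), the k-nearest-neighbour regressor, and the helper names `knn_predict`, `mse`, and `cv_error` are all assumptions made for illustration.

```python
import math
import random

def knn_predict(train, x, k):
    """Average the y-values of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, test, k):
    """Mean squared error on `test` of a k-NN regressor built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in test) / len(test)

def cv_error(data, k, n_folds=5):
    """n_folds-fold CV: hold each fold out once, train on the rest, average."""
    folds = [data[i::n_folds] for i in range(n_folds)]
    fold_errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        fold_errors.append(mse(train, held_out, k))
    return sum(fold_errors) / n_folds

# Toy data (assumed for illustration): y = sin(x) plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 6.0) for _ in range(100)]
data = [(x, math.sin(x) + rng.gauss(0.0, 0.3)) for x in xs]

candidates = [1, 5, 15]
# Resubstitution: train and evaluate on the same sample.
resub = {k: mse(data, data, k) for k in candidates}
# Cross validation: evaluate only on points held out of training.
cv = {k: cv_error(data, k) for k in candidates}

for k in candidates:
    print(f"k={k:2d}  resubstitution MSE={resub[k]:.3f}  CV MSE={cv[k]:.3f}")
```

With k = 1 every point is its own nearest neighbour, so the resubstitution error is zero and would always select the most overfitted model. The CV error for k = 1 stays positive because the held-out points never appear in the training sample.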

There are books on statistical resampling methods covering more general topics, not limited to model selection. Instead of surveying them, I decided to do a quick search on how CV is used in astronomy. These are the ADS search results; there are more publications than I expected.

Kernel regression for determining photometric redshifts from Sloan broad-band photometry [arXiv:0706.2704]
Wang, D.; Zhang, Y. X.; Liu, C.; Zhao, Y. H.
Monthly Notices of the Royal Astronomical Society, Volume 382, Issue 4, pp. 1601-1606 (2007)
STECKMAP: STEllar Content and Kinematics from high resolution galactic spectra via Maximum A Posteriori [arXiv:0507002]
Ocvirk, P.; Pichon, C.; Lançon, A.; Thiébaut, E.
Monthly Notices of the Royal Astronomical Society, Volume 365, Issue 1, pp. 74-84 (2006)
STECMAP: STEllar Content from high-resolution galactic spectra via Maximum A Posteriori [arXiv:0505209]
Ocvirk, P.; Pichon, C.; Lançon, A.; Thiébaut, E.
Monthly Notices of the Royal Astronomical Society, Volume 365, Issue 1, pp. 46-73 (2006)
Automated Detection of Classical Novae with Neural Networks [arXiv:0604236]
Feeney, S. M. et al.
The Astronomical Journal, Volume 130, Issue 1, pp. 84-94 (2005)
Estimation of regularization parameters in multiple-image deblurring [arXiv:0405545]
Vio, R. et al.
Astronomy and Astrophysics, v.423, p.1179-1186 (2004)
Machine learning and image analysis for morphological galaxy classification
de la Calleja, Jorge and Fuentes, Olac
Monthly Notices of the Royal Astronomical Society, Volume 349, Issue 4, pp. 87-93 (2004)
Ensembles of Classifiers for Morphological Galaxy Classification
Bazell, D.; Aha, David W.
The Astrophysical Journal, Volume 548, Issue 1, pp. 219-223 (2001)
Bayesian image reconstruction with space-variant noise suppression
Nunez, J.; Llacer, J.
Astronomy and Astrophysics Supplement, v.131, p.167-180 (1998)
Estimating the sun’s rotation from solar oscillations by regularisation
Thompson, A. M.
Astronomy and Astrophysics (ISSN 0004-6361), vol. 265, no. 1, p. 289-295. (1992)

One can easily see that many of these papers adopted CV in a machine learning context. The application of CV and bootstrapping, however, is not limited to machine learning. As Arlot’s title says, CV is used for model selection. When it comes to model selection in high energy astrophysics, the standard procedure is not CV but reduced chi^2 measures and eyeballing the fitted curve. Hopefully, a renovated model selection procedure via CV or another statistically robust strategy will soon challenge reduced chi^2 and eyeballing. On the other hand, I doubt that it’ll come soon. Remember, eyes are the best classifier, so it won’t be an easy task.
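To make the comparison concrete, here is a minimal sketch that computes both a reduced chi^2 and a leave-one-out CV error for two candidate models. Everything in it is an assumption for illustration: the toy data (a straight line with Gaussian errors of known sigma), the two candidate models, and the helper names. It is not a recipe for real high energy data, where errors are often Poisson rather than Gaussian.

```python
import random

def fit_constant(pts):
    """Least-squares constant model: returns the predictor f(x) = mean(y)."""
    mean_y = sum(y for _, y in pts) / len(pts)
    return lambda x: mean_y

def fit_line(pts):
    """Least-squares straight line: returns the predictor f(x) = a + b*x."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def reduced_chi2(fit, n_params, pts, sigma):
    """Chi^2 per degree of freedom, evaluated on the same data used to fit."""
    f = fit(pts)
    chi2 = sum(((y - f(x)) / sigma) ** 2 for x, y in pts)
    return chi2 / (len(pts) - n_params)

def loo_cv(fit, pts):
    """Leave-one-out CV: mean squared error predicting each held-out point."""
    total = 0.0
    for i, (x, y) in enumerate(pts):
        f = fit(pts[:i] + pts[i + 1:])  # refit without point i
        total += (y - f(x)) ** 2
    return total / len(pts)

rng = random.Random(1)
sigma = 0.5  # assumed known measurement error
# Toy data (assumed): y = 1 + 0.8 x with Gaussian noise.
pts = [(0.25 * i, 1.0 + 0.8 * (0.25 * i) + rng.gauss(0.0, sigma)) for i in range(40)]

models = {"constant": (fit_constant, 1), "line": (fit_line, 2)}
for name, (fit, n_params) in models.items():
    print(name, reduced_chi2(fit, n_params, pts, sigma), loo_cv(fit, pts))

# Model selection by CV: keep the model with the smallest held-out error.
best = min(models, key=lambda name: loo_cv(models[name][0], pts))
print("selected by CV:", best)
```

The reduced chi^2 needs the measurement error and a count of free parameters, while the CV error only needs a way to refit the model with some points left out. That is part of why CV extends to models where counting degrees of freedom is not straightforward.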


[MADS] HMM

hlee — Mon, 08 Dec 2008 03:23:11 +0000

MADS stands for “Missing in ADS.” Every astronomer, I believe, knows what ADS is. As we have the [EotW] series and used to have the [ArXiv] series, creating a new series of semi-periodic postings under the well-known name ADS seems interesting.

I’m not sure about these days, but when I was studying astronomy a decade ago, ADS was the Google of astronomy. After switching to statistics, I was shocked that there was no comprehensive search engine for statistical literature and databases. At the time, I showed ADS to fellow statistics students to demonstrate how good it was and compared it with what was available in statistics: JSTOR only had materials five years old or older. Neither Citeseer nor Project Euclid had been born. Google Scholar was not thinkable at all. I used to dig through the library CD-ROMs to satisfy my craving for more information. Now those days are over thanks to Google and other scientific search engines. Yet astronomers prefer ADS to any other database or search engine because of its comprehensiveness.

Let’s stop praising ADS here and focus on [MADS]. The point of [MADS] is to introduce something common and popular in other fields that does not appear in ADS. Believe it or not, I sometimes find elements missing from this giant and old (mature) data system, most likely jargon from other fields. HMM is one example, although more will come in the series. HMM stands for Hidden Markov Model. When you enter “Hidden Markov Model” as keywords in a search among refereed astronomical journals^[1], you’ll see no results within astronomical publications.

Then, what is a Hidden Markov Model? I’d rather defer my answer to wiki:Hidden Markov Model, the references therein, and image/signal processing textbooks (I learned the term from an undergraduate textbook about a decade ago, so HMM must be a very common and well-received methodology). Since astronomers handle images and signals so often, I thought some years back that HMM might be a useful tool for modeling and analyzing astronomical data. Unfortunately, it hasn’t emerged yet.
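For readers who want a concrete picture before following those references, here is a minimal sketch of the forward algorithm, which computes the probability of an observed sequence under an HMM with discrete states and discrete observations. The two-state "quiescent/flaring" source, all of the probabilities, and the function name `forward_likelihood` are hypothetical and chosen only for illustration.

```python
def forward_likelihood(obs, init, trans, emit):
    """Probability of the observation sequence under the HMM (forward algorithm).

    init[s]     : probability that the hidden chain starts in state s
    trans[r][s] : probability of moving from hidden state r to hidden state s
    emit[s][o]  : probability of observing symbol o while in hidden state s
    """
    states = range(len(init))
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = [init[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [
            emit[s][o] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        ]
    # Marginalize over the final hidden state.
    return sum(alpha)

# Hypothetical source with hidden states 0 = quiescent, 1 = flaring,
# observed only through symbols 0 = low count rate, 1 = high count rate.
init = [0.8, 0.2]
trans = [[0.9, 0.1],
         [0.4, 0.6]]
emit = [[0.9, 0.1],
        [0.3, 0.7]]

obs = [0, 0, 1, 1, 0]
print(forward_likelihood(obs, init, trans, emit))
```

The hidden states are never observed directly; only the emitted symbols are. The recursion sums over all hidden paths in time linear in the sequence length, which is what makes HMMs practical for long signals.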

Finding a MADS does not give me a eureka moment. It only makes me wish that this MADS appears in ADS soon. One of you will soon be the first person to adopt HMM in your research and will be cited as a pioneer within the astronomy community.

Well, against all this hope, I might be forced to drop this post if someone finds that HMM is already described in published astronomy papers and teaches me, in private, how to search ADS better.

[1] Otherwise, ADS searches all arXiv papers, which include computer science, math, statistics, physics, and more.


The AstroStat Slog » ADS
