To find the paper or other interesting ones, click
In cosmology, the candidate models to be chosen among are generally nested: a larger model usually carries extra terms beyond those of the smaller ones. How the penalty for those extra terms is defined leads to different model selection criteria. However, astronomy papers in general never discuss the consistency or statistical optimality of these selection criteria; at best they offer Monte Carlo simulations and extensive comparisons across the criteria. Nonetheless, my personal view is that astronomers should be encouraged to study model selection, to prevent the fallacy of blindly fitting models that may be irrelevant to the information the data set contains. Physics points to a correct model, but the data can do the same.
The popularity of AIC comes from its simplicity. By penalizing the maximized log likelihood with the number of model parameters (p), one can choose the best model for describing/generating the data. Nonetheless, we know that AIC has its shortcomings: the candidate models are assumed to be nested within one another and to come from the same parametric family. For an exponential family, the trace of the product of the covariance of the score function and the inverse Fisher information reduces to the number of parameters, which naturally raises the question, “what happens when that trace cannot be obtained analytically?”
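As a toy illustration of my own (simulated data, not from any paper discussed here), this is how AIC compares two nested Gaussian models:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)    # simulated data, not a real catalog

# Model 1: N(0, sigma^2)   -- one free parameter (sigma)
# Model 2: N(mu, sigma^2)  -- two free parameters (mu, sigma); Model 1 is nested in it
sigma1 = np.sqrt(np.mean(x ** 2))               # MLE of sigma with mu fixed at 0
mu2, sigma2 = x.mean(), x.std()                 # MLEs with mu free

logL1 = stats.norm.logpdf(x, 0.0, sigma1).sum()
logL2 = stats.norm.logpdf(x, mu2, sigma2).sum()

aic1 = -2 * logL1 + 2 * 1
aic2 = -2 * logL2 + 2 * 2
print("AIC, mu fixed:", round(aic1, 1), "  AIC, mu free:", round(aic2, 1))
# The model with the smaller AIC is preferred.
```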
The general form of AIC is called TIC (Takeuchi’s information criterion; Takeuchi, 1976), in which the penalty term is written as the trace of the product of the covariance of the score function and the inverse Fisher information. Still, I haven’t answered the question above.
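To make that trace concrete, here is a small numerical sketch of my own (a toy Gaussian model with simulated data): with J the covariance of the per-observation scores and I the Fisher information, the estimated trace of J I⁻¹ comes out close to the number of parameters, two, when the model is correctly specified.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=0.5, scale=1.5, size=2000)   # toy data, actually drawn from a Gaussian
n = x.size

# Gaussian MLEs
mu, var = x.mean(), x.var()

# Per-observation score vectors d log f / d(mu, sigma^2), evaluated at the MLE
s_mu  = (x - mu) / var
s_var = -0.5 / var + (x - mu) ** 2 / (2 * var ** 2)
scores = np.column_stack([s_mu, s_var])

# J: average outer product of the scores
J = scores.T @ scores / n
# I: expected Fisher information of the Gaussian model, evaluated at the MLE
I = np.array([[1.0 / var, 0.0],
              [0.0, 1.0 / (2 * var ** 2)]])

penalty = np.trace(J @ np.linalg.inv(I))
print("TIC trace penalty:", round(penalty, 2), "vs. number of parameters: 2")
```

If the data do not actually come from the assumed family, this trace drifts away from p, which is exactly the situation TIC is built for.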
I personally think the trick for avoiding this dilemma is the key content of Stone (1974): cross-validation. Stone showed that computing the log likelihood by cross-validation is asymptotically equivalent to AIC, without computing the score function and Fisher information or needing an exact count of the number of parameters. Cross-validation makes it possible to obtain penalized maximum log likelihoods across models (the penalty is necessary because the parameters are estimated), so that comparison among models for selection becomes feasible while alleviating worries about getting the proper number of parameters (the penalization).
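A rough sketch of this equivalence on a toy Gaussian model (my own example, simulated data): the sum of leave-one-out predictive log densities lands close to the penalized maximum log likelihood, log L − p, i.e. −AIC/2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(size=300)                        # toy data
n = x.size

# Penalized maximum log likelihood, log L_max - p, which is -AIC/2 (here p = 2)
mu, sigma = x.mean(), x.std()
penalized = stats.norm.logpdf(x, mu, sigma).sum() - 2

# Leave-one-out cross-validated log likelihood: refit without x_i, then score x_i
cv = 0.0
for i in range(n):
    rest = np.delete(x, i)
    cv += stats.norm.logpdf(x[i], rest.mean(), rest.std())

print("log L_max - p :", round(penalized, 2))
print("LOO-CV log lik:", round(cv, 2))          # the two agree asymptotically
```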
Numerous tactics are available for model selection. Although variable selection (where candidate models are generally nested) is a very hot topic in statistics these days and tons of publications can be found, there are not many works applying resampling methods to model selection. As Stone showed, cross-validation relieves the difficulty of calculating the score function and Fisher information of a model. Until last year I was working with Prof. Babu and Prof. Rao at Penn State on non-nested model selection (selecting the best model from different parametric families) with the jackknife (the paper has not been submitted yet), based on the finding that the jackknife yields an unbiased estimate of the maximum likelihood. Despite its high computational cost compared to cross-validation and the jackknife, the bootstrap has also occasionally appeared in model selection.
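Without going into the unpublished work, here is a generic sketch of the jackknife idea in this context: a standard jackknife bias correction applied to the per-observation maximized log likelihood of a toy Gaussian model (data and model are hypothetical), which removes the leading O(1/n) optimism from fitting and evaluating on the same data.

```python
import numpy as np
from scipy import stats

def mean_loglik(sample):
    """Per-observation maximized Gaussian log likelihood (toy model)."""
    return stats.norm.logpdf(sample, sample.mean(), sample.std()).mean()

rng = np.random.default_rng(3)
x = rng.normal(size=200)                        # hypothetical data
n = x.size

theta_full = mean_loglik(x)
theta_loo = np.array([mean_loglik(np.delete(x, i)) for i in range(n)])

# Standard jackknife bias correction of the plug-in statistic
theta_jack = n * theta_full - (n - 1) * theta_loo.mean()

print("plug-in mean log L :", round(theta_full, 4))
print("jackknife corrected:", round(theta_jack, 4))
```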
I’m not sure whether cross-validation or the jackknife is feasible to implement in astronomical software when computing statistics. Certainly they have advantages when it comes to calculating likelihoods, such as the Cash statistic.
Off the topic, but worth noting:
1. They used AIC for model comparison. In spite of the many advocates for BIC, choosing AIC does a better job for analyzing catalog data (399,929 galaxies), since the penalty term in BIC with such a huge sample leads to selecting the most parsimonious model (a quick numerical comparison of the two penalty terms appears after this list).
2. Although a more detailed discussion hasn’t been posted, I’d like to point out that photometric redshift studies are more or less regression problems. Whether they use sophisticated, up-to-date classification schemes such as support vector machines (SVM) and artificial neural networks (ANN), or classical regression methods, the goal of a photometric redshift study is finding predictors for correct classification and building a model from those predictors. I wish there were some studies on quantile regression, which has received a lot of attention in economics recently.
3. Adaptive kernels were mentioned, and results from adaptive kernel regression are eagerly anticipated.
4. Comparing root mean square errors from various classification and regression models based on the Sloan Digital Sky Survey (SDSS) EDR (Early Data Release) through DR5 (Data Release 5) might lead to misleading conclusions about which regression/classification method is best, because the sample sizes differ from EDR to DR5. Further formulation, especially of the asymptotic properties of these root mean square errors, would be very useful for making a legitimate comparison among different regression/classification strategies.
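Regarding item 1 above, a back-of-the-envelope comparison of the penalty terms (the sample size is the catalog size quoted above; the parameter counts are made up for illustration):

```python
import math

n = 399_929                  # number of galaxies in the catalog discussed above
for p in (3, 5, 10):         # hypothetical numbers of model parameters
    print(f"p={p:2d}  AIC penalty={2 * p:4d}  BIC penalty={p * math.log(n):6.1f}")
```

At roughly 13 log-likelihood units per parameter versus AIC’s 2, BIC quickly pushes toward the most parsimonious model at this sample size.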