To answer your questions: yes, leaving one datum out at a time and obtaining a series of MLEs is correct. One computes the likelihood of each datum using the MLE obtained without that datum; summing these log likelihoods over all data (equivalently, multiplying the likelihoods) gives the likelihood computed via cross-validation (CV). Next, one chooses one model among the candidates based on the likelihoods of the different models (if CV is chosen) or on information criteria (many of which are a maximized log likelihood plus a penalty). Once the model is chosen, we can move on to the inference step; however, this needs some care.
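The selection step can be sketched in a few lines. This is only an illustration under assumptions that are mine, not from any particular paper: two made-up candidate models (a Gaussian with the mean fixed at zero and a Gaussian with a free mean), a toy data set, and the textbook forms of AIC and BIC. The function names are hypothetical.

```python
import math

def gaussian_max_loglik(xs, fixed_mean=None):
    """Maximized Gaussian log likelihood; the mean is fitted unless fixed_mean is given."""
    n = len(xs)
    mu = sum(xs) / n if fixed_mean is None else fixed_mean
    var = sum((x - mu) ** 2 for x in xs) / n  # MLE of the variance given mu
    # At the MLE of the variance, sum((x - mu)^2) / var equals n.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def aic(max_loglik, n_params):
    """AIC = -2 * (maximized log likelihood) + 2 * (number of free parameters)."""
    return -2.0 * max_loglik + 2.0 * n_params

def bic(max_loglik, n_params, n_obs):
    """BIC = -2 * (maximized log likelihood) + (number of free parameters) * log(n)."""
    return -2.0 * max_loglik + n_params * math.log(n_obs)

def select_model(scores):
    """Return the name of the candidate with the smallest criterion value."""
    return min(scores, key=scores.get)

# Toy data and candidates, purely hypothetical.
data = [4.1, 5.3, 4.8, 5.9, 5.1, 4.6]
candidates = {
    "zero_mean": (gaussian_max_loglik(data, fixed_mean=0.0), 1),  # free: variance
    "free_mean": (gaussian_max_loglik(data), 2),                  # free: mean, variance
}
aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = select_model(aic_scores)
print(aic_scores, best)
```

Note that the MLEs appear only inside `gaussian_max_loglik`; what gets compared across models is the criterion value, not the parameter estimates.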
To prevent confusion: when model selection is the goal of the study, estimating parameters (getting MLEs) is a by-product of obtaining the maximum likelihoods. Andrew Liddle and his colleagues have been writing papers on model selection applied to cosmology. Their papers may help in understanding how statistical model selection is applied in astronomy, although their model selection methods are limited to BIC and DIC. I had a feeling that Protassov et al. (2001) just scratched the surface of model selection and didn’t let people taste the fruit. Yet it’s a good reference, at least because of its appendix.
LOO (leave one out) is an expression that Prof. Rao often used. For a maximum likelihood calculation, leave one observation out and compute the maximum likelihood (ML) and the ML estimator (MLE) with the rest. With the left-out observation and that MLE, the likelihood of that observation is obtained; repeat this process for all observations and combine the results. Asymptotically, model selection by the LOO likelihood is equivalent to AIC.
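As a minimal sketch of the LOO recipe above, assume an i.i.d. Gaussian model, chosen here only because its MLE has a closed form. The function names and the toy data are hypothetical. The per-observation log likelihoods are summed, which is the same as multiplying the likelihoods.

```python
import math

def gaussian_mle(xs):
    """Closed-form MLE of the mean and variance (variance uses 1/n)."""
    n = len(xs)
    mu = sum(xs) / n
    var = sum((x - mu) ** 2 for x in xs) / n
    return mu, var

def gaussian_logpdf(x, mu, var):
    """Log density of a single observation under N(mu, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def loo_log_likelihood(xs):
    """Sum over i of log f(x_i | MLE computed without x_i)."""
    total = 0.0
    for i, x_out in enumerate(xs):
        rest = xs[:i] + xs[i + 1:]              # leave observation i out
        mu, var = gaussian_mle(rest)            # refit on the remaining data
        total += gaussian_logpdf(x_out, mu, var)  # score the left-out datum
    return total

# Toy data, purely hypothetical; at least two distinct values must remain
# after each deletion so that the fitted variance is positive.
data = [1.0, 2.0, 4.0]
print(loo_log_likelihood(data))
```

For a model without a closed-form MLE, `gaussian_mle` would be replaced by a numerical optimizer, which is where the computing cost comes from.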
Instead of “score function,” I’d rather use “J function,” but this single letter is even more ambiguous. Here, the score function means the first derivative of the log likelihood, evaluated at the true parameter. Fisher information involves second-order derivatives, and there are cases where analytic forms of such derivatives are not available; in those cases cross-validation could replace AIC or TIC.
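For reference, the quantities involved can be written out as follows. The notation is an assumption on my part (conventions for the letters I and J differ between texts), with k the number of free parameters and \hat\theta the MLE.

```latex
% Assumed notation: s = score, I = Hessian-based information,
% J = expected outer product of the score.
s(\theta) = \frac{\partial}{\partial\theta} \log L(\theta), \qquad
I(\theta) = -\mathrm{E}\!\left[\frac{\partial^{2} \log L(\theta)}{\partial\theta\,\partial\theta^{T}}\right], \qquad
J(\theta) = \mathrm{E}\!\left[s(\theta)\, s(\theta)^{T}\right]

\mathrm{AIC} = -2 \log L(\hat\theta) + 2k, \qquad
\mathrm{TIC} = -2 \log L(\hat\theta) + 2\,\mathrm{tr}\!\left(J(\hat\theta)\, I(\hat\theta)^{-1}\right)
```

When the model is correctly specified, J equals I, the trace equals k, and TIC reduces to AIC. Both I and J require derivatives of the log likelihood, which is exactly what cross-validation avoids.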
One drawback would be computation time: LOO needs n model fits, so it costs O(n) if AIC counts as O(1). For binned or clipped data this increase could be negligible, but if we happened to keep all 1078 channels and adopted a complicated model for the MLEs, we’d better not use resampling methods without smart optimization tools.
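In special cases the n refits can be avoided altogether. As a sketch only, assume again an i.i.d. Gaussian model: the leave-one-out MLEs follow from two running sums, so each deletion costs O(1) instead of a full refit. The function name is hypothetical, and this shortcut does not carry over to complicated models, which need a numerical optimizer at every step.

```python
import math

def loo_log_likelihood_fast(xs):
    """LOO log likelihood for a Gaussian model via running sums (no refits).

    Caveat: the sum-of-squares formula can lose precision when the data
    have a large offset relative to their spread.
    """
    n = len(xs)
    s1 = sum(xs)                 # sum of x
    s2 = sum(x * x for x in xs)  # sum of x^2
    total = 0.0
    for x in xs:
        mu = (s1 - x) / (n - 1)                 # mean without x
        var = (s2 - x * x) / (n - 1) - mu * mu  # 1/(n-1) variance MLE without x
        total += -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
    return total

# Toy data, purely hypothetical.
print(loo_log_likelihood_fast([1.0, 2.0, 4.0]))
```

The whole LOO sum is then a single pass over the data, the same order of work as one ordinary fit.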