The AstroStat Slog » likelihood ratio test
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

Likelihood Ratio Technique
http://hea-www.harvard.edu/AstroStat/slog/2009/likelihood-ratio-technique/
Thu, 15 Jan 2009 22:01:28 +0000, hlee

I wonder what Fisher, Neyman, and Pearson would say if they saw “Technique” after “Likelihood Ratio” instead of “Test.” When a presenter said “Likelihood Ratio Technique” for source identification, I couldn’t resist checking it out, since “Technique” attached to “Likelihood” sounded derogatory to my ears, almost an offense to the founding fathers of the likelihood principle in statistics. Above all, I thank the speaker, who kindly gave me the reference for this likelihood ratio technique.

On the likelihood ratio for source identification by Sutherland and Saunders (1992) in MNRAS vol. 259, pp. 413-420.

Their computed likelihood ratio (L) corresponds to a Bayes factor in form (P(source model)/P(background model)). Because the outcome is binary, source or background, L also shares the form of the odds (L=p(source)/p(not source)=p(source)/(1-p(source))). Since a likelihood can be based on a probability density function, the authors defined “likelihood ratio” literally, by taking the ratio of two likelihood functions. Because they do not take the statistical direction of the likelihood ratio test and the Neyman-Pearson lemma, naming their method the “likelihood ratio technique” seems proper, and it no longer sounds derogatory. The focus of the paper is on estimating the probability density functions of backgrounds and sources more or less empirically, without concern for general statistical inference. Thus a large Bayes factor, a large L (likelihood ratio) of a source, or a large posterior probability of a source (p(genuine|m,c,x,y)=L/(1+L)) is just an indicator that the given source is more likely a real source.
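
The relation p(genuine|m,c,x,y)=L/(1+L) quoted above can be sketched in a few lines of Python. This is only an illustration of the algebra, not the procedure of Sutherland and Saunders; the function names are hypothetical, and the sketch assumes that L already folds in whatever prior information the paper uses.

```python
def posterior_from_likelihood_ratio(lr):
    """Posterior probability that a candidate is genuine, p = L / (1 + L)."""
    if lr < 0:
        raise ValueError("a likelihood ratio cannot be negative")
    return lr / (1.0 + lr)


def likelihood_ratio_from_posterior(p):
    """Inverse relation, L = p / (1 - p), i.e. the odds of being genuine."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return p / (1.0 - p)


# Illustrative values only: larger L means a more plausible real source.
for lr in (0.1, 1.0, 10.0, 100.0):
    print(f"L = {lr:6.1f}  ->  p(genuine) = {posterior_from_likelihood_ratio(lr):.3f}")
```

L = 1 corresponds to p = 0.5, the point where source and background are equally plausible; nothing in this mapping involves a null distribution or a significance level, which is why it is an indicator rather than a test.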

In summary, the likelihoods of source and of background (the numerator and the denominator) are obtained empirically, based on physics, and turn out to match parametric distributions that are well discussed in statistics. What differs from statistics is that the likelihood ratio does not lead to hypothesis testing based on the Neyman-Pearson lemma; instead, the computed likelihood ratio is used as an indicator of a source. Well, oftentimes it is hard for me to judge the real content of an astronomical study by its name, title, or abstract because of my statistically oriented stereotypes.

Likelihood Ratio Test Statistic [Equation of the Week]
http://hea-www.harvard.edu/AstroStat/slog/2008/eotw-lrt-statistic/
Wed, 18 Jun 2008 17:00:30 +0000, vlk

From Protassov et al. (2002, ApJ, 571, 545), here is a formal expression for the Likelihood Ratio Test Statistic,

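$$T_{\mathrm{LRT}} = -2 \ln R(D, \Theta_0, \Theta)$$

$$R(D, \Theta_0, \Theta) = \frac{\sup_{\theta \in \Theta_0} p(D \mid \Theta_0)}{\sup_{\theta \in \Theta} p(D \mid \Theta)}$$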

where D is an independent data sample, Θ are the model parameters {θi, i=1,..M,M+1,..N}, and Θ0 forms a subset of the model in which θi = θi0, i=1..M, are held fixed at their nominal values. That is, Θ represents the full model and Θ0 represents the simpler model, which is a subset of Θ. R(D,Θ0,Θ) is the ratio of the maximal (technically, supremal) likelihood of the simpler model to that of the full model.

When standard regularity conditions hold, the LRT statistic follows a χ²-distribution whose number of degrees of freedom equals the difference in the number of free parameters between Θ and Θ0. The conditions are: the likelihoods p(D|Θ) and p(D|Θ0) are thrice differentiable; Θ0 is wholly contained within Θ, i.e., the nominal values {θi0, i=1..M} are not at the boundary of the allowed values of {θi}; and the allowed range of D does not depend on the specific values of {θi}. These are important conditions, which are not met in some very common astrophysical problems (e.g., one cannot use the LRT to test the significance of an emission line in a spectrum). In such cases, the distribution of T_LRT must be calibrated via Monte Carlo simulations for that particular problem before it is used to test the significance of the extra model parameters.
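
To make the boundary condition concrete, here is a minimal sketch of a Monte Carlo calibration on a toy problem. It is not the procedure of Protassov et al.; the setup is an assumption chosen for simplicity: unit-variance Gaussian data with a mean μ that, like a line flux, is not allowed to be negative, so the null value μ = 0 sits on the boundary. All function names and the sample sizes are hypothetical.

```python
import random


def lrt_statistic(sample):
    """T_LRT for H0: mu = 0 against mu >= 0, unit-variance Gaussian data."""
    n = len(sample)
    mean = sum(sample) / n
    mu_hat = max(0.0, mean)  # maximum-likelihood value within the allowed range
    # -2 ln R = 2*n*mu_hat*mean - n*mu_hat**2, which reduces to n * mu_hat**2
    return n * mu_hat ** 2


def simulate_null(n_sim=20000, n_data=25, seed=1):
    """Sorted T_LRT values from data sets generated under the null (mu = 0)."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_data)]
        stats.append(lrt_statistic(sample))
    stats.sort()
    return stats


def false_positive_rate(sorted_stats, threshold):
    """Fraction of null simulations whose statistic exceeds the threshold."""
    return sum(t > threshold for t in sorted_stats) / len(sorted_stats)


def calibrated_threshold(sorted_stats, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated null distribution."""
    idx = min(int((1.0 - alpha) * len(sorted_stats)), len(sorted_stats) - 1)
    return sorted_stats[idx]


CHI2_1_95 = 3.841  # nominal 95% point of a chi-square with 1 degree of freedom

null_stats = simulate_null()
nominal_rate = false_positive_rate(null_stats, CHI2_1_95)
mc_threshold = calibrated_threshold(null_stats)
print(f"false-positive rate at the chi-square threshold: {nominal_rate:.3f}")
print(f"Monte Carlo 95% threshold: {mc_threshold:.2f}")
```

In this toy case about half of the simulated statistics are exactly zero, so the χ² threshold rejects roughly half as often as the nominal 5%, and the simulated 95% threshold falls well below 3.84. A real spectral problem would replace `lrt_statistic` with fits of the simple and full models to each simulated data set.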

Of course, an LRT statistic is not obliged to have exactly this form. When it doesn’t, even if the regularity conditions hold, it will not be distributed as a χ²-distribution, and must be calibrated, either via simulations, or analytically if possible. One example of such a statistic is the F-test (popularized among astronomers by Bevington). The F-test uses the ratio of the difference in the best-fit χ² to the reduced χ² of the full model, F = Δχ²/χ²_ν, as the statistic of choice. Note that the numerator by itself constitutes a valid LRT statistic for Gaussian data. The ratio F is distributed as the F-distribution, which results when a ratio is taken of two quantities each distributed as the χ². Thus, all the usual regularity conditions must hold for it to be applicable, and the data must also be in the Gaussian regime.
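
As a small worked example of the ratio just described, the sketch below computes F = Δχ²/χ²_ν from two best-fit χ² values. The function name and the numbers are hypothetical, and it assumes the full model adds a single free parameter, so that Δχ² needs no further normalization.

```python
def f_statistic(chi2_simple, chi2_full, dof_full):
    """F = (chi2_simple - chi2_full) / (chi2_full / dof_full).

    Assumes the full model has exactly one more free parameter than the
    simple model; dof_full is the degrees of freedom of the full-model fit.
    """
    if dof_full <= 0 or chi2_full <= 0:
        raise ValueError("chi2_full and dof_full must be positive")
    delta_chi2 = chi2_simple - chi2_full   # numerator: an LRT statistic for Gaussian data
    reduced_chi2_full = chi2_full / dof_full
    return delta_chi2 / reduced_chi2_full


# Made-up fit results, for illustration only.
print(f_statistic(chi2_simple=120.0, chi2_full=100.0, dof_full=50))
```

With these made-up numbers, Δχ² = 20 and the reduced χ² of the full model is 2, so F = 10. Whether that value is significant still depends on the regularity conditions and the Gaussian regime discussed above.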

[ArXiv] Identifiability and mixtures of distributions, Aug. 3, 2007
http://hea-www.harvard.edu/AstroStat/slog/2007/arxiv-identifiability-and-mixtures/
Fri, 07 Sep 2007 06:02:58 +0000, hlee

From arxiv/math.st: 0708.0499v1
Inference for mixtures of symmetric distributions by Hunter, Wang, and Hettmansperger, Annals of Statistics, 2007, Vol.35(1), pp.224-251.

Consider fitting a spectral line on top of a continuum with either a delta function or a Gaussian (normal) density function. Among the many regularity conditions, the one I personally find most bothersome is identifiability. When the scale parameter (σ) goes to zero, we cannot tell which model, delta or Gaussian, is the right one. Furthermore, the likelihood ratio test cannot be applied to the delta function because of its discontinuity. For the classical confidence intervals and hypothesis tests that astronomers know from Numerical Recipes, identifiability and the set properties (topology) of the model parameters receive too little attention from astronomers who perform statistical inference on model parameters. I found a few astronomical papers that ignored identifiability yet used likelihood ratio tests to claim the discovery of an extra component. Clearly, these are statistical malpractices.
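
The σ → 0 problem can be illustrated numerically. The sketch below is an assumption-laden toy, not taken from the paper: it bins a unit-flux Gaussian line and a delta-function line onto the same hypothetical detector grid (bin edges 0 to 10, line at 4.5) and shows that their binned predictions become indistinguishable as σ shrinks.

```python
import math


def gaussian_bin_fractions(edges, mu, sigma):
    """Fraction of a unit-flux Gaussian line falling in each bin."""
    cdf = [0.5 * (1.0 + math.erf((e - mu) / (sigma * math.sqrt(2.0)))) for e in edges]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def delta_bin_fractions(edges, mu):
    """Fraction of a unit-flux delta-function line falling in each bin."""
    return [1.0 if lo <= mu < hi else 0.0 for lo, hi in zip(edges, edges[1:])]


def max_difference(edges, mu, sigma):
    """Largest per-bin gap between the Gaussian and delta predictions."""
    gauss = gaussian_bin_fractions(edges, mu, sigma)
    delta = delta_bin_fractions(edges, mu)
    return max(abs(g - d) for g, d in zip(gauss, delta))


edges = [float(i) for i in range(11)]   # hypothetical unit-width bins
line_center = 4.5                       # hypothetical line location
differences = [max_difference(edges, line_center, s) for s in (0.5, 0.1, 0.01)]
for sigma, diff in zip((0.5, 0.1, 0.01), differences):
    print(f"sigma = {sigma:5.2f}  max |Gaussian - delta| per bin = {diff:.2e}")
```

Once σ is much smaller than the bin width, the data carry essentially no information to separate the two models, which is the identifiability failure in miniature.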

Although math.st:0708.0499 does not discuss spectral line fitting, it offers a nice review of identifiability in inference for mixtures of symmetric distributions.
