Apart from the technical details, the first two sentences of the conclusion,
We have developed computational approaches for signal reconstruction from photon-limited measurements – a situation prevalent in many practical settings. Our method optimizes a regularized Poisson likelihood under nonnegativity constraints
tempt me to study and try their algorithm.
The figure below (AAS 472.09) compares the pdfs for the Poisson intensity (red curves) and the Gaussian equivalent (black curves) for two cases: when the number of counts in the source region is 50 (top) and 8 (bottom) respectively. In both cases a background of 200 counts collected in an area 40x the source area is used. The hatched region represents the 68% equal-tailed interval for the Poisson case, and the solid horizontal line is the ±1σ width of the equivalent Gaussian.
Clearly, for small counts, the support of the Poisson distribution is bounded below at zero, but that of the Gaussian is not. This introduces a visibly large bias in the interval coverage as well as in the normalization properties. Even at high counts, the Poisson is skewed such that larger values are slightly more likely to occur by chance than in the Gaussian case. This skew can be quite critical for marginal results.
Poisson and Gaussian probability densities
No simple IDL code this time; but for reference, the Poisson posterior probability density curves were generated with the PINTofALE routine ppd_src().
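ppd_src() is an IDL routine in PINTofALE; as a stand-in, here is a hedged Python sketch (the function names and the flat-prior Gamma(k+1,1) posterior are my assumptions, not from the post) that builds the equal-tailed Poisson interval by Monte Carlo and compares it to the ±1σ "equivalent" Gaussian:

```python
import random

def poisson_posterior_interval(k, level=0.68, nsamp=200_000, seed=42):
    """Equal-tailed credible interval for a Poisson intensity, given k
    observed counts and (assumption) a flat prior, so that the posterior
    is Gamma(k+1, 1).  Monte Carlo version; not the PINTofALE routine."""
    rng = random.Random(seed)
    draws = sorted(rng.gammavariate(k + 1, 1.0) for _ in range(nsamp))
    tail = (1.0 - level) / 2.0
    return draws[int(tail * nsamp)], draws[int((1.0 - tail) * nsamp)]

def gaussian_interval(k):
    """The 'equivalent' Gaussian interval: mean k, width +/- sqrt(k)."""
    s = k ** 0.5
    return k - s, k + s

lo_p, hi_p = poisson_posterior_interval(8)   # bounded below by zero
lo_g, hi_g = gaussian_interval(8)            # symmetric; ignores the skew
```

For k = 8 the Poisson interval stays non-negative and its upper edge reaches higher than the Gaussian one, reflecting the skew discussed above.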
I ran a simple Monte Carlo based test to compute the expected bias between a Poisson sample and the “equivalent” Gaussian sample. The result is shown in the plot below.
The jagged red line is the fractional expected bias relative to the true intensity. The typical recommendation in high-energy astronomy is to bin up events until there are about 25 or so counts per bin. This leads to an average bias of about 2% in the estimate of the true intensity. The bias drops below 1% for counts >50. The smooth blue line is the reciprocal of the square-root of the intensity, reflecting the width of the Poisson distribution relative to the true intensity, and is given here only for illustrative purposes.
Poisson-Gaussian bias
Exemplar IDL code that can be used to generate this kind of plot is appended below:
nlam=100L & nsim=20000L                 ; number of intensities, simulations per intensity
lam=indgen(nlam)+1                      ; true intensities 1..100
sct=intarr(nlam,nsim) & scg=sct         ; Poisson counts and "equivalent" Gaussian counts
dct=fltarr(nlam)                        ; fractional bias at each intensity
for i=0L,nlam-1L do sct[i,*]=randomu(seed,nsim,poisson=lam[i])
; note scg is an integer array, so the Gaussian deviates are truncated to integer counts
for i=0L,nlam-1L do scg[i,*]=randomn(seed,nsim)*sqrt(lam[i])+lam[i]
for i=0L,nlam-1L do dct[i]=mean(sct[i,*]-scg[i,*])/(lam[i])
plot,lam,dct,/yl,yticklen=1,ygrid=1     ; fractional bias vs intensity, log y-axis
oplot,lam,1./sqrt(lam)                  ; relative Poisson width, for comparison
Suppose N counts are randomly placed in an interval of duration τ without any preference for appearing in any particular portion of τ; i.e., the distribution is uniform. The counting rate is R = N/τ. We can now ask: what is the probability of finding k counts in an infinitesimal interval δt within τ?
First, consider the probability that one count, placed randomly, will fall inside δt,
ρ = δt/τ ≡ Rδt/N ≡ ν/N
where ν = R δt represents the expected number of counts in the interval δt. When N counts are scattered over τ, the probability that k of them will fall inside δt is given by the binomial distribution,
p(k|ρ,N) = C(N,k) ρ^k (1−ρ)^(N−k)
as the product of the probability of finding k events inside δt and the probability of finding the remaining events outside, summed over all the possible distinct ways that k events can be chosen out of N. Expanding the expression and rearranging,
= N!/{(N−k)! k!} (Rδt/N)^k (1 − Rδt/N)^(N−k)
= N!/{(N−k)! k!} (ν^k/N^k) (1 − ν/N)^(N−k)
= N!/{(N−k)! N^k} (ν^k/k!) (1 − ν/N)^N (1 − ν/N)^(−k)
Note that as N, τ → ∞ (while keeping R fixed),
N!/{(N−k)! N^k} → 1,  (1 − ν/N)^(−k) → 1
(1 − ν/N)^N → e^(−ν)
and the expression reduces to
p(k|ν) = (ν^k/k!) e^(−ν)
which is the familiar (in a manner of speaking) expression for the Poisson likelihood.
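As a numerical sanity check on this limit, here is a small Python sketch (not from the original post) comparing the binomial probability at fixed ν = Rδt with its Poisson limit as N grows:

```python
from math import comb, exp, factorial

def binom_pmf(k, N, p):
    """Binomial probability of k events in N trials: C(N,k) p^k (1-p)^(N-k)."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

def poisson_pmf(k, nu):
    """Poisson limit: p(k|nu) = (nu^k / k!) e^(-nu)."""
    return (nu**k / factorial(k)) * exp(-nu)

# hold nu = R*dt fixed and let N grow: the binomial converges to the Poisson
nu, k = 3.0, 2
approx = binom_pmf(k, 10_000, nu / 10_000)
exact = poisson_pmf(k, nu)
```

At N = 10,000 the two probabilities agree closely, as the limit above promises.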
[Comment] You must read it. It can serve as a very good Bayesian tutorial for astronomers. I think there’s a typo in the likelihood, though: a plus/minus sign, nothing major. Tom Loredo has kindly informed us, through his extensive slog comments, about the Schechter function, and this paper made me appreciate the gamma distribution more. The Schechter function and the gamma density function share the same equation, although they are put to quite different uses (forgive my Bayesian ignorance of the extensive usage of the gamma distribution, beyond the fact that it is a conjugate of the Poisson and exponential distributions).
FYI, there was another recent arxiv paper on zero-inflation [stat.ME:0805.2258] by Bhattacharya, Clarke, & Datta
A Bayesian test for excess zeros in a zero-inflated power series distribution
The gamma distribution is defined with two parameters, alpha and beta, over the non-negative real line. alpha can be any real number greater than 0, unlike the Poisson likelihood, where the equivalent quantity is an integer (for alpha < 1 the density is unbounded at zero, though still integrable; for alpha ≤ 0 it ceases to be normalizable), and beta is any number greater than 0.
The mean is alpha/beta and the variance is alpha/beta^2. Conversely, given a sample whose mean and variance are known, one can estimate alpha and beta to describe that sample with this function.
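That moment-matching step can be written out explicitly. A small Python sketch (the function name is mine), inverting mean = alpha/beta and variance = alpha/beta^2:

```python
def gamma_moment_match(mean, var):
    """Method-of-moments fit of a gamma distribution:
    mean = alpha/beta and var = alpha/beta^2
    invert to beta = mean/var and alpha = mean^2/var."""
    beta = mean / var
    alpha = mean * beta
    return alpha, beta

# a sample with mean 4 and variance 2 is described by gamma(alpha=8, beta=2)
alpha, beta = gamma_moment_match(4.0, 2.0)
```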
This is reminiscent of the Poisson distribution, where alpha ~ number of counts and beta is akin to the collecting area or the exposure time. For this reason, a popular non-informative prior to use with the Poisson likelihood is gamma(alpha=1,beta=0), which is like saying “we expect to detect 0 counts in 0 time”. (Which, btw, is not the same as saying we detect 0 counts in an observation.) [Edit: see Tom Loredo's comments below for more on this.] Surprisingly, you can get less informative than even that, but that’s a discussion for another time.
Because it is the conjugate prior to the Poisson, it is also a useful choice to use as an informative prior. It makes derivations of formulae that much easier, though one has to be careful about using it blindly in real world applications, as the presence of background can muck up the pristine Poissonness of the prior (as we discovered while applying BEHR to Chandra Level3 products).
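The conjugate update itself is one line: with a gamma(alpha, beta) prior and k total counts collected over exposure t, the posterior is gamma(alpha + k, beta + t). A hedged Python sketch (function name and numbers are illustrative, not from the post):

```python
def gamma_poisson_update(alpha, beta, counts, exposure):
    """Conjugate update for a Poisson rate: a gamma(alpha, beta) prior plus
    `counts` total events in `exposure` (time or area) gives a
    gamma(alpha + counts, beta + exposure) posterior."""
    return alpha + counts, beta + exposure

# the non-informative gamma(1, 0) prior ("0 counts expected in 0 time"),
# updated with 50 counts in an exposure of 10 units
a_post, b_post = gamma_poisson_update(1.0, 0.0, counts=50, exposure=10.0)
post_mean = a_post / b_post   # posterior mean rate, alpha/beta
```

This cleanliness is exactly what breaks once background contaminates the counts, as noted above.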
J.J. Spinelli and M.A. Stephens (1997)
Cramer-von Mises tests of fit for the Poisson distribution
Canadian J. Stat. Vol. 25(2), pp. 257-267
Abstract: Goodness-of-fit tests based on the Cramer-von Mises statistics are given for the Poisson distribution. Power comparisons show that these statistics, particularly A2, give good overall tests of fit. The statistic A2 will be particularly useful for detecting distributions where the variance is close to the mean, but which are not Poisson.
In addition to the Cramer-von Mises statistics (A2 and W2), several other tests are introduced and compared: the dispersion test D (what astronomers call a χ^2 statistic for testing goodness of fit; D is a two-sided test, approximately distributed as a χ^2_(n-1) variable), the Neyman-Barton k-component smooth test Sk, the statistics P and T (based on the probability generating function), and the Pearson X^2 statistic (the number of cells K is chosen to avoid small expected values, and the statistic is compared to a χ^2_(K-1) variable; I think astronomers call this the modified χ^2 test). To compute the powers of these tests, the authors use the following strategy: the negative binomial distribution has a parameter γ that is zero under the null hypothesis (the Poisson distribution); set γ = δ/sqrt(n), where δ is chosen so that, for a two-sided 0.05-level test, the best test has a power of 0.5[1]. Based on this simulation study, the statistic A2 was empirically as powerful as the best test among the Cramer-von Mises tests.
Under the Poisson null hypothesis, the alternatives are overdispersed, underdispersed, and equally dispersed distributions. For the equally dispersed alternatives, the Cramer-von Mises statistics have the best power compared to the other statistics. Overall, the Cramer-von Mises statistics have good power against all classes of alternatives, while the Pearson X^2 statistic performed very poorly against the overdispersed alternatives.
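For reference, the dispersion statistic itself is easy to compute. A hedged Python sketch (the function name is mine); under the Poisson null, D is compared to χ^2 quantiles with n-1 degrees of freedom:

```python
def dispersion_statistic(x):
    """Dispersion test statistic D = sum((x_i - xbar)^2) / xbar.
    Under the Poisson null (variance equal to the mean), D is
    approximately chi^2 distributed with n-1 degrees of freedom;
    large D suggests overdispersion, small D underdispersion."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / xbar

# counts [3, 5, 4, 4]: xbar = 4, D = (1 + 1 + 0 + 0) / 4 = 0.5
D = dispersion_statistic([3, 5, 4, 4])
# compare D against two-sided chi^2_{n-1} critical values
```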
Instead of binning for the modified χ^2 tests[2], we could adopt A2 or W2 for goodness-of-fit tests. They are probably already implemented in software, just not widely recognized.
This study could be linked to identifying the number of lines in X-ray count data (Poisson in nature), one of the key interests for astronomers. However, as pointed out by the authors, estimating the number of classes is a difficult statistical problem. I.J. Good[1] said that
I don’t believe it is usually possible to estimate the number of species, but only an appropriate lower bound to that number. This is because there is nearly always a good chance that there are a very large number of extremely rare species.
The authors have been working on Poisson mixture models for genetic data. I wonder if anything could be extracted for astronomical applications. The Poisson mixture models also explain coverage problems, beyond line identification. Summarizing the body of the paper without its mathematical equations seems impossible, so only the abstract is given.
Abstract:
Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be one-sided. Confidence intervals for the number of classes are also necessarily one-sided. A sequence of lower bounds to the odds is developed and used to define pseudo maximum likelihood estimators for the number of classes.
Abstract summary:
The authors investigated issues in interval estimation of the mean in the exponential family, covering the binomial, Poisson, negative binomial, normal, gamma, and NEF-GHS distributions. The poor performance of the Wald interval has been known not only for discrete cases but also for nonnormal continuous cases, where it shows a significant negative bias. Their computations suggest that the equal-tailed Jeffreys interval and the likelihood ratio interval are the best alternatives to the Wald interval.
Brief summary of the paper without equations:
The objective of this paper is interval estimation of the mean in the natural exponential family (NEF) with quadratic variance functions (QVF); particular focus is given to the discrete NEF-QVF families, consisting of the binomial, negative binomial, and Poisson distributions. It is well known that the Wald interval for a binomial proportion suffers from a systematic negative bias and from oscillation in its coverage probability, even for large n and p near 0.5, which seems to arise from the lattice nature and the skewness of the binomial distribution. They use Poisson cases to illustrate the same poor and erratic behavior of the Wald interval in lattice problems. They derived bias expressions for the three discrete NEF-QVF distributions and added a disconcerting graphical illustration of this negative bias.
Interested readers should check Figure 4, where the performances of the Wald, score, likelihood ratio (LR), and Jeffreys intervals are compared. Figure 5 illustrates the limits of those four intervals: the LR and Jeffreys intervals are indistinguishable. They derived the coverage probabilities of the four intervals via Edgeworth expansions, and studied the nonoscillating O(n^-1) terms of the expansions to compare the coverage properties of the intervals. Figure 6 shows that the Wald interval has a serious negative bias, whereas the nonoscillating term in the score interval is positive for all three distributions: binomial, negative binomial, and Poisson. The negative bias of the Wald interval is also found in continuous distributions such as the normal, gamma, and NEF-GHS distributions (Figure 7).
In conclusion, they reconfirmed that the LR and Jeffreys intervals are the best alternatives to the Wald interval in terms of negative coverage bias and interval length. The Rao score interval has the merit of easy presentation, but its performance is inferior to the LR and Jeffreys intervals, although it is better than the Wald interval. Still, the authors left room for users: choosing among these intervals is ultimately a personal choice.
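To make the comparison concrete, here is a hedged Python sketch of the Wald and Rao score intervals for a single Poisson count (standard textbook formulas, not code from the paper); it shows the Wald lower limit going negative where the score limit stays positive:

```python
from math import sqrt

Z = 1.96  # two-sided 95% normal quantile

def wald_interval(x):
    """Wald interval for a Poisson mean from one count x: x +/- z*sqrt(x).
    The lower limit can go negative for small counts."""
    return x - Z * sqrt(x), x + Z * sqrt(x)

def score_interval(x):
    """Rao score interval: solve (x - lam)^2 = z^2 * lam for lam, giving
    lam = x + z^2/2 +/- z*sqrt(x + z^2/4); always nonnegative."""
    center = x + Z * Z / 2.0
    half = Z * sqrt(x + Z * Z / 4.0)
    return center - half, center + half

lo_w, hi_w = wald_interval(2)    # Wald lower limit is negative here
lo_s, hi_s = score_interval(2)   # score lower limit stays positive
```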
[Addendum] I wonder if the statistical properties of Gehrels’ confidence limits have been studied since the publication. I’ll try to post findings about the statistics of the Gehrels’ confidence limits shortly (hopefully).