Comments on: Significance of 5 counts
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders
Fri, 01 Jun 2012 18:47:52 +0000

By: TomLoredo (Thu, 24 Apr 2008 22:15:09 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/comment-page-1/#comment-212

Hyunsook wrote:

Although the given expected background counts is 0.1, the real background counts should be in integer (am I thinking right?) and I believe the law of total probability is needed: P(X>=5|Background)=\sum _{BC=0,…} P(X>=5|B,BC)P(BC).

This is along the lines of what a Bayesian calculation does. You can do it either with a known background rate, or accounting for uncertainty; either way, the marginal likelihood for the signal strength ends up being a sum over terms that condition on a certain number of b.g. counts, appropriately weighted. (This is one of the examples I present in my CASt summer school lectures.) I think the “bayes” option for the fitting statistic in Sherpa implements this (based on code I provided Peter Freeman), but I don’t know if it’s separately exposed for aiding source significance calculations.
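To make the "sum over terms that condition on a certain number of b.g. counts" concrete, here is a minimal sketch for the known-background case only. It assumes the observed counts are the sum of independent Poisson source and background counts; the function name and the trial signal strength s = 4.0 are illustrative choices, not values from this thread, and this is not the Sherpa "bayes" code.

```python
import numpy as np
from scipy.stats import poisson

def signal_likelihood_terms(n_obs, s, b):
    """Terms of the likelihood for signal strength s, one per possible
    number k of background counts among the n_obs observed counts."""
    k = np.arange(n_obs + 1)  # k = 0..n_obs background counts
    # P(k background counts | b) * P(n_obs - k source counts | s)
    return poisson.pmf(k, b) * poisson.pmf(n_obs - k, s)

# Hypothetical trial value s = 4.0, with the thread's background of 0.1.
terms = signal_likelihood_terms(5, 4.0, 0.1)
total = terms.sum()

# With a known background the sum collapses to a single Poisson
# probability with mean s + b.
print(terms)
print(total, poisson.pmf(5, 4.1))
```

The weights show how much of the likelihood comes from each possible split between source and background counts. With an uncertain background rate the same structure holds, but the weights change.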

There is some frequentist work with somewhat of the same flavor (conditioning on the b.g. counts being below the observed total # of counts); Michael Woodroofe did something along these lines.

By: vlk (Tue, 22 Apr 2008 22:20:53 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/comment-page-1/#comment-208

On why take the mean of the p-values. For a given rescaled background (b/50), the p-value is the probability that the counts in the source region can exceed the observed counts. So if you imagine drawing N samples of the background _in the source region_, then a fraction p of them will have counts greater than 5. When you simulate different rescaled backgrounds by drawing from the Poisson distribution p(b|B=6), say NSIM times, each simulation will give a different fraction p_i of counts greater than 5. So, for an overall p-value, you want the number of times that you will see counts in the source region greater than 5, which is Sum(p_i*N) over the NSIM simulations. Dividing by NSIM gives the mean of {p_i*N}, and dividing out N to express it as a fraction gives the mean of the p_i.
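The averaging argument can also be written without simulation, as a weighted sum over every possible simulated background count. The sketch below assumes the setup quoted in the comment (b drawn from a Poisson distribution with mean B=6, rescaled by 1/50) and takes the p-value as the probability of 5 or more counts; the function name, the `b_max` truncation, and that tail convention are assumptions made here.

```python
import numpy as np
from scipy.special import gammainc
from scipy.stats import poisson

def exact_mean_p_value(n_obs=5, bkg_counts=6, area_ratio=50.0, b_max=200):
    """Average p-value over simulated backgrounds, computed as an exact sum.

    b runs over possible background-region counts, weighted by Poisson(b | B).
    b_max truncates the sum; the neglected weight is negligible for B = 6.
    """
    b = np.arange(b_max + 1)
    weights = poisson.pmf(b, bkg_counts)
    # gammainc(n, mu) is the regularized lower incomplete gamma function,
    # which equals P(X >= n) for X ~ Poisson(mu).
    p_given_b = gammainc(n_obs, b / area_ratio)
    return float(np.sum(weights * p_given_b))

mean_p = exact_mean_p_value()
print(mean_p)
```

A Monte Carlo mean of the p_i should converge to this value as NSIM grows.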

By: aneta (Sat, 19 Apr 2008 01:33:56 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/comment-page-1/#comment-202

OK, I simulated background counts given the 6 counts observed in the background area and calculated the predicted number of counts in the source area. I did 1000 realizations, assuming a Poisson distribution for the background. Then I calculated the significance of the 5 detected counts for each of the simulated background realizations, so I got 1000 p-values. What should I do with the p-values now? After talking to Vinay, we decided that the thing to do is to calculate the mean of the p-values. The mean p-value is a little higher than the original one.
I get 6e-7 instead of 8e-8 for the significance of 5 counts detected in my observations.

I ran the simulations in Python, using numpy.random.poisson() to get the simulated background counts and the incomplete gamma function to calculate the significance for each simulation. I also plotted the distribution of my p-values…
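A minimal sketch of this kind of simulation is shown below. It is not the original script: the 1/50 area scaling is taken from vlk's comment, while the seed, the function name, and the use of scipy.special.gammainc and NumPy's Generator interface (instead of numpy.random.poisson()) are assumptions.

```python
import numpy as np
from scipy.special import gammainc

def simulate_p_values(n_obs=5, bkg_counts=6, area_ratio=50.0,
                      n_sim=1000, seed=0):
    """Draw background realizations and return one p-value per realization."""
    rng = np.random.default_rng(seed)
    # Simulated counts in the background area, given the observed 6 counts.
    sim_bkg = rng.poisson(bkg_counts, size=n_sim)
    # Predicted background in the source area for each realization.
    mu_src = sim_bkg / area_ratio
    # P(X >= n_obs | mu_src) via the regularized lower incomplete gamma
    # function; a realization with zero background gives p = 0.
    return gammainc(n_obs, mu_src)

p_values = simulate_p_values()
print(p_values.mean(), p_values.min(), p_values.max())
```

With only 1000 realizations the mean still fluctuates by several percent from seed to seed, because a few realizations with high background dominate the average.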

Another consideration, in addition to the background fluctuations, is the prior knowledge that the source is at the location of the detected photons. How do we include this prior information in the calculated p-values?

By: vlk (Fri, 18 Apr 2008 03:43:14 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/comment-page-1/#comment-199

Ah, but Aneta, what about the variation in the background? How do you specify BACKSCAL? If you determine background by collecting 1 count in bkgarea=10*srcarea, 5 observed counts is only about 4 sigma. Significant, but not that highly! (Rises to 5 sigma for 10 counts in 100*srcarea, etc.)

PS: I used a PINTofALE IDL function called detect_limit() ( http://hea-www.harvard.edu/PINTofALE/doc/PoA.html#DETECT_LIMIT ) to get those numbers.
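One way to fold the background uncertainty into the significance is sketched below. It is not necessarily what detect_limit() does. It assumes a flat prior on the background rate, so that the posterior for the background-area mean given B observed counts is a gamma distribution with shape B+1, and the predicted counts in the source area then follow a negative binomial distribution. The function name and the one-sided conversion to sigma are also assumptions.

```python
from scipy.stats import nbinom, norm

def detection_significance(n_obs, bkg_counts, area_ratio):
    """P(X >= n_obs) in the source area with the background rate
    marginalized over its posterior (flat prior assumed), plus the
    equivalent one-sided Gaussian sigma."""
    shape = bkg_counts + 1                 # gamma posterior shape
    p = area_ratio / (1.0 + area_ratio)    # negative binomial "success" prob.
    p_value = nbinom.sf(n_obs - 1, shape, p)  # sf(k) = P(X > k)
    sigma = norm.isf(p_value)
    return p_value, sigma

# The two cases mentioned above: 1 count in 10*srcarea, 10 counts in 100*srcarea.
for bkg, ratio in [(1, 10.0), (10, 100.0)]:
    p_value, sigma = detection_significance(5, bkg, ratio)
    print(bkg, ratio, p_value, sigma)
```

Under these assumptions the two cases come out close to 4 and 5 sigma respectively, in line with the numbers quoted above, though agreement in general with detect_limit() is not guaranteed.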

By: hlee (Fri, 18 Apr 2008 03:39:29 +0000)
http://hea-www.harvard.edu/AstroStat/slog/2008/significance-of-5-counts/comment-page-1/#comment-198

Although the given expected background count is 0.1, the real background counts should be integers (am I thinking right?), and I believe the law of total probability is needed: P(X>=5|Background) = \sum_{BC=0,…} P(X>=5|B,BC) P(BC). I wonder if spfunc.gamma computes this (sorry for being Python illiterate). I believe more fine-tuning is necessary in computing this p-value, but I wish to point out that computing the p-value assuming H0: background counts = 0.1 vs. H1: background counts \ne 0.1 makes me uncomfortable when the observation is 5 counts and the background counts are integers with a very low expected value. I presume that, either way, 5 counts is apt to be called significant.
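On what the incomplete gamma function computes: the sketch below assumes the p-value in question is the Poisson tail probability P(X >= 5) for an expected background of 0.1, and uses scipy.special.gammainc as a stand-in for the function mentioned above (the exact call in the original script is not shown in this thread). It compares the incomplete gamma value with an explicit sum over integer counts.

```python
import numpy as np
from scipy.special import gammainc
from scipy.stats import poisson

n_obs, mu = 5, 0.1

# Regularized lower incomplete gamma function P(n, mu).
p_gamma = gammainc(n_obs, mu)

# Explicit sum over integer counts: 1 - sum_{k=0}^{n-1} Poisson(k | mu).
k = np.arange(n_obs)
p_sum = 1.0 - poisson.pmf(k, mu).sum()

# Survival function from scipy.stats, P(X > n_obs - 1) = P(X >= n_obs).
p_sf = poisson.sf(n_obs - 1, mu)

print(p_gamma, p_sum, p_sf)
```

So the incomplete gamma value already is a sum over integer counts under the Poisson model. The explicit subtraction loses digits for such small tails, which is a practical reason to prefer gammainc or the survival function.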
