The AstroStat Slog » upper limit

Kaplan-Meier Estimator (Equation of the Week)

vlk — Wed, 09 Jul 2008 17:00:54 +0000

The Kaplan-Meier (K-M) estimator is the non-parametric maximum likelihood estimator of the survival probability of items in a sample. “Survival” here is a historical holdover because this method was first developed to estimate patient survival chances in medicine, but in general it can be thought of as a form of cumulative probability. It is of great importance in astronomy because so much of our data are limited and this estimator provides an excellent way to estimate the fraction of objects that may be below (or above) certain flux levels. The application of K-M to astronomy was explored in depth in the mid-80′s by Jurgen Schmitt (1985, ApJ, 293, 178), Feigelson & Nelson (1985, ApJ 293, 192), and Isobe, Feigelson, & Nelson (1986, ApJ 306, 490). [See also Hyunsook's primer.] It has been coded up and is available for use as part of the ASURV package.

Consider a simple case where you have N observations of the luminosities of a source. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_i+1 Then, it is easy to see that the fraction of sources above each L_i can be written as the sequence

{ N-1, N-2, N-3, … 2, 1, 0}/N

The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is

S(L>L₁) = (N-1)/N
S(L>L₂) = (N-1)/N * ((N-1)-1)/(N-1) = (N-1)/N * (N-2)/(N-1) = (N-2)/N
S(L>L₃) = (N-1)/N * ((N-1)-1)/(N-1) * ((N-2)-1)/(N-2) = (N-3)/N
…
S(L>L_k) = Π_i=1..k (n_i-1)/n_i = (N-k)/N

where n_k are the number of objects still remaining at luminosity level L ≥ L_k, and at each stage one object is decremented to account for the drop in the sample size.

Now that was for the case when all the objects are detected. But now suppose some are not, and only upper limits to their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they will “drop out” of the set at some stage. In other words, the sample will be “censored”. The K-M estimator is easily altered to account for this, by changing the decrement in each term of the product to include the censored points. Thus, the general K-M estimator is

S(L>L_k) = Π_i=1..k (n_i-c_i)/n_i

where c_i are the number of objects that drop out between L_i-1 and L_i.

Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood), and uncertainties on it must be estimated via Monte Carlo or bootstrap techniques [or not.. see below].

]]>

[ArXiv] Bayesian Star Formation Study, July 13, 2007

hlee — Mon, 16 Jul 2007 19:31:13 +0000

From arxiv/astro-ph:0707.2064v1
Star Formation via the Little Guy: A Bayesian Study of Ultracool Dwarf Imaging Surveys for Companions by P. R. Allen.

I rather skip all technical details on ultracool dwarfs and binary stars, reviews on star formation studies, like initial mass function (IMF), astronomical survey studies, which Allen gave a fair explanation in arxiv/astro-ph:0707.2064v1 but want to emphasize that based on simple Bayes’ rule and careful set-ups for likelihoods and priors according to data (ultracool dwarfs), quite informative conclusions were drawn:

the peak at q~1 is significant,
lack of companions with a distance greater than 15-20 A.U. (a unit indicates the distance between the Sun and the Earth),
less binary stars with later spectral types,
inconsistency of undetected low mass ratio systems to the current data, and
30% spectroscopic binaries are from ultracool binaries.

Before, asking for observational efforts for improvements, it is commented 75% as the the upper limit of the ultracool binary population.

]]>

[ArXiv] Classical confidence intervals, June 25, 2007

hlee — Wed, 27 Jun 2007 18:23:02 +0000

From arXiv:physics.data-an/0706.3622v1:
Comments on the unified approach to the construction of classical confidence intervals

This paper comments on classical confidence intervals and upper limits, as the so-called a flip-flopping problem, both of which are related asymptotically (when n is large enough) by the definition but cannot be converted from one to the another by preserving the same coverage due to the poisson nature of the data.

I’ve heard a few discussions about classical confidence intervals and upper limits from particle physicists and theoretical statisticians. Nonetheless, not being in the business from the beginning (1. the time point of particle physicists aware of statistics to obtain coverages and upper limits or 2. Neyman’s publication (1937) Phil. Trans. Royal Soc. London A, 236, p.333) makes it hard to grasp the essence of this flip-flopping problem. On the other hand, I could sense that lots of statistical challenges (for both classical and bayesian statisticians) residing in this flip-flopping problem and wish for some tutorials or chronological reviews on the subject.

]]>