Consider a simple case where you have N observations of the luminosities of a set of sources. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_{i+1}. Then, it is easy to see that the fraction of sources above each L_i can be written as the sequence
{N-1, N-2, N-3, …, 2, 1, 0}/N
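To make the sequence concrete, here is a quick numerical check in Python (a minimal sketch; the toy luminosities and variable names are mine, not from the original post):

```python
import numpy as np

# Toy sample: N = 5 detected sources with made-up luminosities.
L = np.sort(np.array([0.5, 1.2, 2.3, 3.1, 4.8]))
N = len(L)

# Fraction of sources strictly above each L_i.
frac_above = np.array([(L > Li).sum() for Li in L]) / N
print(frac_above)   # [0.8 0.6 0.4 0.2 0. ]  i.e. {4, 3, 2, 1, 0}/5
```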
The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is
S(L > L_1) = (N-1)/N
S(L > L_2) = (N-1)/N * ((N-1)-1)/(N-1) = (N-1)/N * (N-2)/(N-1) = (N-2)/N
S(L > L_3) = (N-1)/N * ((N-1)-1)/(N-1) * ((N-2)-1)/(N-2) = (N-3)/N
…
S(L > L_k) = Π_{i=1..k} (n_i - 1)/n_i = (N-k)/N
where n_i is the number of objects still remaining at luminosity level L ≥ L_i, and at each stage one object is decremented to account for the drop in the sample size.
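As a quick sanity check, a few lines of Python (again my own sketch, assuming distinct luminosities and no censoring) confirm that the product telescopes to (N-k)/N:

```python
# With no censoring, n_i = N - (i-1) and exactly one object drops out per
# step, so the product Π_{i=1..k} (n_i - 1)/n_i telescopes to (N - k)/N.
N = 10
for k in range(1, N + 1):
    S = 1.0
    for i in range(1, k + 1):
        n_i = N - (i - 1)        # objects still remaining at L >= L_i
        S *= (n_i - 1) / n_i     # one detected object leaves the set
    assert abs(S - (N - k) / N) < 1e-12
    print(f"S(L > L_{k}) = {S:.2f} = (N-{k})/N")
```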
That was for the case when all the objects are detected. Now suppose some are not, and only upper limits to their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they will “drop out” of the set at some stage. In other words, the sample will be “censored”. The K-M estimator is easily altered to account for this, by changing the decrement in each term of the product to include the censored points. Thus, the general K-M estimator is
S(L > L_k) = Π_{i=1..k} (n_i - c_i)/n_i
where c_i is the number of objects that drop out between L_{i-1} and L_i.
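Here is a minimal sketch of how the product-limit estimator is usually coded up for a sample containing detections and upper limits, following the standard convention that censored objects thin out the risk set n_i while the detections supply the decrement; the function name, variable names, and toy data are my own, not from the original post.

```python
import numpy as np

def km_survival(detected, limits):
    """Kaplan-Meier estimate of S(L > L_k) from detections and upper limits."""
    detected = np.sort(np.asarray(detected, dtype=float))
    limits = np.asarray(limits, dtype=float)

    levels, S, surv = [], [], 1.0
    for L in np.unique(detected):
        # n_i: objects still in the set at this level, i.e. detections at or
        # above L plus censored objects whose upper limits are at or above L
        n_i = (detected >= L).sum() + (limits >= L).sum()
        d_i = (detected == L).sum()        # detections leaving the set at L
        surv *= (n_i - d_i) / n_i
        levels.append(L)
        S.append(surv)
    return np.array(levels), np.array(S)

# With no upper limits this reduces to the simple (N-k)/N sequence above.
print(km_survival([1.0, 2.0, 3.0, 4.0, 5.0], []))
# With upper limits, the censored objects shrink n_i before dropping out.
print(km_survival([1.0, 2.0, 4.0, 5.0], [2.5, 3.5]))
```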
Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood), and uncertainties on it must be estimated via Monte Carlo or bootstrap techniques [or not.. see below].
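If one goes the bootstrap route, a sketch along these lines could be used (it reuses the km_survival function from the snippet above; the sample, grid, and number of resamples are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(42)
detected = np.array([1.0, 2.0, 4.0, 5.0])
limits = np.array([2.5, 3.5])
values = np.concatenate([detected, limits])
is_det = np.array([True] * len(detected) + [False] * len(limits))

grid = np.unique(detected)        # evaluate every resampled curve at these levels
curves = []
for _ in range(1000):
    idx = rng.integers(0, len(values), len(values))   # resample with replacement
    d = values[idx][is_det[idx]]
    u = values[idx][~is_det[idx]]
    if d.size == 0:               # skip pathological draws with no detections
        continue
    lev, S = km_survival(d, u)    # km_survival as defined in the sketch above
    # step-function lookup of S(L > grid) for this bootstrap realization
    j = np.searchsorted(lev, grid, side="right") - 1
    curves.append(np.where(grid < lev[0], 1.0, S[j]))

lo, hi = np.percentile(curves, [16, 84], axis=0)      # rough 1-sigma band on S(L > L_k)
print(lo, hi)
```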
I would rather skip the technical details on ultracool dwarfs and binary stars, and the reviews of star formation studies such as the initial mass function (IMF) and astronomical surveys, which Allen explains fairly well in arxiv/astro-ph:0707.2064v1. Instead, I want to emphasize that, based on simple Bayes’ rule and careful set-ups of the likelihoods and priors according to the data (ultracool dwarfs), quite informative conclusions were drawn:
Before calling for observational efforts toward improvements, the paper quotes 75% as the upper limit on the ultracool binary population.
This paper comments on classical confidence intervals and upper limits, and on the so-called flip-flopping problem. The two interval constructions are related asymptotically (when n is large enough) by definition, but cannot be converted from one to the other while preserving the same coverage, due to the Poisson nature of the data.
I’ve heard a few discussions about classical confidence intervals and upper limits from particle physicists and theoretical statisticians. Nonetheless, not having been in the business from the beginning (whether that means 1. the point when particle physicists became aware of statistics for obtaining coverage and upper limits, or 2. Neyman’s publication (1937, Phil. Trans. Royal Soc. London A, 236, p. 333)) makes it hard for me to grasp the essence of this flip-flopping problem. On the other hand, I can sense that many statistical challenges (for both classical and Bayesian statisticians) reside in this flip-flopping problem, and I wish for some tutorials or chronological reviews on the subject.