[ArXiv] 4th week, Oct. 2007

hlee — Fri, 26 Oct 2007 22:52:02 +0000

I hope a paper or two from arXiv catches your attention and stimulates your thoughts on astrostatistics.

[stat.ML:0710.3742]
Bayesian Online Change Point Detection by R. Adams and D. MacKay
[astro-ph:0710.3600]
Statistical Methods for Investigating the Cosmic Ray Energy Spectrum by J. Hague, B. Becker, M. Gold, J. Matthews, and J. Urbář
[astro-ph:0710.3618]
Fast algorithms for matching CCD images to a stellar catalogue by V. Tabur
[astro-ph:0710.4019]
A principal component analysis approach to the morphology of Planetary Nebulae by S. Akras and P. Boumis
[astro-ph:0710.4020]
Dice and Pulsars by V. M. Kontorovich
[astro-ph:0710.4075]
Getting More From Your Multicore: Exploiting OpenMP for Astronomy by M. S. Noble
[astro-ph:0710.4143]
Lensing and Supernovae: Quantifying The Bias on the Dark Energy Equation of State by D. Sarkar and A. Amblard
[astro-ph:0710.4158]
A Cross-Match of 2MASS and SDSS: Newly-Found L and T Dwarfs and an Estimate of the Space Density of T Dwarfs by S. Metchev, et al.
[astro-ph:0710.4262]
Crowded-Field Astrometry with the Space Interferometry Mission – I. Estimating the Single-Measurement Astrometric Bias Arising from Confusion by R. Sridharan and R. Allen
[astro-ph:0710.4556]
X-Ray Binaries and the Current Dynamical States of Galactic Globular Clusters by J. M. Fregeau
[stat.ME]
The Use of Unlabeled Data in Predictive Modeling by F. Liang, S. Mukherjee, and M. West

[ArXiv] Matching Sources, July 11, 2007

hlee — Fri, 13 Jul 2007 23:24:23 +0000

From arxiv/astro-ph: 0707.1611 Probabilistic Cross-Identification of Astronomical Sources by Budavari and Szalay

As multi-wavelength studies become more popular, various source-matching methodologies have been discussed. One such method, built on Bayesian ideas, was introduced by Budavari and Szalay, motivated by the need for symmetric algorithms in a unified framework.

First, the paper explains astrometric precision, which varies with instrumental effects and with the nature of objects across multiple bands, and introduces the Bayes factor for testing the hypothesis that multiple observations from various catalogs come from the same source. Next, it presents the formula for the Bayes factor, derived from the spherical normal distribution. However, matching is not simply a matter of evaluating this analytically derived Bayes factor, particularly when the goal is to find new objects with unknown spectral energy distributions, which is where physics squeezes in. Before the summary, the paper addresses practical issues such as fast and efficient computation on multiple catalogs (sequentially adding a catalog to the current n catalogs) and gives recursion formulas for evaluating the weight of evidence.
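To make the Bayes factor concrete, here is a minimal sketch for the two-catalog case. It assumes the small-angle, high-precision limit of the spherical normal result, B = 2/(σ1² + σ2²) · exp(−ψ² / (2(σ1² + σ2²))), with the separation ψ and the astrometric errors σ1, σ2 in radians. The function names, the arcsecond inputs, and the use of log10(B) as the weight of evidence are my own illustrative choices and should be checked against the paper before any real use.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def bayes_factor(psi_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Bayes factor for 'same source' vs 'separate sources' (two catalogs).

    Assumed form (small-angle limit): B = 2 / s2 * exp(-psi^2 / (2 * s2)),
    where s2 = sigma1^2 + sigma2^2 and all angles are in radians.
    """
    psi = psi_arcsec * ARCSEC
    s2 = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    return 2.0 / s2 * math.exp(-psi ** 2 / (2.0 * s2))


def weight_of_evidence(psi_arcsec, sigma1_arcsec, sigma2_arcsec):
    """log10 of the Bayes factor, computed in log space to avoid underflow."""
    psi = psi_arcsec * ARCSEC
    s2 = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    return math.log10(2.0 / s2) - psi ** 2 / (2.0 * s2) / math.log(10.0)


if __name__ == "__main__":
    # Hypothetical numbers: 0.1" errors in both catalogs, growing separation.
    for psi in (0.0, 0.2, 0.5, 1.0):
        print(psi, weight_of_evidence(psi, 0.1, 0.1))
```

A positive weight of evidence favors a match and a negative one favors separate sources. Turning it into a posterior probability still requires a prior, which this sketch deliberately leaves out.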

I want to quote a few sentences that I liked from the paper:

Often Bayesian analysis is referred to as the calculus of belief; however, it should rather be thought of as the calculus of observational evidence.

The Bayesian analysis is inherently recursive. As soon as we obtain new measurements, and compute the posterior probability, that becomes the prior for subsequent studies.

Also, here is one sentence about which I have some questions:

… it penalizes complicated hypotheses (with smaller prior probability [hlee: is this generally true? A complicated model occupies a larger parameter space because of its dimensions, although they mention that the hypothesis of separate sources occupies a more restricted parameter space. Why does a complicated model occupy this restricted parameter space?]) over simpler ones.

In model selection, BIC penalizes a complicated model more heavily, in proportion to the number of parameters (and to the log of the sample size), but this penalty comes from the Laplace approximation, not from the idea that a complicated model has a small prior probability.
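For reference, the BIC penalty mentioned above can be written out in a few lines. This is a minimal sketch using the standard definition BIC = k ln(n) − 2 ln(L̂); the function name and the numbers in the example are hypothetical and only meant to show how the penalty scales with the number of parameters and the log of the sample size.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: k * ln(n) - 2 * ln(L_hat)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


if __name__ == "__main__":
    # Hypothetical fits to the same n = 100 data points: the 5-parameter
    # model fits slightly better but pays a larger penalty.
    simple = bic(log_likelihood=-100.0, n_params=2, n_samples=100)
    complicated = bic(log_likelihood=-98.0, n_params=5, n_samples=100)
    print(simple, complicated)  # the lower BIC is preferred
```

Note that no model prior appears anywhere in the formula, which is the point of the remark above: the penalty is a by-product of approximating the marginal likelihood.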

In addition, I want to raise a question: under the spherical normal distribution, what is the difference between using the Bayes factor and using classical hypothesis testing to test whether multiple observations at different wavelengths are from the same source? To my knowledge, numerous studies on multiple hypothesis testing for biological data (our counterpart could be testing millions of sources across catalogs) have become available in recent years, and such frequentist approaches seem applicable to the cross-matching problem.
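One way to picture the frequentist counterpart is sketched below. It is my own illustration, not something taken from the paper. Under the null hypothesis that two detections are the same source, with independent circular Gaussian errors in the small-angle limit, ψ²/(σ1² + σ2²) follows a chi-square distribution with 2 degrees of freedom, so the p-value is exp(−ψ²/(2(σ1² + σ2²))). Many candidate pairs can then be screened with the Benjamini-Hochberg procedure, where a rejection means the pair is declared a non-match. All function names and inputs are hypothetical.

```python
import math


def separation_p_value(psi, sigma1, sigma2):
    """p-value for H0 'same source'; psi and sigmas share the same angular unit.

    Under H0, psi^2 / (sigma1^2 + sigma2^2) ~ chi-square with 2 dof,
    whose survival function is exp(-x / 2).
    """
    x = psi ** 2 / (sigma1 ** 2 + sigma2 ** 2)
    return math.exp(-x / 2.0)


def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where H0 is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank  # largest rank passing the step-up threshold
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical candidate pairs: separations in arcsec, 0.1" errors each.
    seps = [0.05, 0.2, 0.6, 1.5]
    pvals = [separation_p_value(s, 0.1, 0.1) for s in seps]
    print(benjamini_hochberg(pvals, q=0.05))
```

Under these assumptions the p-value is exactly the exponential factor of the two-catalog Bayes factor, so the two approaches rank pairs from the same two catalogs identically. They differ in the prefactor and in how a threshold is chosen and interpreted.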

Finally, I wish to add some references on Cross-Matching/Coincidence Assessment with the VO, given by Tom Loredo at one of the SAMSI Surveys and Population Studies working group meetings. (Please note that his talk on the subject and the other journal papers linked at the SAMSI AstroStat websites are password protected. SAMSI stands for the Statistical and Applied Mathematical Sciences Institute.)
