Here is a specific example that came up due to a comment by a referee on a paper with David G.-A. We had said that the O abundance is dominated by uncertainties in the DEM at low temperatures, because that is where most of the emission from O is formed. The referee disputed this, saying, yes, but O is also present at higher temperatures, and since the DEM is much higher there, that should be the predominant contribution to the estimate. In effect, the referee said, “show me!” The problem is, how? The measured fluxes are:
fO7obs = 2 ± 0.75
fO8obs = 4 ± 0.88
The predicted fluxes are:
fO7pred = 1.8 ± 0.72
fO8pred = 3.6 ± 0.96
where the error bars are the stddev of the fluxes predicted by the DEM realizations that come out of the MCMC analysis. On the face of it, this looks like a pretty good match to the observations, though a slightly different picture emerges if one looks at the distributions of the predicted fluxes:
mode(fO7pred) = 0.76 (95% HPD interval: 0.025 to 2.44)
mode(fO8pred) = 2.15 (95% HPD interval: 0.95 to 4.59)
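Just to make the comparison concrete, here is a minimal sketch (not the code used in the paper) of how one would compute both summaries from the same set of MCMC-predicted fluxes: the mean ± stddev on one hand, and the mode with a 95% highest posterior density (HPD) interval on the other. The array flux_pred is a hypothetical stand-in for the predicted O VII (or O VIII) fluxes, one value per DEM realization.

```python
import numpy as np

def hpd_interval(samples, prob=0.95):
    """Shortest interval containing a fraction `prob` of the samples."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.ceil(prob * n))           # number of samples inside the interval
    widths = s[k - 1:] - s[:n - k + 1]   # widths of all candidate intervals
    i = np.argmin(widths)                # pick the shortest one
    return s[i], s[i + k - 1]

def sample_mode(samples, bins=50):
    """Crude mode estimate from a histogram of the samples."""
    counts, edges = np.histogram(samples, bins=bins)
    j = np.argmax(counts)
    return 0.5 * (edges[j] + edges[j + 1])

# hypothetical skewed flux samples, standing in for 2000 MCMC DEM realizations
rng = np.random.default_rng(1)
flux_pred = rng.lognormal(mean=0.3, sigma=0.8, size=2000)

print("mean +/- stddev:", flux_pred.mean(), flux_pred.std())
print("mode, 95% HPD  :", sample_mode(flux_pred), hpd_interval(flux_pred))
```

For a skewed distribution such as this, the two summaries can disagree noticeably, which is exactly the difference between the mean ± stddev numbers and the mode/HPD numbers quoted above.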
What if one computed the flux at each temperature and did the same calculation separately? That is shown in the following plot, which displays the product of the DEM and the line emissivity, computed at each temperature bin, for both O VII (red) and O VIII (blue). The histograms are for the best-fit DEM solution, and the vertical bars are the stddevs of the product, which differs from the flux only by a constant. The dotted lines show the 95% highest posterior density intervals.
Figure 1: Fluxes from O VII and O VIII computed at each temperature from DEM solution of RST 137B. The solid histograms are the fluxes for the best-fit DEM, and the vertical bars are the stddev for each temperature bin. The dotted lines denote the 95% highest-posterior density intervals for each temperature.
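For what it is worth, the quantity plotted in Figure 1 can be sketched in a few lines. The arrays below (dem_best, dem_samples, emis_o7) are hypothetical placeholders for the best-fit DEM, the MCMC DEM realizations (one row per draw), and the O VII emissivity on the same temperature grid; they are not the actual RST 137B solution.

```python
import numpy as np

n_bins, n_draws = 40, 2000
logT = np.linspace(5.5, 7.5, n_bins)                   # temperature grid [log K]
rng = np.random.default_rng(2)
emis_o7 = np.exp(-0.5 * ((logT - 6.3) / 0.15) ** 2)    # toy O VII emissivity
dem_best = np.exp(-0.5 * ((logT - 6.8) / 0.4) ** 2)    # toy best-fit DEM
dem_samples = dem_best * rng.lognormal(0.0, 0.5, size=(n_draws, n_bins))

prod_best = dem_best * emis_o7                         # best-fit histogram of Fig. 1
prod_samples = dem_samples * emis_o7                   # one curve per MCMC draw

# per-bin spread: the stddev (the vertical bars) ...
prod_std = prod_samples.std(axis=0)
# ... and a 95% interval per bin (the dotted lines); a simple percentile
# interval is used here as a stand-in for a true shortest-length HPD interval
lo, hi = np.percentile(prod_samples, [2.5, 97.5], axis=0)
```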
But even this tells an incomplete tale. The full extent of the uncertainty is not apparent until all the individual curves are displayed, as in the animated GIF below, which shows the flux calculated for each MCMC draw of the DEM:
Figure 2: Predicted flux in the O VII and O VIII lines as the product of the line emissivity and MCMC samples of the DEM at various temperatures. The dashed histogram is from the best-fit DEM; the solid histograms are for the various samples (the running number at top right indicates the sample sequence; only the last 100 of the 2000 MCMC draws are shown).
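And here is a rough, self-contained sketch of the kind of overlay the animation cycles through: the last 100 per-draw curves plotted faintly on top of the best-fit product. The toy arrays are the same placeholders as in the previous snippet, not the real DEM solution.

```python
import numpy as np
import matplotlib.pyplot as plt

# toy stand-ins, constructed the same way as in the previous snippet
rng = np.random.default_rng(2)
logT = np.linspace(5.5, 7.5, 40)
emis_o7 = np.exp(-0.5 * ((logT - 6.3) / 0.15) ** 2)
dem_best = np.exp(-0.5 * ((logT - 6.8) / 0.4) ** 2)
prod_best = dem_best * emis_o7
prod_samples = dem_best * rng.lognormal(0.0, 0.5, size=(2000, 40)) * emis_o7

fig, ax = plt.subplots()
for curve in prod_samples[-100:]:                      # last 100 of the 2000 draws
    ax.step(logT, curve, where="mid", color="red", alpha=0.05)
ax.step(logT, prod_best, where="mid", color="black", linestyle="--", lw=2)
ax.set_xlabel("log T [K]")
ax.set_ylabel("DEM x emissivity (flux up to a constant)")
plt.show()
```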
So this brings me to my question. How does one represent this kind of uncertainty in a static plot? We know what the uncertainty is; we just don’t know how to publish it.
If the bootstrap is an automatic processor for frequentist inference, then MCMC is its Bayesian counterpart.
Sometime in my second year of studying statistics, I said that the bootstrap and MCMC are equivalent but reflect different streams of statistics. The response to this comment was ‘that’s nonsense.’ Although I have forgotten the details of the circumstance, I was hurt and didn’t try to defend my point. Years later, that occasion immediately came back to mind upon seeing this sentence.
[1]: Personally, I think the point spread function (PSF) plays a major role in extracting astrophysical information from raw observations.
[2]: In addition to this week’s GRB paper, [astro-ph:0710.3099]*, there were some statistical studies about multi-modality in GRBs [see Three classes of GRBs].
I think that (astro)physics provides templates, such as likelihoods, for a statistical analysis. The statistical dispute between bimodality and trimodality of the GRB distribution can be settled by (1) more studies like [astro-ph:0710.3099]* or (2) better statistical inference tools or model selection methods. Unfortunately, (2) requires templates, which are acquired from in-depth cross-matching survey studies.
Feroz & Hobson (2007, arXiv:0704.3704) describe a technique called Nested Sampling (Skilling 2004), one that could give MCMC a run for its money. It takes the one inefficient part of MCMC — the burn-in phase — and turns that into a virtue. The way it seems to work is to keep track of how the parameter space is traversed as the model parameters {theta} reach the mode of the posterior, and to take the sequence of likelihoods thus obtained, L(theta), and turn it around to get theta(L). Neat.
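To fix ideas, here is a hedged, minimal sketch of the nested-sampling recipe itself (Skilling 2004), on a toy 2-D Gaussian likelihood with a unit-box prior; it is not the Feroz & Hobson implementation. The replacement step here is naive rejection sampling, which is precisely the part that a serious code such as MultiNest makes efficient.

```python
import numpy as np

rng = np.random.default_rng(3)

def log_like(theta):
    """Toy likelihood: a narrow Gaussian centred at (0.5, 0.5) on a [0, 1]^2 prior."""
    return -0.5 * np.sum(((theta - 0.5) / 0.1) ** 2)

ndim, nlive, niter = 2, 200, 1000
live = rng.uniform(0.0, 1.0, size=(nlive, ndim))       # live points drawn from the prior
live_logL = np.array([log_like(t) for t in live])

logZ = -np.inf                                         # evidence accumulator
logX_prev = 0.0                                        # log prior volume, starts at 1
for i in range(1, niter + 1):
    worst = np.argmin(live_logL)                       # lowest-likelihood live point
    logX = -i / nlive                                  # expected log prior-volume shrinkage
    log_dX = np.log(np.exp(logX_prev) - np.exp(logX))  # volume of the discarded shell
    logZ = np.logaddexp(logZ, live_logL[worst] + log_dX)
    logX_prev = logX
    # replace the discarded point with a prior draw satisfying L > L_worst
    # (naive rejection sampling; fine for a toy problem, hopeless in general)
    while True:
        trial = rng.uniform(0.0, 1.0, size=ndim)
        if log_like(trial) > live_logL[worst]:
            live[worst], live_logL[worst] = trial, log_like(trial)
            break

# contribution of the remaining live points, which still occupy ~exp(logX_prev)
logZ = np.logaddexp(logZ, np.logaddexp.reduce(live_logL) - np.log(nlive) + logX_prev)
print("log-evidence estimate:", logZ)                  # ~ log(2*pi*0.1^2) for this toy case
```

The ordered sequence of discarded likelihoods is effectively the L-versus-prior-volume relation described above, and the evidence is just its integral over the shrinking prior volume.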
Two big (computational) problems that I see are (1) the calculation of theta(L), and (2) the sampling to discard the tail of L(theta). The former, it seems to me, becomes intractable exactly when the likelihood surface gets complicated. The latter, again, seems to require running through just as many iterations as in MCMC to get a decent sample size. Of course, if you have a good theta(L), it does seem to be an improvement over MCMC in that you won’t need to run the chains multiple times to make sure you catch all the modes.
I think the main advantage of MCMC is that it produces and keeps track of marginalized posteriors for each parameter, whereas in this case, you have to essentially keep a full list of samples from the joint posterior and then marginalize over it yourself. The larger the sample size, the harder this gets, and in fact it is a bit difficult to tell whether the nested sampling method is competing with MCMC or Monte Carlo integration.
Is there any reason why this should not be combined with MCMC? i.e., can we use nested sampling during the burn-in phase to figure out the proposal distributions for Metropolis or Metropolis-Hastings in situ, and get the best of both worlds?
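For completeness, here is a minimal sketch of the kind of hybrid being asked about: estimate a covariance matrix from whatever burn-in or exploratory draws one has, scale it by the standard 2.38^2/d factor, and use it as the proposal for a plain Metropolis run. The log-posterior and the “burn-in” sample below are toy placeholders; nothing here comes from Feroz & Hobson.

```python
import numpy as np

rng = np.random.default_rng(4)

def log_post(theta):
    """Toy correlated 2-D Gaussian log-posterior."""
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    return -0.5 * theta @ np.linalg.solve(cov, theta)

# pretend these draws came from a burn-in / exploratory phase (e.g. nested sampling)
burnin = rng.multivariate_normal([0.0, 0.0], [[1.2, 0.7], [0.7, 1.1]], size=500)
prop_cov = 2.38**2 / burnin.shape[1] * np.cov(burnin.T)   # standard optimal scaling

# plain Metropolis with the tuned proposal
theta = burnin[-1].copy()
chain = []
for _ in range(5000):
    prop = rng.multivariate_normal(theta, prop_cov)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta = prop
    chain.append(theta.copy())
chain = np.array(chain)                                   # marginalize by taking chain[:, j]
```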
It has long been known that non-linear curve-fitting, for all its apparent objectivity and the statistical theoretical foundation that enables us to evaluate the goodness of fits, is still something of an art. Leaving aside issues such as a correct specification of the asymmetrical errors that are usually prevalent in astrophysical situations (mostly due to the underlying Poisson distribution), fitting a model curve to the data is never a simple and straightforward process. More often than not, the parameters returned by the programs as those that “best fit” the data are not those that truly maximize the statistical likelihood, but are instead the result of the program being trapped in a local minimum of the chi^2 surface. Furthermore, numerical errors also creep into the outputs depending on how stopping rules are implemented: cases abound where repeated applications of the fitting program simply move the “best-fit” parameters closer to the true best-fit values without ever reaching them. These problems are compounded when the number of parameters in the model is large (>10) and there are significant correlations between the parameters.
A typical strategy for ensuring that the fitting process results in the best possible fit is to start it with parameters that are very close to the true best-fit values. This is done by carrying out the fitting in multiple stages, with only a few parameters fit at a time while the others are held fixed, until the parameter values returned by the program show that the solution has converged. Only then is a fit to all the parameters attempted. The errors on each parameter are then computed by taking projections through the chi^2 surface along each parameter axis; if the best-fit values do not change during this process, one can be confident that the best possible fit has been obtained.
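As a concrete, purely illustrative example of that staged strategy, here is a sketch using scipy’s curve_fit on a toy Gaussian-plus-constant model; the actual analyses alluded to here were not necessarily done with these tools. Parameters are held fixed simply by closing over them and fitting only the free subset, and the full fit is attempted last, starting from the staged solution.

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, amp, cen, wid, bkg):
    return amp * np.exp(-0.5 * ((x - cen) / wid) ** 2) + bkg

rng = np.random.default_rng(5)
x = np.linspace(-5, 5, 200)
truth = (10.0, 0.3, 0.8, 2.0)
y = rng.poisson(model(x, *truth)).astype(float)
yerr = np.sqrt(np.clip(y, 1, None))                       # crude Poisson errors

# stage 1: fit amplitude and background, holding centre and width fixed
cen0, wid0 = 0.0, 1.0
p1, _ = curve_fit(lambda x, amp, bkg: model(x, amp, cen0, wid0, bkg),
                  x, y, p0=[5.0, 1.0], sigma=yerr)

# stage 2: fit centre and width, holding the stage-1 amplitude and background fixed
p2, _ = curve_fit(lambda x, cen, wid: model(x, p1[0], cen, wid, p1[1]),
                  x, y, p0=[cen0, wid0], sigma=yerr)

# stage 3: release all parameters, starting from the staged solution
p3, cov3 = curve_fit(model, x, y, p0=[p1[0], p2[0], p2[1], p1[1]], sigma=yerr)
print("staged best fit:", p3)
```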
What this means in practice is that studies that carry out automated fits to a large number of data sets, where there is no possibility of human intervention and thus no means to verify that the best-fit parameters have been found, are unreliable. As an example, we show in the Figure above the case where a Gaussian model was fit 100 times to profiles generated from a Poisson distribution with mean mu=1000 (a Poisson profile is indistinguishable from a Gaussian for mu >~ 100; cf. Park et al. 2006) and with a total normalization of 1000. In all cases, the fits were started with initial values at the nominal best-fit values. The Figure shows a scatter plot of the initial value of the chi^2, obtained at the end of the first fit attempt, versus the final chi^2, obtained after going through the process described above. Changes as large as delta chi^2 ≈ 25 are seen, which can change the character of the fit significantly.
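For readers who want to try something similar, here is a rough sketch of this kind of experiment, not the code that produced the Figure: generate Poisson realizations of a Gaussian-shaped profile, record the chi^2 of a deliberately coarse first fit (loose tolerances stand in for a conservative stopping rule), then keep restarting the fit from the previous solution until chi^2 stops improving, and compare the two.

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss(x, amp, cen, wid):
    return amp * np.exp(-0.5 * ((x - cen) / wid) ** 2)

def chi2(p, x, y, yerr):
    return np.sum(((y - gauss(x, *p)) / yerr) ** 2)

rng = np.random.default_rng(6)
x = np.linspace(-5, 5, 100)
first, final = [], []
for _ in range(100):
    y = rng.poisson(gauss(x, 120.0, 0.0, 1.0)).astype(float)
    yerr = np.sqrt(np.clip(y, 1, None))
    # first attempt: loose tolerances mimic a premature stopping rule
    p, _ = curve_fit(gauss, x, y, p0=[100.0, 0.5, 1.5], sigma=yerr,
                     ftol=1e-2, xtol=1e-2)
    first.append(chi2(p, x, y, yerr))
    # restart from the previous solution until chi^2 no longer improves
    prev = np.inf
    while chi2(p, x, y, yerr) < prev - 1e-6:
        prev = chi2(p, x, y, yerr)
        p, _ = curve_fit(gauss, x, y, p0=p, sigma=yerr)
    final.append(chi2(p, x, y, yerr))

print("largest improvement in chi^2:", max(f0 - f1 for f0, f1 in zip(first, final)))
```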