btw, during one of our CHASC meetings recently, Xiao-li described a fairly general method to get comparative goodness-of-fit measures out of MCMC. So even though MCMC doesn’t give you a normalization constant, it is possible to work around that to compare nested models. More on that later.
Second, I’m not sure you have correctly characterized MCMC or nested sampling. MCMC does not
hunt for the mode; in fact, the MCMC sample with the highest prior*likelihood may not be
a very good estimator for the mode, especially as dimension grows (though it may be a good
starting point for an optimizer). MCMC doesn’t give you error bars “for free”; error bars are
what it’s aiming for in the first place! MCMC correctly wanders around the posterior
via an algorithm that does not require one to normalize prior*likelihood; an
unfortunate consequence is that it can’t tell you what the normalization constant
is (though there are tricky ways to fudge around this, with varying degrees of
complexity and success depending on the problem). Nested sampling targets that
normalization constant; it gives you the posterior samples (the “error bars”) “for free.”
A good way to understand nested sampling is to think of it as building an approximation
to the *Lebesgue* integral of prior*likelihood, as opposed to its Riemann integral. For
the familiar Riemann integral of f(x), you divide the x axis into “bins” and add up the
areas of the resulting narrow vertical “bars” of height f(x). For the Lebesgue version,
you divide the *ordinate* into bins, and for each bin, find the size (“measure”) of the x
region where f(x) falls in that bin. This builds
the integral out of a bunch of wide but vertically short *horizontal* bars. Nested
sampling does this by using an algorithm for choosing bins on the likelihood (vertical) axis
that has a cute statistical property: the associated x measures are, statistically,
distributed as order statistics. So even though you can’t expect to find those measures
exactly (they would be the areas/volumes between neighboring likelihood contours, over the
*whole* space), by construction, the algorithm lets you “guess” those measures reasonably
accurately. It’s truly ingenious.
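To make the bookkeeping concrete: the evidence Z = integral of prior*likelihood over theta can be rewritten, Lebesgue-style, as Z = integral from 0 to 1 of L(X) dX, where X(lambda) is the prior mass enclosed by the likelihood contour L = lambda. Here is a rough Python sketch of the standard loop, purely as an illustration (the names log_likelihood, sample_prior, and sample_constrained are placeholders for the problem-specific pieces; the constrained draw is the hard step I come back to below):

import numpy as np

def nested_sampling(log_likelihood, sample_prior, sample_constrained,
                    n_live=100, n_iter=1000, rng=None):
    # Rough sketch of Skilling-style nested sampling (illustrative only).
    # sample_prior(n, rng)              -> n draws from the prior
    # sample_constrained(logl_min, rng) -> one prior draw with
    #                                      log_likelihood > logl_min
    rng = rng or np.random.default_rng()
    live = list(sample_prior(n_live, rng))            # live points
    live_logl = np.array([log_likelihood(p) for p in live])

    log_x_prev = 0.0          # log of enclosed prior mass, X_0 = 1
    log_z = -np.inf           # running log-evidence
    dead, log_weights = [], []

    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))             # lowest-likelihood live point
        logl_min = live_logl[worst]

        # The enclosed prior mass shrinks by a factor t_i each step;
        # statistically t_i behaves like the largest of n_live uniforms
        # (the order-statistic property), so E[log t_i] = -1/n_live and
        # we "guess" X_i = exp(-i/n_live).
        log_x = -i / n_live

        # Weight of the discarded point: L_i * (X_{i-1} - X_i)
        log_w = logl_min + np.log(np.exp(log_x_prev) - np.exp(log_x))
        dead.append(live[worst])
        log_weights.append(log_w)
        log_z = np.logaddexp(log_z, log_w)            # Z += L_i * dX_i

        # Replace the discarded point with a fresh prior draw above the
        # current likelihood threshold (the hard step).
        live[worst] = sample_constrained(logl_min, rng)
        live_logl[worst] = log_likelihood(live[worst])
        log_x_prev = log_x

    # (A full implementation would also fold in the remaining live points'
    # contribution to Z; omitted here for brevity.)
    return log_z, dead, np.array(log_weights)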
There is no theta(L) step involved; indeed, there can’t be—there is no unique solution
(it’s a contour or surface or hypersurface). The algorithm works in theta space directly,
and you can use the resulting thetas as posterior samples.
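That “for free” bit is just the weights: each discarded theta_i carries weight w_i = L_i * dX_i, and normalizing those weights turns the nested-sampling output into a weighted posterior sample. A tiny sketch, assuming the nested_sampling illustration above:

import numpy as np

def posterior_resample(dead, log_weights, n_draws=1000, rng=None):
    # Normalize the weights w_i = L_i * dX_i and resample the dead points
    # in proportion to them; the result approximates posterior draws.
    rng = rng or np.random.default_rng()
    log_weights = np.asarray(log_weights)
    p = np.exp(log_weights - np.max(log_weights))
    p /= p.sum()
    idx = rng.choice(len(dead), size=n_draws, p=p)
    return [dead[i] for i in idx]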
But you are right about this hard part: “sampling to discard the tail of L(theta).” There
is no general way to do it (well, maybe the new paper has one!). The few problems where NS
has had good success have had a special structure that allowed this sampling to
be done cleverly and accurately. The one previous cosmological example used an approximate
method, and to my mind (I only read it quickly) it did not make a compelling case that
the results were accurate; to the extent they were, it’s because the problem was
not that challenging. We (my collaboration with some statisticians at Duke on
exoplanet problems) had a grad student spend a semester trying to get NS to work on a
simple exoplanet problem (with a complex, multimodal posterior), and it’s the sampling
“above the tail” where we got hung up. He (Floyd Bullard) could
get it to work, but it was not as efficient as more common, less clever methods
(e.g., importance sampling).
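For concreteness, the most naive way to do that constrained step (the sample_constrained placeholder in my sketch above) is plain rejection from the prior. It is correct but its acceptance rate collapses as the likelihood threshold climbs, which is exactly where the clever problem-specific structure comes in; this is only meant to illustrate the bottleneck, not anyone’s production method:

import numpy as np

def sample_constrained_rejection(log_likelihood, sample_prior, logl_min,
                                 rng=None, max_tries=100_000):
    # Naive constrained-prior draw: keep sampling the prior until a point
    # beats the current likelihood threshold.  Correct, but the acceptance
    # rate shrinks roughly like the enclosed prior mass X, so it becomes
    # hopeless late in the run; practical NS codes use smarter moves
    # (short MCMC walks started from a live point, bounding regions, etc.).
    rng = rng or np.random.default_rng()
    for _ in range(max_tries):
        theta = sample_prior(1, rng)[0]
        if log_likelihood(theta) > logl_min:
            return theta
    raise RuntimeError("rejection failed; a smarter constrained sampler "
                       "is needed at this likelihood level")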
Maybe the new paper will force us to re-examine NS.
-Tom
[Response: Yeah, that is what I had understood. But nested sampling seems to get around that by somehow weighting the likelihoods. Not quite sure how it works or how well it works in real situations.
-vinay]