Q: Lowess error bars?

vlk — Tue, 03 Jun 2008 06:53:14 +0000

It is somewhat surprising that astronomers haven’t cottoned on to Lowess curves yet. That’s probably a good thing because I think people already indulge in smoothing far too much for their own good, and Lowess makes for a very powerful hammer. But the fact that it is semi-parametric and is based on polynomial least-squares fitting does make it rather attractive.

And, of course, sometimes it is unavoidable, or so I told Brad W. When one has too many points for a regular polynomial fit, and they are too scattered for a spline, and too few to try a wavelet “denoising”, and no real theoretical expectation of any particular model function, and all one wants is “a smooth curve, damnit”, then Lowess is just the ticket.

Well, almost.

There is one major problem: how does one figure out what the error bounds are on the “best-fit” Lowess curve? Clearly, the fit at each point can produce an estimate of the error, but simply collecting the separate errors is not the right thing to do because they would all be correlated. I know how to propagate Gaussian errors when boxcar smoothing a histogram, but this is a whole new level of complexity. Does anyone know if there is software that can calculate reliable error bands on the smooth curve? We will take any kind of error model: Gaussian, Poisson, even the (local) variances in the data themselves.
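One generic way to attack the question above, offered as a sketch and not as an answer from the original post, is to bootstrap the data and refit the smooth curve each time. The code below is a simplified Lowess (local linear fit with tricube weights, no robustness iterations) in plain Python. The function names `lowess_at` and `bootstrap_band` and the defaults `frac`, `n_boot`, and `level` are all assumptions made for illustration. Resampling (x, y) pairs corresponds to the "local variances in the data themselves" error model; a Gaussian or Poisson model would need a parametric resampling step instead.

```python
import random

def tricube(u):
    """Tricube kernel weight for a scaled distance u."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def lowess_at(x0, xs, ys, frac=0.5):
    """Simplified Lowess: weighted local linear fit evaluated at x0."""
    n = len(xs)
    k = min(n, max(3, int(round(frac * n))))       # points in the local window
    h = sorted(abs(x - x0) for x in xs)[k - 1]     # distance to k-th neighbour
    if h <= 0.0:
        return sum(ys) / n                         # degenerate window: plain mean
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    if sw <= 0.0:
        return sum(ys) / n
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0.0 else 0.0
    return my + slope * (x0 - mx)

def bootstrap_band(xs, ys, grid, frac=0.5, n_boot=200, level=0.95, seed=0):
    """Pointwise percentile band from refitting on resampled (x, y) pairs.

    Returns a list of (lower, fit, upper) tuples, one per grid point.
    """
    rng = random.Random(seed)
    n = len(xs)
    fits = [lowess_at(g, xs, ys, frac) for g in grid]
    curves = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        curves.append([lowess_at(g, bx, by, frac) for g in grid])
    lo_i = int((1.0 - level) / 2.0 * (n_boot - 1))
    hi_i = int((1.0 + level) / 2.0 * (n_boot - 1))
    band = []
    for j, fit in enumerate(fits):
        col = sorted(c[j] for c in curves)
        band.append((col[lo_i], fit, col[hi_i]))
    return band
```

Note that the band this produces is pointwise, so it still hides the correlations between neighbouring points on the curve. The individual bootstrap curves do retain them, so keeping those curves around is probably more useful than keeping only the band.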

Dance of the Errors

vlk — Mon, 21 Jan 2008 19:33:26 +0000

One of the big problems that have come up in recent years is how to represent the uncertainty in certain estimates. Astronomers usually present errors as +-stddev on the quantities of interest, but that presupposes that the errors are uncorrelated. But suppose you are estimating a multi-dimensional set of parameters that may have large correlations amongst themselves? One such case is that of Differential Emission Measures (DEM), where the “quantity of emission” from a plasma (loosely, how much stuff there is available to emit; it is the product of the volume and the densities of electrons and H) is estimated for different temperatures. See the plots at the PoA DEM tutorial for examples of how we are currently trying to visualize the error bars. Another example is the correlated systematic uncertainties in effective areas (Drake et al., 2005, Chandra Cal Workshop). This is not dissimilar to the problem of determining the significance of a “feature” in an image (Connors, A. & van Dyk, D.A., 2007, SCMA IV).

Here is a specific example that came up due to a comment by a referee on a paper with David G.-A. We had said that the O abundance is dominated by uncertainties in the DEM at low temperatures because that is where most of the emission from O is formed. The referee disputed this, saying yeah, but O is also present at higher temperatures, and since the DEM is much higher there, that should be the predominant contribution to the estimate. In effect, the referee said, “show me!” The problem is, how? The measured fluxes are:

fO7_obs = 2 +- 0.75

fO8_obs = 4 +- 0.88

The predicted fluxes are:

fO7_pred = 1.8 +- 0.72

fO8_pred = 3.6 +- 0.96

where the error bars here come from the stddev of the fluxes predicted by each DEM realization that comes out of the MCMC analysis. On the face of it, it looks like a pretty good match to the observations, though a slightly different picture emerges if one were to look at the distribution of the predicted fluxes:

mode(fO7_pred)=0.76 (95% HPD interval = 0.025:2.44)

mode(fO8_pred)=2.15 (95% HPD interval = 0.95:4.59)
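For readers who want to reproduce this kind of summary from their own MCMC output, here is a minimal sketch of how a mode and a highest posterior density (HPD) interval can be estimated from a list of draws. The function names, the histogram-based mode estimate, and the default of 20 bins are assumptions for illustration; this is not the code that produced the numbers above.

```python
import math
import statistics

def hpd_interval(samples, mass=0.95):
    """Shortest interval that contains a fraction `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    # Number of draws the interval must cover (small guard against float fuzz).
    m = min(n, max(1, math.ceil(mass * n - 1e-9)))
    # Slide a window of m consecutive sorted draws and keep the narrowest one.
    best = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[best], s[best + m - 1]

def histogram_mode(samples, bins=20):
    """Crude mode estimate: centre of the most populated histogram bin."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / width), bins - 1)] += 1
    k = counts.index(max(counts))
    return lo + (k + 0.5) * width

def summarize(samples):
    """Return (mean, stddev, mode, hpd) so the two styles can be compared."""
    return (statistics.mean(samples), statistics.stdev(samples),
            histogram_mode(samples), hpd_interval(samples))
```

For a skewed distribution the mean and the mode can sit far apart, and the HPD interval need not be symmetric about either, which is the kind of mismatch between the two sets of numbers seen here.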

What if one computed the flux at each temperature and did the same calculation separately? That is shown in the following plot, where the product of the DEM and the line emissivity computed at each temperature bin is shown for both O VII (red) and O VIII (blue). The histograms are for the best-fit DEM solution, and the vertical bars are stddevs on the product, which differs from the flux only by a constant. The dashed lines show the 95% highest posterior density intervals.

Figure 1: Fluxes from O VII and O VIII computed at each temperature from DEM solution of RST 137B. The solid histograms are the fluxes for the best-fit DEM, and the vertical bars are the stddev for each temperature bin. The dotted lines denote the 95% highest-posterior density intervals for each temperature.
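The per-temperature calculation can be sketched as follows. This is a toy illustration, not the analysis code: `dem_draws` stands for a list of MCMC realizations of the DEM (one value per temperature bin), `emissivity` for the line emissivity in the same bins, and the total flux is taken to be the sum of the products over bins, ignoring the constant factor. All names are hypothetical.

```python
import math
import statistics

def per_bin_products(dem_draws, emissivity):
    """DEM(T) * emissivity(T) for every draw and every temperature bin."""
    return [[d * e for d, e in zip(draw, emissivity)] for draw in dem_draws]

def compare_errors(dem_draws, emissivity):
    """Return (per_bin_sd, total_sd, quadrature_sd).

    per_bin_sd    : stddev of the product in each temperature bin
    total_sd      : stddev of the summed flux across draws
    quadrature_sd : what one would get by adding per-bin stddevs in
                    quadrature, i.e. pretending the bins are independent
    """
    prod = per_bin_products(dem_draws, emissivity)
    per_bin_sd = [statistics.stdev(col) for col in zip(*prod)]
    total_sd = statistics.stdev([sum(row) for row in prod])
    quadrature_sd = math.sqrt(sum(s * s for s in per_bin_sd))
    return per_bin_sd, total_sd, quadrature_sd

# Made-up draws in which two bins trade emission against each other.
draws = [[1, 3], [2, 2], [3, 1]]
per_bin_sd, total_sd, quadrature_sd = compare_errors(draws, [1, 1])
```

In this made-up case each bin has a sizeable stddev while the total flux does not vary at all, because the bins are perfectly anti-correlated. Vertical bars drawn per bin cannot show that.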

But even this tells an incomplete tale. The full measure of the uncertainty goes unseen until all the individual curves are seen, as in the animated gif below which shows the flux calculated for each MCMC draw of the DEM:

Figure 2: Predicted flux in O VII and O VIII lines as a product of line emissivity and MCMC samples of the DEM for various temperatures. The dashed histogram is from the best-fit DEM, the solid histograms are for the various samples (the running number at top right indicates the sample sequence; only the last 100 of the 2000 MCMC draws are shown).

So this brings me to my question. How does one represent this kind of uncertainty in a static plot? We know what the uncertainty is; we just don’t know how to publish it.

The AstroStat Slog » error bands
