Lost in Translation: Measurement Error

You would think that something like “measurement error” is a well-defined concept that everyone understands the same way. Not so. I have so far counted at least three different interpretations of what it means.

Suppose you have measurements X={Xi, i=1..N} of a quantity whose true value is, say, X0. One can then compute the mean and standard deviation of the measurements, E(X) and σX. One can also infer the value of a parameter θ(X), derive the posterior probability density p(θ|X), and obtain confidence intervals on it.
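A minimal sketch of these quantities, using simulated measurements (the true value X0 and the noise level 0.03 are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = 1.0                       # hypothetical true value
X = rng.normal(X0, 0.03, 10)   # ten simulated measurements Xi

mean_X = X.mean()              # E(X), the sample mean
sigma_X = X.std(ddof=1)        # sigma_X, the spread (interpretation 1 below)
bias = X0 - mean_X             # X0 - E(X) (interpretation 2 below)
```

With real data X0 is of course unknown; the bias term is what a perfect measuring device would eliminate.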

So here are the different interpretations:

  1. Measurement error is σX, or the spread in the measurements. Astronomers tend to use the term in this manner.
  2. Measurement error is X0-E(X), the “error made when you make the measurement”, i.e., what is left over beyond mere statistical variations. This is how statisticians seem to use the term; it is essentially the bias. To quote David van Dyk:

    For us it is just English. If your measurement is different from the real value. So this is not the Poisson variability of the source for effects or ARF, RMF, etc. It would disappear if you had a perfect measuring device (e.g., telescope).

  3. Measurement error is the width of p(θ|X), i.e., the measurement error of the first type propagated through the analysis. Astronomers use the term in this sense as well.

Who am I to say which is right? But be aware of whom you may be speaking with, and be sure to clarify what you mean when you use the term!

8 Comments
  1. hlee:

    There’s no right or wrong in measurement error modeling. Although astronomers’ measurement errors have not been treated or acknowledged extensively in the statistics literature, astronomers might like to check the publications and books by Raymond Carroll, the leading expert on measurement errors, as a starting point. One can build a longer list than the one given above, depending on how one understands measurement errors and their sources.

    01-03-2009, 3:41 am
  2. brianISU:

    I have a question about part 3: is this more of a Bayesian viewpoint? I am interpreting this statement as the width of the credible interval for theta, or how much uncertainty we have in our knowledge about theta (I am assuming that p(theta|x) is the posterior). Is this a correct interpretation?

    01-12-2009, 1:52 pm
  3. vlk:

    Not really, that was just my way of making sense of the quantity. The thing is, most astronomical data analysis doesn’t really go into true first principles. “Counts” are actually fluctuations in amplifier voltage, for instance, but nobody deals with data at that level, and the errors are in a sense propagated to the inferred quantity, provided that that quantity is close in some sense to the original data. For some astronomers, magnitudes are close enough :)

    01-12-2009, 11:05 pm
  4. Dieda:

    Can we try an example?
    Let’s say that I’m observing a source to obtain its flux, and I look at it with 10 scans or shots. The user interface at my telescope already converts the counts into Kelvin once it knows the calibration noise diode value (typically assumed constant… whereas it’s not… but OK) and gives me a Kelvin value for each scan/shot, together with its error.
    Now I apply some additional corrections (pointing, gain, etc.) and propagate the error according to the error propagation formulae; finally I apply the K/Jy conversion using my calibrators as reference, and I have a single Jy value with its error for each scan/shot.
    Now… I have more or less a measurement of 1.00+-0.03 Jy for each scan/shot, and I’d like to average the values in order to give out a single value for my source. The standard average and the weighted average are similar, but regarding the error some troubles arise:
    - I have a standard deviation of 0.02 Jy
    - I have a weighted error of approx. 0.03/sqrt(10) ≈ 0.009 Jy
    What would be an “honest” output to give?
    - 0.02 Jy would tell how much the measurements varied during the observation, but says nothing about the instrumental accuracy.
    - 0.009 Jy would tell how accurate my telescope is, BUT if one repeats the measurement with just one shot and obtains 0.03 Jy, the first reaction could be “Hey, why is my result so bad with respect to what is published by others?!”

    Any suggestions about an honest :-) way to combine these pieces of information (SD 0.02 Jy, single error 0.03 Jy, WE 0.009 Jy)?

    Thanks

    02-02-2009, 6:21 am
  5. vlk:

    Good example!

    So at the end of the run, you have measurements {fi~ N(1,0.03), i=1..10}, where 0.03 is due to photon noise (i.e., just the statistical error, which I would call the measurement error). The SD of 0.02 that you calculate from the 10 measurements should theoretically match the statistical error, but in practice may not due to variations in source, calibration, etc.

    From this you can compute a mean fav=1.0 and an error on the mean, sigmaf=0.009, which is your estimate of the source intensity based on multiple measurements. In principle, it would be nice to fold in the non-statistical variations into the sigmaf, but that is a hard calculation, and requires much more modeling.

    I would report fav+-sigmaf because it includes the information from all the observations. Of course, the next measurement (at the same exposure time and under the same observing conditions) will again be drawn from N(1,0.03), say f11 = 1.04, with a measurement error of 0.03. You can then compare this with the previous estimate of the intensity, fav. You can ask whether fav-f11 differs from zero by comparing it to the propagated error, sqrt(0.009^2+0.03^2) ~ 0.031, which results in an S/N of -1.3, much smaller in magnitude than the usual criterion for a significant difference, so no need to panic there.
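    For concreteness, the arithmetic in the reply above can be checked in a few lines (the numbers are simply those from the example, not new data):

    ```python
    import math

    fav = 1.0                                  # mean of the 10 shots
    sigma_f = 0.03 / math.sqrt(10)             # error on the mean, ~0.009
    f11, err11 = 1.04, 0.03                    # one new shot and its measurement error

    prop_err = math.sqrt(sigma_f**2 + err11**2)  # propagated error, ~0.031
    snr = (fav - f11) / prop_err                 # ~ -1.3, well within the noise
    ```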

    02-02-2009, 1:15 pm
  6. Dieda:

    Hi,

    Thanks for your reply! I have an additional question and a comment/question :

    - What is the usual criterion for a significant difference that you are referring to?

    - Let’s say that now I am the astronomer who, for lack of time, observes the source just twice. Now my f11=0.03~sigmaf, and let’s assume that my SD is always 0.02. By applying the same criterion that you explained, I should report a value of fav+-sigmaf = 1.04+-0.03, simply living with the fact that my single observation is as accurate as the one made before, but my final value is intrinsically less accurate since I did not have the time to do many observations… correct?

    02-04-2009, 3:59 am
  7. vlk:

    The usual criterion for a significant difference is the infamous 3-sigma. If the difference exceeds 3× the propagated error, that’s taken as an indication that it is unlikely to be a mere random statistical fluctuation (a Gaussian fluctuation stays within 3 sigma with probability 0.997).
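    As a quick sanity check on the 0.997 figure (this is just the Gaussian probability within ±3 sigma, via the error function):

    ```python
    import math

    # P(|Z| < 3) for a standard Gaussian Z: erf(3 / sqrt(2))
    p = math.erf(3 / math.sqrt(2))   # ~0.9973
    ```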

    I don’t understand the second question. If you only had two observations in the original sample, then the error on the mean would be greater than 0.009. You can always combine f11 with the other samples to obtain a better estimate of the mean. And certainly, with 3 observations the uncertainty on how well the mean can be known is much larger than if you had 11 observations.

    02-04-2009, 5:17 pm
  8. Dieda:

    For the 3-sigma : ok, I thought that maybe you were referring to some alternative :-D

    Regarding the 2nd point, sorry if I was unclear; it was more a statement than a real question. The point is that sometimes we tend to identify the error of the measurement with the accuracy of the telescope. Hence if I say that I have measured a flux of <I> = 1+-0.009 Jy, I tend to state “My telescope is accurate at the level of 9 mJy”, whereas in this case the accuracy of the telescope (which in the end is given by a single measurement, especially if I see that the error on a single measurement tends to be pretty stable) is mixed with the accuracy of my observing strategy (so to say).

    Now things are a bit clearer :-)

    Cheers

    02-05-2009, 4:54 am