Thankfully, its usage is robustly built into analysis software such as Sherpa or XSPEC and most people don’t have to deal with the nitty gritty on a daily basis. But given the profusion of statistical software being written for astronomers, it is perhaps useful to go over what it means.
The Redistribution Matrix File (RMF) is, at its most basic, a description of how the detector responds to incoming photons. It describes the transformation from the photons that are impinging on the detector to the counts that are recorded by the instrument electronics. Ideally, one would want there to be a one-to-one mapping between the photon’s incoming energy and the recorded energy, but in the real world detectors are not ideal. The process of measuring the energy introduces a measurement error, which is encoded as the probability that an incoming photon at energy E is read out in detector channel i. Thus, for each energy E, there results an array of probabilities p(i|E) such that the observed counts in channel i,
$$c_i|d_E \sim {\rm Poisson}(p(i|E) \cdot d_E) \,,$$
where d_E is the expected counts at energy E, and is the product of the source flux at the telescope and the effective area of the telescope+detector combination. Equivalently, the expected counts in channel i,
$${\rm E}(c_i|d_E) = p(i|E) \cdot d_E \,.$$
The full format of how the arrays p(i|E) are stored in files is described in a HEASARC memo, CAL/GEN/92-002a. Briefly, it is a FITS file with two tables, of which only the first one really matters. This first table (“SPECRESP MATRIX”) contains the energy grid boundaries {E_j; j=1..N_E}, where each entry j corresponds to one set of p(i|E_j). The arrays themselves are stored in compressed form, as the smallest possible array that excludes all the zeros. An ideal detector, where $$p(i|E_j) \equiv \delta_{ij} \,,$$ would be compressed to a matrix of size N_E × 1. The FITS extension also contains additional arrays to help uncompress the matrix, such as the index of the first non-zero element and the number of non-zero elements for each p(i|E_j).
The second extension (“EBOUNDS”) contains an energy grid {e_i; i=1..N_channels} that maps to the channels i. This grid is fake! Do not use it for anything except display purposes or for convenient shorthand! What it is, is a mapping of the average detector gain to the true energy, such that it lists the most likely energy of the photons registered in that bin. This grid allows astronomers to specify filters to the spectrum in convenient units that are semi-invariant across instruments (such as [Å] or [keV]) rather than detector channel numbers, which are unique to each instrument. But keep in mind that this is a convenient fiction, and it should never be taken seriously. It is useful when the width of p(i|E) spans only a few channels, and completely useless for lower-resolution detectors.
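To make the bookkeeping concrete, here is a minimal sketch in Python (using numpy and astropy) of expanding the compressed matrix into the full set of p(i|E_j) arrays and folding a model through it. The file name is a placeholder, and the column names (ENERG_LO, N_GRP, F_CHAN, N_CHAN, MATRIX) follow the OGIP convention of CAL/GEN/92-002a; individual mission files sometimes deviate, so treat this as illustrative rather than definitive.

```python
import numpy as np
from astropy.io import fits

# "example.rmf" is a placeholder file name.
with fits.open("example.rmf") as hdul:
    try:
        matrix_hdu = hdul["MATRIX"]
    except KeyError:
        matrix_hdu = hdul["SPECRESP MATRIX"]
    ebounds = hdul["EBOUNDS"].data
    rows = matrix_hdu.data
    n_energy = len(rows)                 # number of energy bins E_j
    n_channels = len(ebounds)            # number of detector channels i
    # TLMINn of the F_CHAN column gives the first channel number (often 0 or 1);
    # column 4 is assumed here, which is the usual OGIP ordering.
    first_chan = matrix_hdu.header.get("TLMIN4", 1)

    # Expand the compressed rows into the full matrix of p(i|E_j)
    rsp = np.zeros((n_energy, n_channels))
    for j, row in enumerate(rows):
        ngrp = int(row["N_GRP"])
        fchan = np.atleast_1d(row["F_CHAN"])[:ngrp]
        nchan = np.atleast_1d(row["N_CHAN"])[:ngrp]
        vals = np.atleast_1d(row["MATRIX"])
        pos = 0
        for f, n in zip(fchan, nchan):
            i0 = int(f) - first_chan
            rsp[j, i0:i0 + int(n)] = vals[pos:pos + int(n)]
            pos += int(n)

# Fold a model: d_E = expected counts per energy bin (source flux x effective
# area x exposure, computed elsewhere); predicted channel counts = rsp.T @ d_E
d_E = np.ones(n_energy)                  # toy flat model, for illustration only
predicted_counts = rsp.T @ d_E
```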
The blackbody spectrum is
$$B_{\nu}(T) = \frac{2 h \nu^3}{c^2} \frac{1}{e^{h \nu / k_B T} - 1} ~~ {\rm [erg~s^{-1}~cm^{-2}~Hz^{-1}~sr^{-1}]} \,,$$
where ν is the frequency in [Hz], h is Planck’s constant, c is the speed of light in vacuum, and k_B is Boltzmann’s constant. The spectrum is interesting in many ways. Its shape is characterized by only one parameter, the radiation temperature T. A spectrum with a higher T is greater in intensity at all frequencies compared to one with a lower T, and the emergent flux integrated over all frequencies is σT^4, where $$\sigma \equiv \frac{2\pi^5k_B^4}{15 c^2 h^3}$$ is the Stefan-Boltzmann constant. Other than that, the normalization is detached, so to speak, from T, and differences in source luminosities are entirely attributable to differences in emission surface area.
The general shape of a blackbody spectrum is like a rising parabola at low ν (the classical extrapolation of which to high frequencies led to much hand-wringing around the turn of the 20th century over the Ultraviolet Catastrophe) and an exponential drop at high ν, with a well-defined peak in between. The frequency at which the spectrum peaks depends on the temperature, with
$$\nu_{\rm max} = 2.82 \frac{k_B T}{h}$$,
or, in terms of the wavelength at which the per-wavelength form B_λ(T) peaks (Wien’s displacement law),
$$ \lambda_{\rm max} = \frac{2.9\cdot10^{7}}{T} ~~ {\rm[\AA]} \,,$$
where T is in [K].
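As a quick numerical illustration, here is a short Python sketch; the temperature is just an example value, roughly that of the solar photosphere:

```python
import numpy as np

# Physical constants in cgs units
h = 6.6261760e-27      # Planck's constant [erg s]
c = 2.9979246e10       # speed of light [cm/s]
k_B = 1.3806620e-16    # Boltzmann's constant [erg/K]

def planck_nu(nu, T):
    """Blackbody specific intensity B_nu(T) [erg/s/cm^2/Hz/sr]."""
    return (2.0 * h * nu**3 / c**2) / np.expm1(h * nu / (k_B * T))

T = 5800.0                          # example temperature [K]
nu_max = 2.82 * k_B * T / h         # frequency at which B_nu peaks [Hz]
lam_max = 2.9e7 / T                 # Wien peak of B_lambda [Angstrom]
print(f"nu_max  ~ {nu_max:.3e} Hz")
print(f"lam_max ~ {lam_max:.0f} Angstrom")
```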
For historical reasons, astronomers measure the brightness of celestial objects in rank order. The smaller the rank number, a.k.a. the magnitude, the brighter the object. Thus, a star of the first magnitude is much brighter than a star of the sixth magnitude, and it would take exceptionally good eyes and a very dark sky to see a star of the seventh magnitude. Now, it turns out that the human eye perceives brightness on a log scale, so magnitudes are numerically similar to log(brightness). And because they form a ranking, a magnitude is always quoted with reference to a standard. After some rough calibration to match human perception to the true brightness of stars in the night sky, we have a formal definition for magnitude,
$$m = – \frac{5}{2}\log_{10}\left(\frac{f_{object}}{f_{standard}}\right) \,,$$
where f_object is the flux from the object and f_standard is the flux from a fiducial standard. In the optical bands, the bright star Vega (α Lyrae) has been adopted as the standard, and it has magnitudes of 0 in all optical filters. (Well, not exactly, because Vega is not constant enough, and as a practical matter there is nowadays a hierarchy of photometric standard stars that are accessible at different parts of the sky.) Note that we can also write this in terms of the intrinsic luminosity L_object of the object and the distance d to it,
$$m = – \frac{5}{2}\log_{10}\left(\frac{L_{object}}{4 \pi d^2}\frac{1}{f_{standard}}\right) \,.$$
Because astronomical objects are located at a vast variety of distances, it is useful to define an intrinsic magnitude of the object, independent of the distance. Thus, in contrast to the apparent magnitude m, which is the brightness at Earth, an absolute magnitude is defined as the brightness that would be perceived if the object were 10 parsecs away,
$$M \equiv m|_{d={\rm 10~pc}} = m – \frac{5}{2}\log_{10}\left(\frac{d^2}{{\rm (10~pc)}^2}\right) \equiv m – 5\log_{10}d + 5$$
where d is the distance to the object in [parsec], and the squared term is of course because of the inverse square law.
There are other issues such as interstellar absorption, cosmological corrections, extent of the source, etc., but let’s not complicate it too much right away.
Colors are differences of the magnitudes in different passbands. For instance, if the apparent magnitude in the blue filter is m_B and in the green filter is m_V (V for “visual”), the color is m_B - m_V and is usually referred to as the “B-V” color. Being a difference of magnitudes, it is related to the log of the ratio of the fluxes in the two bands.
For an excellent description of what is involved in the measurement of magnitudes and colors, see this article on analyzing photometric data by Star Stryder.
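For concreteness, here is a small Python sketch of these definitions; the flux ratio and distance are invented for illustration:

```python
import numpy as np

def magnitude(f_object, f_standard):
    """Apparent magnitude relative to a standard of known flux."""
    return -2.5 * np.log10(f_object / f_standard)

def absolute_magnitude(m, d_parsec):
    """Absolute magnitude: the apparent magnitude the object would have at 10 pc."""
    return m - 5.0 * np.log10(d_parsec) + 5.0

# Hypothetical example: an object 1% as bright as the standard, located at 100 pc
m = magnitude(0.01, 1.0)            # = 5, i.e. 5 mag fainter than the standard
M = absolute_magnitude(m, 100.0)    # = m - 10 + 5 = m - 5
print(m, M)

# A color is a difference of magnitudes in two passbands,
# e.g. B-V = m_B - m_V = -2.5 log10(f_B/f_V) + a zero-point constant
```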
The flux at Earth due to an atomic transition u → l from a volume element δV at a location r is
$$I_{ul}(r) = \frac{1}{4\pi} \frac{1}{d(r)^2} \, A(Z,r) \, G_{ul}(n_e(r),T_e(r)) \, n_e(r)^2 \, \delta V(r) \,,$$
where n_e is the electron density and T_e is the temperature of the plasma, A(Z,r) is the abundance of element Z, G_{ul}(n_e,T_e) is the atomic emissivity for the transition, and d is the distance to the source.
We can combine the flux from all the points in the field of view that arise from plasma at the same temperature,
$$I_{ul}(T_e) = \frac{1}{4\pi} \sum_{r|T_e} \frac{1}{d(r)^2} \, A(Z,r) \, G_{ul}(n_e(r),T_e) \, n_e(r)^2 \, \delta V(r) \,.$$
Assuming that A(Z,r) and n_e(r) do not vary over the points in the summation,
$$I_{ul}(T_e) \approx \frac{1}{4\pi d^2} \, G_{ul}(n_e,T_e) \, A(Z) \, n_e^2 \, \frac{\Delta V}{\Delta\log T_e} \, \Delta\log T_e \,,$$
and hence the total line flux due to emission at all temperatures,
$$I_{ul} = \sum_{T_e} \frac{1}{4\pi d^2} \, A(Z) \, G_{ul}(n_e,T_e) \, {\rm DEM}(T_e) \, \Delta\log T_e \,.$$
The quantity
$${\rm DEM}(T_e) = n_e^2 \, \frac{\Delta V}{\Delta\log T_e}$$
is called the Differential Emission Measure and is a very useful summary of the temperature structure of stellar coronae. It is typically reported in units of [cm^-3] (or [cm^-5] if ΔV is written out as area × Δh). Sometimes it is defined as n_e^2 (ΔV/ΔT) and has units of [cm^-3 K^-1].
The expression for the line flux is an instance of Fredholm’s Equation of the First Kind and the DEM(Te) solution is thus unstable and subject to high-frequency oscillations. There is a whole industry that has grown up trying to derive DEMs from often highly unreliable datasets.
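To make the final summation concrete, here is a toy Python sketch; the emissivity curve, DEM shape, abundance, and distance are all made-up placeholders rather than real atomic data:

```python
import numpy as np

# Log-temperature grid [K]
logT = np.arange(6.0, 7.5, 0.1)
dlogT = 0.1

# Placeholder inputs -- in practice G_ul comes from an atomic database
# (e.g. CHIANTI/ATOMDB) and DEM(T) from the kind of analysis described above.
G_ul = 1e-25 * np.exp(-0.5 * ((logT - 6.8) / 0.2) ** 2)   # toy emissivity shape
DEM = 1e22 * np.exp(-0.5 * ((logT - 6.5) / 0.3) ** 2)     # toy DEM [cm^-3]
A_Z = 4e-4       # toy abundance of the element relative to H
d = 3.1e18       # toy distance [cm] (~1 pc)

# I_ul = sum_T (1 / 4 pi d^2) A(Z) G_ul(T) DEM(T) dlogT
I_ul = np.sum(A_Z * G_ul * DEM * dlogT) / (4.0 * np.pi * d**2)
print(f"predicted line flux ~ {I_ul:.2e} (toy units)")
```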
The most common violation of Fisher’s rule is the misguided practice of numbering only those displayed equations to which the text subsequently refers back. … it is necessary to state emphatically that Fisher’s rule is for the benefit not of the author, but the reader.
For although you, dear author, may have no need to refer in your text to the equations you therefore left unnumbered, it is presumptuous to assume the same disposition in your readers. And although you may well have acquired the solipsistic habit of writing under the assumption that you will have no readers at all, you are wrong.
A Good Samaritan is compassionate and helpful to one in distress, and there is nothing more distressing than having to hunt your way back in a manuscript in search of Eq. (2.47), not because your subsequent progress requires you to inspect it in detail, but merely to find out what it is about so you may know the principles that go into the construction of Eq. (7.38).
The equations you display are embedded in your prose and constitute an inseparable part of it.
Regardless … of how to parse the equation internally, certain things are clear to anyone who understands the equation and the prose in which it is embedded.
We punctuate equations because they are a form of prose (they can, after all, be read aloud as a sequence of words) and are therefore subject to the same rules as any other prose. … punctuation makes them easier to read and often clarifies the discussion in which they occur. … viewing an equation not as a grammatically irrelevant blob, but as a part of the text … can only improve the fluency and grace of one’s expository mathematical prose.
For instance, suppose, as before, that C counts are observed in a region of the image that overlaps a putative source, and B counts in an adjacent, non-overlapping region that is mostly devoid of the source and is r times larger in area and exposure than the source region. Further suppose that a fraction f of the source falls in the so-called source region (typically, f ~ 0.9) and a fraction g falls in the background region (we strive to make g ~ 0). Then the observed counts can be written as Poisson realizations of intensities,
$$C \sim {\rm Poisson}(\phi_S) \equiv {\rm Poisson}(f\,\theta_S + \theta_B) \,, ~~~{\rm and}$$
$$B \sim {\rm Poisson}(\phi_B) \equiv {\rm Poisson}(g\,\theta_S + r\,\theta_B) \,,$$
where the subscripts denote the model intensities in the source (S) or background (B) regions.
The joint probability distribution of the model intensities,
$$p(\phi_S, \phi_B \,|\, C, B) \; d\phi_S \, d\phi_B$$
can be rewritten in terms of the interesting parameters by transforming the variables,
$$\equiv p(\theta_S, \theta_B \,|\, C, B) \; J(\phi_S, \phi_B \,;\, \theta_S, \theta_B) \; d\theta_S \, d\theta_B \,,$$
where J(φ_S, φ_B ; θ_S, θ_B) is the Jacobian of the coordinate transformation, and thus
$$= p(\theta_S, \theta_B \,|\, C, B) \; (r f - g) \; d\theta_S \, d\theta_B \,.$$
The posterior probability distribution of the source intensity, θ_S, can be derived by marginalizing this over the background intensity parameter, θ_B. A number of people have done this calculation for the case f=1, g=0 (e.g., Loredo 1992, SCMA II, p275; see also van Dyk et al. 2001, ApJ, 548, 224). The general case is slightly more convoluted, but is still a straightforward calculation (Kashyap et al. 2008, AAS-HEAD 9, 03.02); but more on that another time.
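As an illustration of the marginalization, here is a brute-force Python/scipy sketch that evaluates the posterior of θ_S on a grid, assuming flat priors; the counts, area ratio, and f, g values are invented for the example:

```python
import numpy as np
from scipy.stats import poisson

# Invented example data and geometry
C, B = 37, 60        # counts in the source and background regions
r = 10.0             # background region is r times larger in area*exposure
f, g = 0.9, 0.0      # fractions of the source falling in each region

# Grids for the source and background intensities (flat priors assumed)
theta_S = np.linspace(0.0, 60.0, 601)
theta_B = np.linspace(0.01, 20.0, 400)
TS, TB = np.meshgrid(theta_S, theta_B, indexing="ij")

# Joint likelihood of the two independent Poisson observations
like = poisson.pmf(C, f * TS + TB) * poisson.pmf(B, g * TS + r * TB)

# Marginalize over theta_B, then normalize to get p(theta_S | C, B)
post = like.sum(axis=1)
post /= post.sum() * (theta_S[1] - theta_S[0])

print("posterior mode of theta_S:", theta_S[np.argmax(post)])
```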
$$1~{\rm [eV]} \equiv 1.6021892\cdot10^{-19}~{\rm [Joule]} \equiv 1.6021892\cdot10^{-12}~{\rm [erg]} \,.$$
The confusion is because the same units are used to denote two separate quantities which happen to have similar magnitudes for a commonly encountered spectral model, Bremsstrahlung emission.
$$E = h \nu \equiv \frac{h c}{\lambda} \,,$$
where h = 6.6261760·10^-27 [erg s] is Planck’s constant and c = 2.9979246·10^10 [cm s^-1] is the speed of light in vacuum. When λ is given in [Ångström] ≡ 10^-8 [cm], we can convert it as
$${\rm [keV]} = \frac{12.398521}{\rm [\AA]} \,,$$
which is an extraordinarily useful thing to know in high-energy astrophysics.
$$E = k_B \cdot T \,,$$
where k_B = 1.3806620·10^-16 [erg K^-1] is Boltzmann’s constant. Then, a temperature in degrees Kelvin can be written in units of keV by converting it with the formula
$${\rm [keV]} = 8.6173468\cdot10^{-8} \cdot {\rm [K]} \equiv 0.086173468 \cdot {\rm [MK]} \,.$$
It is tempting to put the two together and interpret a temperature as a photon energy. This is possible for the aforementioned Bremsstrahlung radiation, where plasma at a temperature T produces a spectrum of photons distributed as e^{-hν/k_B T}, and it is possible to tie the temperature to the photon energy at the point where the numerator of the exponent, hν, equals the denominator, k_B T. For example, a 1 keV (temperature) Bremsstrahlung spectrum extends out to about 1 keV (photon energy) before cutting off exponentially. X-ray astronomers use this as shorthand all the time, and it confuses the hell out of everybody else.
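These conversions are easy to encode once and reuse; here is a Python sketch using the same constant values quoted above:

```python
# Constants in cgs, as quoted above
H = 6.6261760e-27        # Planck's constant [erg s]
C = 2.9979246e10         # speed of light [cm/s]
K_B = 1.3806620e-16      # Boltzmann's constant [erg/K]
ERG_PER_KEV = 1.6021892e-9

def kev_from_angstrom(lam_ang):
    """Photon energy in keV for a wavelength in Angstrom: E = hc/lambda."""
    return H * C / (lam_ang * 1e-8) / ERG_PER_KEV

def kev_from_kelvin(T):
    """Temperature expressed as k_B*T in keV."""
    return K_B * T / ERG_PER_KEV

print(kev_from_angstrom(1.0))   # ~12.4, i.e. 1 Angstrom corresponds to ~12.4 keV
print(kev_from_kelvin(1.0e7))   # ~0.86, i.e. 10 MK is a bit under 1 keV
```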
The observables are counts N, Y, and Z, with
$$N \sim {\rm Pois}(\epsilon\,\lambda_S + \lambda_B) \,,$$
$$Y \sim {\rm Pois}(\rho\,\lambda_B) \,,$$
$$Z \sim {\rm Pois}(\epsilon\,\upsilon) \,,$$
where λ_S is the parameter of interest (here, the strength of the Higgs boson signal, but it could equally be the intensity of an astrophysical source), λ_B is the parameter that describes the background, ρ is the scaling of the background measurement (analogous to the area ratio r above), ε is the efficiency, or the effective area, of the detector, and υ is the known intensity of a calibration source.
The challenge was (is) to infer the maximum likelihood estimate of, and the bounds on, λ_S, given the observed data {N, Y, Z}. In other words, to compute
$$p(\lambda_S \,|\, N, Y, Z) \,.$$
It may look like an easy problem, but it isn’t!
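To see what is involved, here is one brute-force Python/scipy sketch that evaluates p(λ_S|N,Y,Z) on a grid by marginalizing over λ_B and ε; the counts, ρ, υ, the flat priors, and the grid ranges are all invented assumptions, not part of the original challenge:

```python
import numpy as np
from scipy.stats import poisson

# Invented example inputs
N, Y, Z = 25, 40, 110        # observed counts
rho, upsilon = 10.0, 100.0   # background scaling and known calibrator intensity

# Grids over the unknowns (flat priors assumed; ranges chosen by hand)
lam_S = np.linspace(0.0, 30.0, 121)
lam_B = np.linspace(0.01, 10.0, 100)
eps = np.linspace(0.5, 1.5, 80)
LS, LB, E = np.meshgrid(lam_S, lam_B, eps, indexing="ij")

# Joint likelihood of the three independent Poisson observations
like = (poisson.pmf(N, E * LS + LB)
        * poisson.pmf(Y, rho * LB)
        * poisson.pmf(Z, E * upsilon))

# Marginalize over lam_B and eps, then normalize over lam_S
post = like.sum(axis=(1, 2))
post /= post.sum() * (lam_S[1] - lam_S[0])

print("posterior mode of lam_S:", lam_S[np.argmax(post)])
```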
The probability distribution of χ² for ν degrees of freedom is
$$p(z|\nu) = \frac{1}{\Gamma(\nu/2)} \left(\frac{1}{2}\right)^{\nu/2} z^{\nu/2-1} e^{-z/2} \equiv \gamma(z\,;\,\nu/2,1/2) \,, ~~~{\rm where}~~ z = \chi^2 \,.$$
Its more familiar usage is in the cumulative form, which is just the incomplete gamma function. This is where you count off how much area is enclosed in [0, χ²) to tell at what point the 68%, 95%, etc., thresholds are met. For example, for ν=1,
$$\int_0^Z dx \; p(x\,|\,\nu=1) = 0.68 ~~~{\rm when}~~ Z=1 \,.$$
This is the origin of the Δχ²=1 method to determine error bars on best-fit parameters.
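This is easy to verify numerically (a Python/scipy check):

```python
from scipy.stats import chi2

# Area under the chi-square density with nu=1 between 0 and 1
print(chi2.cdf(1.0, df=1))    # ~0.6827, the familiar 1-sigma fraction

# Threshold for 95% with nu=1 degree of freedom
print(chi2.ppf(0.95, df=1))   # ~3.84, i.e. delta-chi2 = 3.84 for 95%
```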
Consider a simple case where you have N observations of the luminosities of a sample of sources. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_{i+1}. Then, it is easy to see that the fraction of sources above each L_i can be written as the sequence
$$\{N-1,~ N-2,~ N-3,~ \ldots,~ 2,~ 1,~ 0\}/N \,.$$
The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is
$$S(L>L_1) = \frac{N-1}{N} \,,$$
$$S(L>L_2) = \frac{N-1}{N}\cdot\frac{(N-1)-1}{N-1} = \frac{N-1}{N}\cdot\frac{N-2}{N-1} = \frac{N-2}{N} \,,$$
$$S(L>L_3) = \frac{N-1}{N}\cdot\frac{(N-1)-1}{N-1}\cdot\frac{(N-2)-1}{N-2} = \frac{N-3}{N} \,,$$
$$\vdots$$
$$S(L>L_k) = \prod_{i=1}^{k} \frac{n_i-1}{n_i} = \frac{N-k}{N} \,,$$
where n_i is the number of objects still remaining at luminosity level L ≥ L_i, and at each stage one object is decremented to account for the drop in the sample size.
Now that was for the case when all the objects are detected. But now suppose some are not, and only upper limits to their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they will “drop out” of the set at some stage. In other words, the sample will be “censored”. The K-M estimator is easily altered to account for this, by changing the decrement in each term of the product to include the censored points. Thus, the general K-M estimator is
$$S(L>L_k) = \prod_{i=1}^{k} \frac{n_i-c_i}{n_i} \,,$$
where c_i is the number of objects that drop out between L_{i-1} and L_i.
Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood), and uncertainties on it must be estimated via Monte Carlo or bootstrap techniques [or not.. see below].
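Here is a small numpy sketch of the product form, first for the fully detected case and then with invented censoring counts c_i folded in as described above:

```python
import numpy as np

def km_survival(n_i, c_i):
    """The product form quoted above: S(L > L_k) = prod_{i<=k} (n_i - c_i) / n_i."""
    n_i = np.asarray(n_i, dtype=float)
    c_i = np.asarray(c_i, dtype=float)
    return np.cumprod((n_i - c_i) / n_i)

# Fully detected case: N objects, the sample shrinks by one at each step,
# so the decrement c_i is 1 (the detected object itself) at every step.
N = 10
n_i = np.arange(N, 0, -1)            # N, N-1, ..., 1 objects still at risk
print(km_survival(n_i, np.ones(N)))  # -> (N-1)/N, (N-2)/N, ..., 1/N, 0

# Censored case (toy numbers): 7 detections, and following the text the
# decrement c_i at each detected L_i also includes any upper limits that
# drop out of the sample before that step.
n_i = np.array([10, 9, 7, 6, 5, 3, 2])   # objects remaining before each L_i
c_i = np.array([1, 2, 1, 1, 2, 1, 1])    # detections plus dropped upper limits
print(km_survival(n_i, c_i))
```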