Thankfully, its usage is robustly built into analysis software such as Sherpa or XSPEC and most people don’t have to deal with the nitty gritty on a daily basis. But given the profusion of statistical software being written for astronomers, it is perhaps useful to go over what it means.
The Redistribution Matrix File (RMF) is, at its most basic, a description of how the detector responds to incoming photons. It describes the transformation from the photons that are impinging on the detector to the counts that are recorded by the instrument electronics. Ideally, one would want there to be a one-to-one mapping between the photon’s incoming energy and the recorded energy, but in the real world detectors are not ideal. The process of measuring the energy introduces a measurement error, which is encoded as the probability that an incoming photon at energy E is read out in detector channel i. Thus, for each energy E, there results an array of probabilities p(i|E) such that the observed counts in channel i,
$$c_i|d_E \sim {\rm Poisson}(p(i|E) \cdot d_E) \,,$$
where d_E is the expected counts at energy E, and is the product of the source flux at the telescope and the effective area of the telescope+detector combination. Equivalently, the expected counts in channel i,
$${\rm E}(c_i|d_E) = p(i|E) \cdot d_E \,.$$
The full format of how the arrays p(i|E) are stored in files is described in a HEASARC memo, CAL/GEN/92-002a. Briefly, it is a FITS file with two tables, of which only the first one really matters. This first table (“SPECRESP MATRIX”) contains the energy grid boundaries {E_j; j=1..N_E}, where each entry j corresponds to one set of p(i|E_j). The arrays themselves are stored in compressed form, as the smallest possible array that excludes all the zeros. An ideal detector, where $$p(i|E_j) \equiv \delta_{ij} \,,$$ would be compressed to a matrix of size N_E × 1. The FITS extension also contains additional arrays to help uncompress the matrix, such as the index of the first non-zero element and the number of non-zero elements for each p(i|E_j).
The second extension (“EBOUNDS”) contains an energy grid {e_i; i=1..N_channels} that maps to the channels i. This grid is fake! Do not use it for anything except display purposes or for convenient shorthand! What it is, is a mapping of the average detector gain to the true energy, such that it lists the most likely energy of the photons registered in that bin. This grid allows astronomers to specify filters to the spectrum in convenient units that are semi-invariant across instruments (such as [Å] or [keV]) rather than detector channel numbers, which are unique to each instrument. But keep in mind that this is a convenient fiction, and it should never be taken seriously. It is useful when the width of p(i|E) spans only a few channels, and completely useless for lower-resolution detectors.
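To make the bookkeeping concrete, here is a minimal sketch in Python (using numpy and astropy) of expanding the compressed matrix into the full set of p(i|E_j) arrays and folding a model through it. The file name is a placeholder, and the column names (ENERG_LO, N_GRP, F_CHAN, N_CHAN, MATRIX) follow the OGIP convention of CAL/GEN/92-002a; individual mission files sometimes deviate, so treat this as illustrative rather than definitive.

```python
import numpy as np
from astropy.io import fits

# "example.rmf" is a placeholder file name.
with fits.open("example.rmf") as hdul:
    try:
        matrix_hdu = hdul["MATRIX"]
    except KeyError:
        matrix_hdu = hdul["SPECRESP MATRIX"]
    ebounds = hdul["EBOUNDS"].data
    rows = matrix_hdu.data
    n_energy = len(rows)                 # number of energy bins E_j
    n_channels = len(ebounds)            # number of detector channels i
    # TLMINn of the F_CHAN column gives the first channel number (often 0 or 1);
    # column 4 is assumed here, which is the usual OGIP ordering.
    first_chan = matrix_hdu.header.get("TLMIN4", 1)

    # Expand the compressed rows into the full matrix of p(i|E_j)
    rsp = np.zeros((n_energy, n_channels))
    for j, row in enumerate(rows):
        ngrp = int(row["N_GRP"])
        fchan = np.atleast_1d(row["F_CHAN"])[:ngrp]
        nchan = np.atleast_1d(row["N_CHAN"])[:ngrp]
        vals = np.atleast_1d(row["MATRIX"])
        pos = 0
        for f, n in zip(fchan, nchan):
            i0 = int(f) - first_chan
            rsp[j, i0:i0 + int(n)] = vals[pos:pos + int(n)]
            pos += int(n)

# Fold a model: d_E = expected counts per energy bin (source flux x effective
# area x exposure, computed elsewhere); predicted channel counts = rsp.T @ d_E
d_E = np.ones(n_energy)                  # toy flat model, for illustration only
predicted_counts = rsp.T @ d_E
```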
The blackbody spectrum is
$$B_{\nu}(T) = \frac{2 h \nu^3}{c^2} \frac{1}{e^{h \nu / k_B T} - 1} ~~ {\rm [erg~s^{-1}~cm^{-2}~Hz^{-1}~sr^{-1}]} \,,$$
where ν is the frequency in [Hz], h is Planck’s constant, c is the speed of light in vacuum, and k_B is Boltzmann’s constant. The spectrum is interesting in many ways. Its shape is characterized by only one parameter, the radiation temperature T. A spectrum with a higher T is greater in intensity at all frequencies compared to one with a lower T, and the emergent flux integrated over all frequencies is σT^4, where $$\sigma \equiv \frac{2\pi^5k_B^4}{15 c^2 h^3}$$ is the Stefan-Boltzmann constant. Other than that, the normalization is detached, so to speak, from T, and differences in source luminosities are entirely attributable to differences in emission surface area.
The general shape of a blackbody spectrum is like a rising parabola at low ν (the classical extrapolation of which to high frequencies led to much hand-wringing around the turn of the 20th century over the Ultraviolet Catastrophe) and an exponential drop at high ν, with a well-defined peak in between. The frequency at which the spectrum peaks depends on the temperature, with
$$\nu_{\rm max} = 2.82 \frac{k_B T}{h}$$,
or, in terms of the wavelength at which the per-wavelength form B_λ(T) peaks (Wien’s displacement law),
$$ \lambda_{\rm max} = \frac{2.9\cdot10^{7}}{T} ~~ {\rm[\AA]} \,,$$
where T is in [K].
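As a quick numerical illustration, here is a short Python sketch; the temperature is just an example value, roughly that of the solar photosphere:

```python
import numpy as np

# Physical constants in cgs units
h = 6.6261760e-27      # Planck's constant [erg s]
c = 2.9979246e10       # speed of light [cm/s]
k_B = 1.3806620e-16    # Boltzmann's constant [erg/K]

def planck_nu(nu, T):
    """Blackbody specific intensity B_nu(T) [erg/s/cm^2/Hz/sr]."""
    return (2.0 * h * nu**3 / c**2) / np.expm1(h * nu / (k_B * T))

T = 5800.0                          # example temperature [K]
nu_max = 2.82 * k_B * T / h         # frequency at which B_nu peaks [Hz]
lam_max = 2.9e7 / T                 # Wien peak of B_lambda [Angstrom]
print(f"nu_max  ~ {nu_max:.3e} Hz")
print(f"lam_max ~ {lam_max:.0f} Angstrom")
```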
For historical reasons, astronomers measure the brightness of celestial objects in rank order. The smaller the rank number, a.k.a. the magnitude, the brighter the object. Thus, a star of the first magnitude is much brighter than a star of the sixth magnitude, and it would take exceptionally good eyes and a very dark sky to see a star of the seventh magnitude. Now, it turns out that the human eye perceives brightness on a log scale, so magnitudes are numerically similar to log(brightness). And because they form a ranking, a magnitude is always quoted with reference to a standard. After some rough calibration to match human perception to the true brightness of stars in the night sky, we have a formal definition for magnitude,
$$m = – \frac{5}{2}\log_{10}\left(\frac{f_{object}}{f_{standard}}\right) \,,$$
where f_object is the flux from the object and f_standard is the flux from a fiducial standard. In the optical bands, the bright star Vega (α Lyrae) has been adopted as the standard, and it has magnitudes of 0 in all optical filters. (Well, not exactly, because Vega is not constant enough, and as a practical matter there is nowadays a hierarchy of photometric standard stars that are accessible at different parts of the sky.) Note that we can also write this in terms of the intrinsic luminosity L_object of the object and the distance d to it,
$$m = – \frac{5}{2}\log_{10}\left(\frac{L_{object}}{4 \pi d^2}\frac{1}{f_{standard}}\right) \,.$$
Because astronomical objects are located at a vast variety of distances, it is useful to define an intrinsic magnitude of the object, independent of the distance. Thus, in contrast to the apparent magnitude m, which is the brightness at Earth, an absolute magnitude is defined as the brightness that would be perceived if the object were 10 parsecs away,
$$M \equiv m|_{d={\rm 10~pc}} = m – \frac{5}{2}\log_{10}\left(\frac{d^2}{{\rm (10~pc)}^2}\right) \equiv m – 5\log_{10}d + 5$$
where d is the distance to the object in [parsec], and the squared term is of course because of the inverse square law.
There are other issues such as interstellar absorption, cosmological corrections, extent of the source, etc., but let’s not complicate it too much right away.
Colors are differences of the magnitudes in different passbands. For instance, if the apparent magnitude in the blue filter is m_B and in the green filter is m_V (V for “visual”), the color is m_B - m_V and is usually referred to as the “B-V” color. Being a difference of magnitudes, it is related to the log of the ratio of the fluxes in the two bands.
For an excellent description of what is involved in the measurement of magnitudes and colors, see this article on analyzing photometric data by Star Stryder.
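For concreteness, here is a small Python sketch of these definitions; the flux ratio and distance are invented for illustration:

```python
import numpy as np

def magnitude(f_object, f_standard):
    """Apparent magnitude relative to a standard of known flux."""
    return -2.5 * np.log10(f_object / f_standard)

def absolute_magnitude(m, d_parsec):
    """Absolute magnitude: the apparent magnitude the object would have at 10 pc."""
    return m - 5.0 * np.log10(d_parsec) + 5.0

# Hypothetical example: an object 1% as bright as the standard, located at 100 pc
m = magnitude(0.01, 1.0)            # = 5, i.e. 5 mag fainter than the standard
M = absolute_magnitude(m, 100.0)    # = m - 10 + 5 = m - 5
print(m, M)

# A color is a difference of magnitudes in two passbands,
# e.g. B-V = m_B - m_V = -2.5 log10(f_B/f_V) + a zero-point constant
```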
The flux at Earth due to an atomic transition u → l from a volume element δV at a location r is
$$I_{ul}(r) = \frac{1}{4\pi} \frac{1}{d(r)^2} \, A(Z,r) \, G_{ul}(n_e(r),T_e(r)) \, n_e(r)^2 \, \delta V(r) \,,$$
where n_e is the electron density and T_e is the temperature of the plasma, A(Z,r) is the abundance of element Z, G_{ul}(n_e,T_e) is the atomic emissivity for the transition, and d is the distance to the source.
We can combine the flux from all the points in the field of view that arise from plasma at the same temperature,
$$I_{ul}(T_e) = \frac{1}{4\pi} \sum_{r|T_e} \frac{1}{d(r)^2} \, A(Z,r) \, G_{ul}(n_e(r),T_e) \, n_e(r)^2 \, \delta V(r) \,.$$
Assuming that A(Z,r) and n_e(r) do not vary over the points in the summation,
$$I_{ul}(T_e) \approx \frac{1}{4\pi d^2} \, G_{ul}(n_e,T_e) \, A(Z) \, n_e^2 \, \frac{\Delta V}{\Delta\log T_e} \, \Delta\log T_e \,,$$
and hence the total line flux due to emission at all temperatures,
$$I_{ul} = \sum_{T_e} \frac{1}{4\pi d^2} \, A(Z) \, G_{ul}(n_e,T_e) \, {\rm DEM}(T_e) \, \Delta\log T_e \,.$$
The quantity
$${\rm DEM}(T_e) = n_e^2 \, \frac{\Delta V}{\Delta\log T_e}$$
is called the Differential Emission Measure and is a very useful summary of the temperature structure of stellar coronae. It is typically reported in units of [cm^-3] (or [cm^-5] if ΔV is written out as area × Δh). Sometimes it is defined as n_e^2 (ΔV/ΔT) and has units of [cm^-3 K^-1].
The expression for the line flux is an instance of Fredholm’s Equation of the First Kind and the DEM(Te) solution is thus unstable and subject to high-frequency oscillations. There is a whole industry that has grown up trying to derive DEMs from often highly unreliable datasets.
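To make the final summation concrete, here is a toy Python sketch; the emissivity curve, DEM shape, abundance, and distance are all made-up placeholders rather than real atomic data:

```python
import numpy as np

# Log-temperature grid [K]
logT = np.arange(6.0, 7.5, 0.1)
dlogT = 0.1

# Placeholder inputs -- in practice G_ul comes from an atomic database
# (e.g. CHIANTI/ATOMDB) and DEM(T) from the kind of analysis described above.
G_ul = 1e-25 * np.exp(-0.5 * ((logT - 6.8) / 0.2) ** 2)   # toy emissivity shape
DEM = 1e22 * np.exp(-0.5 * ((logT - 6.5) / 0.3) ** 2)     # toy DEM [cm^-3]
A_Z = 4e-4       # toy abundance of the element relative to H
d = 3.1e18       # toy distance [cm] (~1 pc)

# I_ul = sum_T (1 / 4 pi d^2) A(Z) G_ul(T) DEM(T) dlogT
I_ul = np.sum(A_Z * G_ul * DEM * dlogT) / (4.0 * np.pi * d**2)
print(f"predicted line flux ~ {I_ul:.2e} (toy units)")
```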
The most common violation of Fisher’s rule is the misguided practice of numbering only those displayed equations to which the text subsequently refers back. … it is necessary to state emphatically that Fisher’s rule is for the benefit not of the author, but the reader.
For although you, dear author, may have no need to refer in your text to the equations you therefore left unnumbered, it is presumptuous to assume the same disposition in your readers. And although you may well have acquired the solipsistic habit of writing under the assumption that you will have no readers at all, you are wrong.
A Good Samaritan is compassionate and helpful to one in distress, and there is nothing more distressing than having to hunt your way back in a manuscript in search of Eq. (2.47), not because your subsequent progress requires you to inspect it in detail, but merely to find out what it is about so you may know the principles that go into the construction of Eq. (7.38).
The equations you display are embedded in your prose and constitute an inseparable part of it.
Regardless … of how to parse the equation internally, certain things are clear to anyone who understands the equation and the prose in which it is embedded.
We punctuate equations because they are a form of prose (they can, after all, be read aloud as a sequence of words) and are therefore subject to the same rules as any other prose. … punctuation makes them easier to read and often clarifies the discussion in which they occur. … viewing an equation not as a grammatically irrelevant blob, but as a part of the text … can only improve the fluency and grace of one’s expository mathematical prose.
For instance, suppose, as before, that C counts are observed in a region of the image that overlaps a putative source, and B counts in an adjacent, non-overlapping region that is mostly devoid of the source and is r times larger in area and exposure than the source region. Further suppose that a fraction f of the source falls in the so-called source region (typically, f ~ 0.9) and a fraction g falls in the background region (we strive to make g ~ 0). Then the observed counts can be written as Poisson realizations of intensities,
$$C \sim {\rm Poisson}(\phi_S) \equiv {\rm Poisson}(f\,\theta_S + \theta_B) \,, ~~~{\rm and}$$
$$B \sim {\rm Poisson}(\phi_B) \equiv {\rm Poisson}(g\,\theta_S + r\,\theta_B) \,,$$
where the subscripts denote the model intensities in the source (S) or background (B) regions.
The joint probability distribution of the model intensities,
$$p(\phi_S, \phi_B \,|\, C, B) \; d\phi_S \, d\phi_B$$
can be rewritten in terms of the interesting parameters by transforming the variables,
$$\equiv p(\theta_S, \theta_B \,|\, C, B) \; J(\phi_S, \phi_B \,;\, \theta_S, \theta_B) \; d\theta_S \, d\theta_B \,,$$
where J(φ_S, φ_B ; θ_S, θ_B) is the Jacobian of the coordinate transformation, and thus
$$= p(\theta_S, \theta_B \,|\, C, B) \; (r f - g) \; d\theta_S \, d\theta_B \,.$$
The posterior probability distribution of the source intensity, θ_S, can be derived by marginalizing this over the background intensity parameter, θ_B. A number of people have done this calculation for the case f=1, g=0 (e.g., Loredo 1992, SCMA II, p275; see also van Dyk et al. 2001, ApJ, 548, 224). The general case is slightly more convoluted, but is still a straightforward calculation (Kashyap et al. 2008, AAS-HEAD 9, 03.02); but more on that another time.
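As an illustration of the marginalization, here is a brute-force Python/scipy sketch that evaluates the posterior of θ_S on a grid, assuming flat priors; the counts, area ratio, and f, g values are invented for the example:

```python
import numpy as np
from scipy.stats import poisson

# Invented example data and geometry
C, B = 37, 60        # counts in the source and background regions
r = 10.0             # background region is r times larger in area*exposure
f, g = 0.9, 0.0      # fractions of the source falling in each region

# Grids for the source and background intensities (flat priors assumed)
theta_S = np.linspace(0.0, 60.0, 601)
theta_B = np.linspace(0.01, 20.0, 400)
TS, TB = np.meshgrid(theta_S, theta_B, indexing="ij")

# Joint likelihood of the two independent Poisson observations
like = poisson.pmf(C, f * TS + TB) * poisson.pmf(B, g * TS + r * TB)

# Marginalize over theta_B, then normalize to get p(theta_S | C, B)
post = like.sum(axis=1)
post /= post.sum() * (theta_S[1] - theta_S[0])

print("posterior mode of theta_S:", theta_S[np.argmax(post)])
```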
$$1~{\rm [eV]} \equiv 1.6021892\cdot10^{-19}~{\rm [Joule]} \equiv 1.6021892\cdot10^{-12}~{\rm [erg]} \,.$$
The confusion is because the same units are used to denote two separate quantities which happen to have similar magnitudes for a commonly encountered spectral model, Bremsstrahlung emission.
$$E = h \nu \equiv \frac{h c}{\lambda} \,,$$
where h = 6.6261760·10^-27 [erg s] is Planck’s constant and c = 2.9979246·10^10 [cm s^-1] is the speed of light in vacuum. When λ is given in [Ångström] ≡ 10^-8 [cm], we can convert it as
$${\rm [keV]} = \frac{12.398521}{\rm [\AA]} \,,$$
which is an extraordinarily useful thing to know in high-energy astrophysics.
$$E = k_B \cdot T \,,$$
where k_B = 1.3806620·10^-16 [erg K^-1] is Boltzmann’s constant. Then, a temperature in degrees Kelvin can be written in units of keV by converting it with the formula
$${\rm [keV]} = 8.6173468\cdot10^{-8} \cdot {\rm [K]} \equiv 0.086173468 \cdot {\rm [MK]} \,.$$
It is tempting to put the two together and interpret a temperature as a photon energy. This is possible for the aforementioned Bremsstrahlung radiation, where plasma at a temperature T produces a spectrum of photons distributed as e^{-hν/k_B T}, and it is possible to tie the temperature to the photon energy at the point where the numerator of the exponent, hν, equals the denominator, k_B T. For example, a 1 keV (temperature) Bremsstrahlung spectrum extends out to about 1 keV (photon energy) before cutting off exponentially. X-ray astronomers use this as shorthand all the time, and it confuses the hell out of everybody else.
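These conversions are easy to encode once and reuse; here is a Python sketch using the same constant values quoted above:

```python
# Constants in cgs, as quoted above
H = 6.6261760e-27        # Planck's constant [erg s]
C = 2.9979246e10         # speed of light [cm/s]
K_B = 1.3806620e-16      # Boltzmann's constant [erg/K]
ERG_PER_KEV = 1.6021892e-9

def kev_from_angstrom(lam_ang):
    """Photon energy in keV for a wavelength in Angstrom: E = hc/lambda."""
    return H * C / (lam_ang * 1e-8) / ERG_PER_KEV

def kev_from_kelvin(T):
    """Temperature expressed as k_B*T in keV."""
    return K_B * T / ERG_PER_KEV

print(kev_from_angstrom(1.0))   # ~12.4, i.e. 1 Angstrom corresponds to ~12.4 keV
print(kev_from_kelvin(1.0e7))   # ~0.86, i.e. 10 MK is a bit under 1 keV
```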
The observables are counts N, Y, and Z, with
$$N \sim {\rm Pois}(\epsilon\,\lambda_S + \lambda_B) \,,$$
$$Y \sim {\rm Pois}(\rho\,\lambda_B) \,,$$
$$Z \sim {\rm Pois}(\epsilon\,\upsilon) \,,$$
where λ_S is the parameter of interest (here, the strength of the Higgs boson signal, but it could equally be the intensity of an astrophysical source), λ_B is the parameter that describes the background, ρ is the scaling of the background measurement (analogous to the area ratio r above), ε is the efficiency, or the effective area, of the detector, and υ is the known intensity of a calibration source.
The challenge was (is) to infer the maximum likelihood estimate of, and the bounds on, λ_S, given the observed data {N, Y, Z}. In other words, to compute
$$p(\lambda_S \,|\, N, Y, Z) \,.$$
It may look like an easy problem, but it isn’t!
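To see what is involved, here is one brute-force Python/scipy sketch that evaluates p(λ_S|N,Y,Z) on a grid by marginalizing over λ_B and ε; the counts, ρ, υ, the flat priors, and the grid ranges are all invented assumptions, not part of the original challenge:

```python
import numpy as np
from scipy.stats import poisson

# Invented example inputs
N, Y, Z = 25, 40, 110        # observed counts
rho, upsilon = 10.0, 100.0   # background scaling and known calibrator intensity

# Grids over the unknowns (flat priors assumed; ranges chosen by hand)
lam_S = np.linspace(0.0, 30.0, 121)
lam_B = np.linspace(0.01, 10.0, 100)
eps = np.linspace(0.5, 1.5, 80)
LS, LB, E = np.meshgrid(lam_S, lam_B, eps, indexing="ij")

# Joint likelihood of the three independent Poisson observations
like = (poisson.pmf(N, E * LS + LB)
        * poisson.pmf(Y, rho * LB)
        * poisson.pmf(Z, E * upsilon))

# Marginalize over lam_B and eps, then normalize over lam_S
post = like.sum(axis=(1, 2))
post /= post.sum() * (lam_S[1] - lam_S[0])

print("posterior mode of lam_S:", lam_S[np.argmax(post)])
```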
The probability distribution of χ² for ν degrees of freedom is
$$p(z|\nu) = \frac{1}{\Gamma(\nu/2)} \left(\frac{1}{2}\right)^{\nu/2} z^{\nu/2-1} e^{-z/2} \equiv \gamma(z\,;\,\nu/2,1/2) \,, ~~~{\rm where}~~ z = \chi^2 \,.$$
Its more familiar usage is in the cumulative form, which is just the incomplete gamma function. This is where you count off how much area is enclosed in [0, χ²) to tell at what point the 68%, 95%, etc., thresholds are met. For example, for ν=1,
$$\int_0^Z dx \; p(x\,|\,\nu=1) = 0.68 ~~~{\rm when}~~ Z=1 \,.$$
This is the origin of the Δχ²=1 method to determine error bars on best-fit parameters.
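This is easy to verify numerically (a Python/scipy check):

```python
from scipy.stats import chi2

# Area under the chi-square density with nu=1 between 0 and 1
print(chi2.cdf(1.0, df=1))    # ~0.6827, the familiar 1-sigma fraction

# Threshold for 95% with nu=1 degree of freedom
print(chi2.ppf(0.95, df=1))   # ~3.84, i.e. delta-chi2 = 3.84 for 95%
```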
Consider a simple case where you have N observations of the luminosities of a sample of sources. Let us say that all N sources have been detected and their luminosities are estimated to be L_i, i=1..N, and that they are ordered such that L_i < L_{i+1}. Then, it is easy to see that the fraction of sources above each L_i can be written as the sequence
$$\{N-1,~ N-2,~ N-3,~ \ldots,~ 2,~ 1,~ 0\}/N \,.$$
The K-M estimator is a generalized form that describes this sequence, and is written as a product. The probability that an object in the sample has luminosity greater than L_k is
$$S(L>L_1) = \frac{N-1}{N} \,,$$
$$S(L>L_2) = \frac{N-1}{N}\cdot\frac{(N-1)-1}{N-1} = \frac{N-1}{N}\cdot\frac{N-2}{N-1} = \frac{N-2}{N} \,,$$
$$S(L>L_3) = \frac{N-1}{N}\cdot\frac{(N-1)-1}{N-1}\cdot\frac{(N-2)-1}{N-2} = \frac{N-3}{N} \,,$$
$$\vdots$$
$$S(L>L_k) = \prod_{i=1}^{k} \frac{n_i-1}{n_i} = \frac{N-k}{N} \,,$$
where n_i is the number of objects still remaining at luminosity level L ≥ L_i, and at each stage one object is decremented to account for the drop in the sample size.
Now that was for the case when all the objects are detected. But now suppose some are not, and only upper limits to their luminosities are available. A specific value of L cannot be assigned to these objects, and the only thing we can say is that they will “drop out” of the set at some stage. In other words, the sample will be “censored”. The K-M estimator is easily altered to account for this, by changing the decrement in each term of the product to include the censored points. Thus, the general K-M estimator is
$$S(L>L_k) = \prod_{i=1}^{k} \frac{n_i-c_i}{n_i} \,,$$
where c_i is the number of objects that drop out between L_{i-1} and L_i.
Note that the K-M estimator is a maximum likelihood estimator of the cumulative probability (actually one minus the cumulative probability as it is usually understood), and uncertainties on it must be estimated via Monte Carlo or bootstrap techniques [or not.. see below].
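Here is a small numpy sketch of the product form, first for the fully detected case and then with invented censoring counts c_i folded in as described above:

```python
import numpy as np

def km_survival(n_i, c_i):
    """The product form quoted above: S(L > L_k) = prod_{i<=k} (n_i - c_i) / n_i."""
    n_i = np.asarray(n_i, dtype=float)
    c_i = np.asarray(c_i, dtype=float)
    return np.cumprod((n_i - c_i) / n_i)

# Fully detected case: N objects, the sample shrinks by one at each step,
# so the decrement c_i is 1 (the detected object itself) at every step.
N = 10
n_i = np.arange(N, 0, -1)            # N, N-1, ..., 1 objects still at risk
print(km_survival(n_i, np.ones(N)))  # -> (N-1)/N, (N-2)/N, ..., 1/N, 0

# Censored case (toy numbers): 7 detections, and following the text the
# decrement c_i at each detected L_i also includes any upper limits that
# drop out of the sample before that step.
n_i = np.array([10, 9, 7, 6, 5, 3, 2])   # objects remaining before each L_i
c_i = np.array([1, 2, 1, 1, 2, 1, 1])    # detections plus dropped upper limits
print(km_survival(n_i, c_i))
```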