2010-apr-30: Aneta has set up a blogspot site covering simple Sherpa techniques and tactics: http://pysherpa.blogspot.com/
On Help:

- ahelp "something" (note the quotes)
- ? wildcard, to get a list of all commands that include the wildcard
Data I/O:

- load_pha()
- load_bkg()
- set_bkg('src',get_data('bkg')), so counts from a different file can be assigned as background to the current spectrum
- load_arf()
- load_bkg_arf() for the background (should it be done before or after load_bkg(), or does it matter?)
- load_rmf()
- load_bkg_rmf() for the background, and the same question as above
- print(get_data()) and print(get_bkg())
- notice_id()
- subtract()
- unsubtract()
- plot_data(), plot_bkg(), plot_arf(), plot_model(), plot_fit(), etc. (plot_data(1,yerr=0) should do the trick, but it appears to be still in the development version.)

Fitting:
- fit()
- projection() (or covar(), but why deliberately use a function that is guaranteed to underestimate your error bars?)
- set_source(ModelName.ModelID+AnotherModel.ModelID2), where you can distinguish between different instances of the same type of model using the ModelID, e.g., set_source(xsphabs.abs1*powlaw1d.psrc+powlaw1d.pbkg)
- print(get_model())
- set_stat() (the chisq types, cstat, and cash)
- set_method() (options are levmar, moncar, neldermead, simann, simplex)
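Putting the pieces together, here is a minimal, illustrative session built only from the commands listed above. The file names, energy range, and statistic/method choices are placeholders of my own, not values from the post, and inside the sherpa application the import line is unnecessary.

```python
from sherpa.astro.ui import *

load_pha("src.pha")              # source spectrum (placeholder file name)
load_bkg("bkg.pha")              # background spectrum for the same dataset
load_arf("src.arf")              # effective area
load_rmf("src.rmf")              # redistribution matrix

notice_id(1, 0.5, 7.0)           # keep roughly 0.5-7 keV (assuming energy units)
# subtract()                     # optional: subtract the background instead of modeling it

set_source("xsphabs.abs1*powlaw1d.psrc")   # absorbed power law, as in the example above (needs XSPEC models)
set_stat("cstat")
set_method("neldermead")
fit()
projection()                     # interval estimates for the fitted parameters
plot_fit()
```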
v1: 2007-dec-18
v2: 2008-feb-20
v3: 2010-apr-30
The biggest challenge for a statistician using astronomical data is the lack of mercy toward nonspecialists, both in accessing the data (format, quantification, and qualification[2]) and in the data analysis systems. IDL is costly, although it is used in many disciplines, and other tools in astronomy are rarely reused across different projects.[3] In that regard, I welcome astronomers using python to break such exclusiveness in astronomical data analysis systems.
Even if the data and software issues are resolved, there is another barrier to climb: validation. If you have a catalog, you will see variables of measures and their errors, typically reflecting the size of the PSF and its convolution into those metrics. When a Gaussian assumption is applied in order to tabulate a power law index, King's, Petrosian's, or de Vaucouleurs' profile index, and numerous other metrics, I often fail to find any validation of the Gaussian assumptions, Gaussian residuals, spectral and profile models, outliers, and optimal binning. Even if a data set is publicly available, I also fail to find how to read in the raw data, what factors must be considered, and what can be discarded because of unexpected contamination such as cosmic rays and charge overflows. How would I validate that raw data read into a data analysis system are correctly processed to match the values in catalogs? How would I know that all entries in a catalog are ready for further scientific data analysis? Are those sources real? Is the p-value appropriately computed?
I posted an article about Chernoff faces applied to Capella observations from Chandra. Astronomers already processed the raw data and published a catalog of X-ray spectra. Therefore, I believed that the information in the catalog was validated and ready to be used for scientific data analysis. I heard that the repeated Capella observations are for calibration. Generally speaking, in other fields, targets for calibration are almost time invariant and exhibit consistency. If Capella is the same star over the 10 years, the faces in my post should look almost the same, within measurement error; but as you saw, they were not consistent at all. Those faces look as if the observations were made toward different objects. So far I have failed to find any validation efforts explaining why certain ObsIDs of Capella look different from the rest. Are they really Capella? Or can I use this inconsistent facial expression as evidence that the Chandra calibration at that time was inappropriate? Or can I conclude that Capella was a wrong choice for calibration?
Due to the lack of any description of the quantification procedure from the raw data to the catalog, what I decided to do was access the raw data and do the processing on my own, to cross-check the validity of the catalog entries. The benefit of this effort is that I can easily manipulate the data for further statistical inference. Although reading and processing raw data may sound easy, I came across another problem: the lack of documentation for nonspecialists to perform the task.
A while ago, I talked about read.table() in R. There are slightly different commands and options, but without much hurdle one can read in ascii data in various styles with read.table() for exploratory and confirmatory data analysis in R. From my understanding, statisticians do not spend much time on reading in data nor on collecting them. We are interested in methodology for extracting information about the population based on a sample. While the focus is methodology, all the frustrations with astronomical data analysis software occur prior to investigating the best method. The level of frustration reached the point of terminating my eagerness for further investigation of inference tools.
In order to assess those Capella observations, I invoked ciao, thanks to its on-site help. Beforehand, I would like to make the disclaimer that I use ciao only as an example, to illustrate the culture difference that I experienced as a statistician: to discuss why I think astronomical data analysis systems are short of documentation and why astronomical data processing procedures lack validation. I must say that I confronted very similar problems when I tried to learn astronomical packages such as IRAF and AIPS. Ciao simply happened to be at hand when writing this post.
In order to understand X-ray data, one needs not only the image data files but also the effective area (arf), redistribution matrix (rmf), and point spread function (psf). These files are called calibration data files. If the package were developed for general users, I would expect, as with read.table(), a homogenized/centralized data reading function, covering calibration data, with options. Instead, there were various kinds of functions one can use to read in data, but the descriptions were not enough to know which one does what. What is the functionality of these commands? Which one only stores the names of data files? Which one reconfigures the raw data to reflect the up-to-date calibration files? Not knowing the complete data structures and classes within ciao, and not getting the exact functionality of these data reading functions from ahelp, I was not sure whether the log likelihood I computed was appropriate or not.
For example, there are five different ways to associate an arf: read_arf(), load_arf(), set_arf(), get_arf(), and unpack_arf() in ciao. Except for unpack_arf(), I couldn't understand the differences among these functions for accessing an arf.[4] Other software that I use, including XSPEC, generally has a single function with options to execute different levels of reading in data. Ciao has extensive web documentation but no tutorial (see my post). So I read all the ahelp "commands" a few times, but I still couldn't decide which ones to use for my work to read in arfs and rmfs (I happened to have many calibration data files).
| | arf | rmf | psf | pha | data |
|---|---|---|---|---|---|
| get | get_arf | get_rmf | get_psf | get_pha | get_data |
| set | set_arf | set_rmf | set_psf | set_pha | set_data |
| unpack | unpack_arf | unpack_rmf | unpack_psf | unpack_pha | unpack_data |
| load | load_arf | load_rmf | load_psf | load_pha | load_data |
| read | read_arf | read_rmf | read_psf | read_pha | read_data |
[Note that the above links may not work since the ciao documentation website evolves quickly. Some might be routed to different pages, so please check this website for other data reading commands: cxc.harvard.edu/sherpa/ahelp/index_alphabet.html]
So, several months back, I decided to seek help through the cxc help desk. Their answers are very reliable and prompt. My question was "what are the differences among read_xxx(), load_xxx(), set_xxx(), get_xxx(), and unpack_xxx(), where xxx can be data, arf, rmf, and psf?" The answer to this question was:
You can find detailed explanations for these Sherpa commands in the “ahelp” pages of the Sherpa website:
http://cxc.harvard.edu/sherpa/ahelp/index_alphabet.html
This is a good answer but a big cultural shock to a statistician. It is like answering "check http://www.r-project.org/search.html and http://cran.r-project.org/doc/FAQ/R-FAQ.html" when IDL users ask about the difference between read.table() and scan(). Probably, for astronomers, all the various data reading commands above are self-explanatory, just as R has read.table(), read.csv(), and scan(). Disappointingly, this answer was not what I was looking for.
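For illustration, here is how I eventually read the split for an arf, sketched as a short session. This is my own guess, pieced together from experimenting, not something stated in the documentation or by the help desk, and the file name is a placeholder.

```python
from sherpa.astro.ui import load_arf, unpack_arf, set_arf, get_arf

load_arf("src.arf")           # read the file AND attach it to the default dataset
arf = unpack_arf("src.arf")   # read the file into an object, attach nothing
set_arf(arf)                  # attach a previously unpacked (or modified) object
same_arf = get_arf()          # retrieve whatever is currently attached
# (read_arf(), apparently a lower-level reader, is left out of this sketch)
```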
Well, thanks to this bewilderment, hesitation, and some skepticism, I couldn't move to the next step of implementing fitting methods. At the beginning, I was optimistic when I found out that ciao 4.0 and up is python compatible. I thought I could do things in more statistically rigorous ways, since I could fake spectra to validate my fitting methods. I was thinking about modifying the indispensable chi-square method, which is used twice, for point estimation and for hypothesis testing, in a way that introduces bias (a link was made to a posting). My goal was to make it less biased, more robust, and less sensitive to the iid Gaussian residual assumption. Against my high expectations, I became frustrated at the very first step, reading and playing with the data to get a better sense of them and to develop a quick intuition. I couldn't take even a baby step toward my goal. I'm not sure if it is a good thing or not, but I haven't been completely discouraged. Also, time gradually helps to overcome this culture difference, the lack of documentation.
What happens in general is that, if a predecessor says "use set_arf()," then the apprentice will use set_arf() without doubt. If you begin learning on your own, relying purely on the documentation, I guess at some point you have to make a choice. One can make a lucky guess and move forward quickly. Sometimes one lands in a miserable situation, because one is not sure about the choice and cannot trust the features that appear after the processing. I guess it is natural to feel curiosity about what each of these commands is doing to your data and what information is carried over to the next commands in the analysis procedure. It seems reasonable to want to know which command is best for a particular data processing step and for the statistical inference given the data. What I found is that such comparison across commands is missing from the documentation. This is why I thought astronomical data analysis systems are short of mercy for nonspecialists.
Another thing I observed is that there seems to be no documentation nor standard procedure for producing repeatable data analysis results. My observation of astronomers is that, with the same raw data, the results from scientist A and scientist B are different (even beyond statistical margins). There are experts, and they have the knowledge to explain why results differ on the same raw data. However, not everyone can have the luxury of consulting those few experts. I cannot understand such exclusiveness instead of standardizing the procedures through validation. I even saw that the data A analyzed some years back can differ from this year's when he or she writes a new proposal. I think the time spent recreating the data processing and inference procedure, to explain/justify/validate the different results or to cover/narrow the gap, could have been saved if there were standard procedures and documentation for them. This is purely a statistician's thought. As the comment in where is ciao X?[5] says, not every data analysis system has to have similar design and goals.
Getting lost while figuring out the basics (handling arf, rmf, psf, and numerous case-by-case corrections) prior to applying any simple statistics has been my biggest obstacle in learning astronomy. The lack of documented validation methods often brings me frustration. I wonder if there are any astronomers who got lost learning statistics via R, minitab, SAS, MATLAB, python, etc. As discussed in where is ciao X?, I wish there were a centralized tutorial that offers the basics: how to read in data, how to manipulate data vectors and matrices, how to do arithmetic and error propagation adequately without violating statistical assumptions (I don't like the fact that the point estimate of the background level is subtracted from the observed counts, a random variable, when the distribution does not permit such shifting of the parameter), how to handle and manipulate fits format files from Chandra for various statistical analyses, how to do basic image analysis, how to do basic spectral analysis, and so on, with references.[6]
I've heard many times about the lack of documentation for this extensive data analysis system, ciao. I saw people still using ciao 3.4 although the new version 4 has been available for many months. Although ciao is not the only tool for Chandra data analysis, it was specifically designed for it. Therefore, I expected it to be used frequently and to be popular. But the reality is against my expectation. Whatever (fierce) discussion I've heard has been irrelevant to me, because ciao is not intended for statistical analysis. Then, all of a sudden, after many months, a realization hit me: ciao is different from other data analysis systems and software. This difference has been a hampering factor in introducing ciao outside the Chandra scientist community and in its gaining popularity. This difference was the reason I often got lost looking for suitable documentation.
http://cxc.harvard.edu/ciao/ is the website to refer to when you start using ciao, and manuals are listed there under manuals and memos. The aforementioned difference is that I'm used to seeing Introduction, Primer, Tutorial, or Guide for Beginners on the front page or the manual pages, but not on the ciao websites. From such introductory documentation, I can stretch out to other specific topics, modules, tool boxes, packages, libraries, plug-ins, add-ons, applications, etc. Tutorials provide the momentum for learning and utilizing data analysis systems. However, the layout of the ciao manual websites seems not intended for beginners. It was hard to find the basics when I got stuck on some specific task with ciao and its tools. It might be useful as a reference for Chandra scientists who have been using ciao for a long time, but not beyond that. It could be handy for experts instructing novices by working side by side, so that they can give better hands-on instruction.
I'll contrast this with other popular data analysis systems and software.
Even though I have navigated the ciao website and its voluminous threads so many times, I only now come to realize that there is no beginner's guide, something that might be called a ciao cookbook, ciao tutorial, ciao primer, ciao for dummies, or introduction to ciao, in any visible location.
This is a cultural difference. My personal thought is that this tradition prevents non-Chandra scientists from using data in the Chandra archive. The good news is that there have been ciao workshops, and materials from the workshops are still available. I believe compiling these materials in the fashion of other beginners' guides to data analysis systems can be a good starting point for writing a front-page-worthy tutorial. The existence of such introductory material could invite more people to use and to explore Chandra X-ray data. I hope the tutorials from other software and data analysis systems (primer, cookbook, introduction, tutorial, or "for dummies" guides) can be good guidelines for composing a full ciao primer.
SUBJECT(bayes) CONTEXT(sherpa)
SYNOPSIS
A Bayesian maximum likelihood function.
The maximum likelihood function is common to both Bayesian and frequentist methods. I don't get the point of why "Bayesian" is specifically attached to "maximum likelihood function."
DESCRIPTION
(snip)
We can relate this likelihood to the Bayesian posterior density for S(i) and B(i)
using Bayes' Theorem:

p[S(i),B(i) | N(i,S)] = p[S(i)|B(i)] * p[B(i)] * p[N(i,S) | S(i),B(i)] / p[D] .

The factor p[S(i)|B(i)] is the Bayesian prior probability for the source model
amplitude, which is assumed to be constant, and p[D] is an ignorable normalization
constant. The prior probability p[B(i)] is treated differently; we can specify it
using the posterior probability for B(i) off-source:

p[B(i)] = [ A (A B(i))^N(i,B) / N(i,B)! ] * exp[-A B(i)] ,

where A is an "area" factor that rescales the number of predicted background
counts B(i) to the off-source region.

IMPORTANT: this formula is derived assuming that the background is constant as a
function of spatial area, time, etc. If the background is not constant, the Bayes
function should not be used.
Why not? If I rephrase it, what it says is that B(i) is a constant. Then why bother to write p[B(i)], a probability density of a constant? The statement sounds self-contradictory to me. I guess B(i) is meant to be a constant parameter. It would be more suitable to say that the background is homogeneous and describable by a homogeneous Poisson process, if the above pdf is the correct model for the background. Also, a slight notation change is required. Assuming the Poisson process, we could estimate the background rate (a constant parameter) and its density p[B(i)], and this estimate is a constant, as stated for p[S(i)|B(i)], the prior probability for the constant source model amplitude.
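Concretely, one way to read the quoted prior that is consistent with this restatement (my own rephrasing, not anything stated in the ahelp file): up to normalization it is the posterior for a constant background rate under a flat prior, i.e., a Gamma density,

```latex
p[B(i)] \;\propto\; B(i)^{N(i,B)}\, e^{-A\,B(i)}
\quad\Longleftrightarrow\quad
B(i)\mid N(i,B) \;\sim\; \mathrm{Gamma}\big(N(i,B)+1,\; A\big),
```

with shape N(i,B)+1 and rate A, which is exactly the off-source posterior for a homogeneous Poisson background.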
I think the reason for "the Bayes function should not be used" is that the current sherpa is not capable of executing hierarchical modeling. Nevertheless, I believe one can script MCMC methodologies in S-Lang/Python and aggregate them with existing sherpa tools to incorporate a possibly space dependent density, p[B(i,x,y)]. I was told that currently a constant background regardless of location, together with background subtraction, is commonly practiced.
To take into account all possible values of B(i), we integrate, or marginalize,
the posterior density p[S(i),B(i) | N(i,S)] over all allowed values of B(i):

p[S(i) | N(i,S)] = (integral)_0^(infinity) p[S(i),B(i) | N(i,S)] dB(i) .

For the constant background case, this integral may be done analytically. We do
not show the final result here; see Loredo. The function -log p[S(i)|N(i,S)] is
minimized to find the best-fit value of S(i). The magnitude of this function
depends upon the number of bins included in the fit and the values of the data
themselves. Hence one cannot analytically assign a `goodness-of-fit' measure to a
given value of this function. Such a measure can, in principle, be computed by
performing Monte Carlo simulations. One would repeatedly sample new datasets from
the best-fit model, and fit them, and note where the observed function minimum
lies within the derived distribution of minima. (The ability to perform Monte
Carlo simulations is a feature that will be included in a future version of
Sherpa.)

Note on Background Subtraction
Bayesian computation means, one way or another, that one is able to get posterior distributions in the presence of various parameters regardless of their kind: source or background. I wonder why there is a discrimination such that the source parameter has uncertainty whereas the background is constant and is subtracted (yet marginalization is emulated by subtracting different background counts with corresponding weights). It feels awkward to me. Background counts, as well as source counts, are Poisson random. I would like to know what justifies a constant background while one uses probabilistic approaches via Bayesian methods. I would like to know why the mixture model approach (a mixture of a source model and a background model, with marginalization over the background by treating B(i) as a nuisance parameter) has not been tried. By looking broadly at Bayesian modeling methods and the basics of probability, more robust estimation of the source model and its parameters is tractable without subtracting the background prior to fitting a source model.
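To make that idea concrete, here is a toy numerical sketch of the marginalization, my own construction with made-up counts rather than anything from Sherpa: the background rate is treated as a nuisance parameter and integrated out of the joint Poisson likelihood on a grid, leaving a marginal posterior for the source intensity.

```python
import numpy as np
from scipy.stats import poisson

C, B, r = 45, 120, 10.0                    # hypothetical on-source counts, off-source counts, area ratio

s_grid = np.linspace(0.0, 80.0, 801)       # candidate source intensities
b_grid = np.linspace(0.01, 40.0, 400)      # candidate background intensities (per source-region area)
db = b_grid[1] - b_grid[0]

# joint Poisson likelihood on the (s, b) grid: C ~ Pois(s + b), B ~ Pois(r*b)
S, Bg = np.meshgrid(s_grid, b_grid, indexing="ij")
joint = poisson.pmf(C, S + Bg) * poisson.pmf(B, r * Bg)

# integrate out the nuisance parameter b, then normalize over s (flat priors)
marginal = joint.sum(axis=1) * db
posterior = marginal / (marginal.sum() * (s_grid[1] - s_grid[0]))

print("posterior mode of s:", s_grid[np.argmax(posterior)])
```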
The background should not be subtracted from the data when this function is used
The background only needs to be specified, as in this example:
(snip)

EXAMPLES
EXAMPLE 1
Specify the fitting statistic and then confirm it has been set. The method is then
changed from “Levenberg-Marquardt” (the default), since this statistic does not
work with that algorithm.

sherpa> STATISTIC BAYES
sherpa> SHOW STATISTIC
Statistic: Bayes
sherpa> METHOD POWELL
(snip)
I would like to know why it does not work with Levenberg-Marquardt (LM) but works with Powell. Are there any references that explain why LM does not work with Bayes?
I do look forward to your comments and references, particularly on the reasons for a "Bayesian maximum likelihood function" and on the bugs with LM. Also, I look forward to seeing off-the-norm approaches, such as modeling fully in Bayesian ways (like van Dyk et al. 2001, yet I rarely see it applied) or marginalizing the background without subtraction while simultaneously fitting the source model. There is plenty of room for improvement in source model fitting under the contamination and distortion of X-ray photon incidence through space, the telescope, and signal transmission.
The short answer is, it has a convenient background subtractor built in, is analytically comprehensible, and uses concepts very familiar to astronomers. The last bit can be seen by appealing to Gaussian smoothing. Astronomers are (or were) used to smoothing images with Gaussians, and in a manner of speaking, all astronomical images already come presmoothed by PSFs (point spread functions) that are nominally approximated by Gaussians. Now, if an image were smoothed by another Gaussian of a slightly larger width, the difference between the two smoothed images should highlight those features which are prominent at the spatial scale of the larger Gaussian. This is the basic rationale behind a wavelet.
So, in the following, G(x,y;σx,σy,xo,yo) is a 2D Gaussian written such that the scaling of the widths and the translation of the function are made obvious. It is defined over the real plane x,y ε R2 and for widths σx,σy. The Mexican Hat wavelet MH(x,y;σx,σy,xo,yo) is generated as the difference between two Gaussians of different widths, which essentially boils down to taking partial derivatives of G(σx,σy) wrt the widths. To be sure, these must really be thought of as operators where the functions are correlated with a data image, so the derivatives must be carried out inside an integral, but I am skipping all that for the sake of clarity. Also note, the MH is sometimes derived as the second derivative of G(x,y), the spatial derivatives that is.
The integral of the MH over R2 results in the positive bump and the negative annulus canceling each other out, so there is no unambiguous way to set its normalization. And finally, the Fourier Transform shows which spatial scales (kx,y are wavenumbers) are enhanced or filtered during a correlation.
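As a rough illustration of the correlation step (my own sketch, not the wavdetect code), one can build a Mexican Hat kernel directly from the second-derivative form mentioned above and correlate it with an image; features near the kernel scale are enhanced. The image, the source, and the normalization choice are all made up.

```python
import numpy as np
from scipy.signal import fftconvolve

def mexican_hat(sigma, half_size=None):
    """2D Mexican Hat kernel of width sigma on a square grid (negative Laplacian of a Gaussian)."""
    if half_size is None:
        half_size = int(4 * sigma)
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    r2 = (x**2 + y**2) / (2.0 * sigma**2)
    # positive central bump, negative annulus; one conventional normalization
    return (1.0 - r2) * np.exp(-r2) / (np.pi * sigma**4)

rng = np.random.default_rng(0)
image = rng.poisson(1.0, size=(128, 128)).astype(float)   # flat Poisson background
image[60:68, 60:68] += 5.0                                 # a faint, hypothetical "source"

corr = fftconvolve(image, mexican_hat(sigma=3.0), mode="same")
print("peak correlation near the source:", corr[58:70, 58:70].max())
```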
When C counts are observed in a region of the image that overlaps a putative source, and B counts in an adjacent, non-overlapping region that is mostly devoid of the source, the question that is asked is, what is the intensity of a source that might exist in the source region, given that there is also background. Let us say that the source has intensity s, and the background has intensity b in the first region. Further let a fraction f of the source overlap that region, and a fraction g overlap the adjacent, "background" region. Then, if the area of the background region is r times larger, we can solve for s and b and even determine the errors:
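The original formulae are not reproduced here, but a minimal sketch of what they amount to under the setup just described (my own write-up of the standard algebra, with illustrative numbers) is:

```python
# Expected counts: C = f*s + b (source region), B = g*s + r*b (background region).
# Solving the linear system and propagating Poisson errors (Var[C] = C, Var[B] = B)
# by the method of moments gives estimates and error bars for s and b.
import numpy as np

def classical_source_estimate(C, B, f=0.9, g=0.0, r=10.0):
    det = f * r - g
    s = (r * C - B) / det
    b = (f * B - g * C) / det
    s_err = np.sqrt(r**2 * C + B) / det
    b_err = np.sqrt(g**2 * C + f**2 * B) / det
    return s, s_err, b, b_err

s, s_err, b, b_err = classical_source_estimate(C=45, B=120)   # made-up counts
print(f"s = {s:.1f} +/- {s_err:.1f},  b = {b:.2f} +/- {b_err:.2f}")
```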
Note that the regions do not have to be circular, nor does the source have to be centered in them. As long as the PSF fractions f and g can be calculated, these formulae can be applied. In practice, f is large, typically around 0.9, and the background region is chosen as an annulus centered on the source region, with g~0.
It always comes as a shock to statisticians, but this is not ancient history. We still determine maximum likelihood estimates of source intensities by subtracting out an estimated background, and we propagate errors by the method of moments. To be sure, astronomers are well aware that these formulae are valid only in the high counts regime ( s,C,B>>1, b>0 ) and when the source is well defined ( f~1, g~0 ), though of course that doesn't stop them from pushing the envelope. This, in fact, is the basis of many standard X-ray source detection algorithms (e.g., celldetect).
Furthermore, it might come as a surprise to many astronomers, but this is also the rationale behind the widely-used wavelet-based source detection algorithm, wavdetect. The Mexican Hat wavelet used with it has a central positive bump, surrounded by a negative annular moat, which is a dead ringer for the source and background regions used here. The difference is that the source intensity is not deduced from the wavelet correlations and the signal-to-noise ratio ( s/sigmas ) is not used to determine source significance, but rather extensive simulations are used to calibrate it.
Because wavdetect is an extremely complex algorithm, it is not possible to determine its power easily, and large numbers of simulations must be done. In December 2001 (has it been that long?!), the ChaMP folks held a mini workshop, and I reported on my investigations into the efficiency of wavdetect, calculating the probability of detecting a source in various circumstances. I had been meaning to get it published, but it is not really publishable by itself, and furthermore, wavdetect has evolved sufficiently since then that the actual numbers are obsolete. So here, strictly for entertainment, are the slides from that long-ago talk. As Xiao-li puts it, "nothing will go waste!"