The bayesstack module takes as inputs a structured (or record) array containing counts from source and background regions along with key observation-related parameters (exposures, effective and actual areas, energy conversion factors, etc.). It then uses these to perform a stacking-type analysis using a hierarchical Bayesian model for the observed counts. This yields estimates of the mean log-flux or log-luminosity for the observed sources, along with estimates of the uncertainty on this quantity.
Our model assumes that log-luminosities (or log-fluxes) follow a t-distribution with a fixed, low degrees of freedom (the default is 2). This makes our method very robust to outliers, as the tails of such a distribution go to zero as
. The per-pixel background count rate (hereafter refered to as
) is assumed to follow a log-normal distribution across observations.
We further assume that, given these intensities, our counts are distributed as Poisson random variables. The number of counts from our actual source has a rate given by a constant (that may differ for each observation) times either the luminosity or flux, depending upon the type of analysis. The numbers of background counts in the source aperture and background region have rates given by
times different constants.
The constants mentioned above can be calculated from standard observation-specific parameters (which we take to be known with certainty). These include exposures, actual and effective areas, energy conversion factors, and aperture correction factors. If the analysis is performed using luminosity instead of flux as the fundamental quantity, the luminosity distance to each observation and its K correction are also required.
Inference is performed by generating samples from the posterior for the parameters of our model using MCMC simulation techniques [1]. By examining these samples, conclusions can be drawn about the parameters of interest (such as the mean log-luminosity or log-flux for our sample of objects). Either luminosity or flux can be analyzed as the fundamental physical parameter of interest (depending upon whether the analysistype parameter is set to 'lum' or 'flux', respectively).
Footnotes
| [1] | For more details on the actual MCMC draws, please see the detailed documentation for the mcmcstep method of bayesstack |
The full bayesstack package is available in the downloads directory as bayesstack-<version>.tar.gz.
The first step is to untar the package tarball and change into the source directory:
tar xzvf bayesstack-<version>.tar.gz
cd bayesstack-<version>
There are three methods for installing. Choose ONE of the three.
Simple:
The very simplest installation strategy is to just leave the module files in the source directory and set the PYTHONPATH environment variable to point to the source directory:
setenv PYTHONPATH $PWD
This method is fine in the short term but you always have to make sure PYTHONPATH is set appropriately (perhaps in your ~/.cshrc file). And if you start doing much with Python you will have PYTHONPATH conflicts and things will get messy.
Better:
If you cannot write into the system python library directory then do the following. These commands create a python library in your home directory and installs the bayesstack module there. You could of course choose another directory instead of $HOME as the root of your python library.
mkdir -p $HOME/lib/python
python setup.py install --home=$HOME
setenv PYTHONPATH $HOME/lib/python
Although you still have to set PYTHONPATH this method allows you to install other Python packages to the same library path. In this way you can make a local repository of packages.
Best:
If you have write access to the system python library directory you can just install there:
python setup.py install
This puts the new module straight in to the python library so it will always be available. You do NOT need to set PYTHONPATH.
For the purposes of our example, we will assume that the user has a sample of n sources that they want to analyze. bayesstack requires a structured or record array (hereafter called sourcedata) with the following columns and n rows to perform a flux-based analysis:
The following parameters must also be specified in the creation of a bayesstack object:
), where
is the per-pixel background count rate.
across observations, where
is as describe aboveFor a luminosity-based analysis, the following columns are required in the sourcedata array in addition to those given above:
Additionally, the analysistype argument must be set to 'lum' to run a luminosity-based analysis; it defaults to 'flux'.
The following are optional arguments for both types of analyses:
,
))
for inverse gamma prior on scale
parameter for distribution of
(defaults to
noninformative values)Let’s suppose that the relevant parameters for a luminosity-based stacking analysis have been placed into variables analogous to those above in the main Python namespace. To run a flux-based stacking analysis with the default values of the above parameters, we would execute the following commands:
stack_flux = bayesstack.bayesstack(sourcedata=sourcedata,
muS0=muS0, sigmaS0=sigmaS0,
muB0=muB0, sigmaB0=sigmaB0.
ndraws=ndraws,
analysistype='flux')
run_flux = stack_flux.runmcmc()
Note: It is important that ndraws and burnin are not set too low, as the Markov chain used for this simulation must be allowed to reach it’s “steady state” distribution. We have found that, for samples of reasonable size (ranging from 50 to 1500), a burnin of 10000 iterations followed by 40000 actual iterations is typically sufficient.
Following the above commands, run_flux will hold a nested dictionary with the following structure:
hyperparams: Simulated chains for parameters of flux and background intensity distributions. Contains:
- muS: Expectation of log-flux (ndraws-dim vector)
- sigmaS: Scale parameter for log-flux distribution (ndraws-dim vector)
- muB: Expectation of
(ndraws-dim vector)
- sigmaB: Standard deviation of
(ndraws-dim vector)
summaries: Summaries of posterior inferences for each source. Contains:
- logintensitymeans: Posterior mean of the log-flux for each source (n-dim vector)
- logintensitysigmas: Posterior standard deviation of log-flux for each source (n-dim vector)
- loglambdaBmeans: Posterior mean of
for each source (n-dim vector)
- loglambdaBsigmas: Posterior standard deviation of
for each source (n-dim vector)
quantiles: Chains of simulated quantiles for in-sample log-flux distribution (ndraws x len(quantiles) array)
It is also important to note that all logarithms given here are base-e, not base-10. Thus, you must convert them to the desired base after obtaining the bayesstack output.
To conduct a luminosity-based analysis, the procedure and output would be analogous. The only changes would be to ensure that the sourcedata array contains the columns distance and kcorr, and to change analysistype to 'lum':
stack_lum = bayesstack.bayesstack(sourcedata=sourcedata,
muS0=muS0, sigmaS0=sigmaS0,
muB0=muB0, sigmaB0=sigmaB0.
ndraws=ndraws,
analysistype='lum')
run_lum = stack_lum.runmcmc()
run_lum would then contain a structure analogous to that given for run_flux. The only change would be that muS, sigmaS, logintensitymeans, etc. would now contain results pertaining to log-fluxes instead of log-luminosities.