Presentations 

Alex Blocker (Harvard U) 6 Sep 2011 
 A taste of astrostatistics: problems, opportunities, & connections
 Abstract:
Astrostatistics is a vibrant, tight-knit field with more open
problems than statisticians to tackle them. These range from the very
applied, such as understanding the workings of space telescopes,
to fundamental questions of statistical inference, and sophisticated
computation is the order of the day. The latter is particularly
true as new instruments generate huge volumes of data.
I will present two projects from astrostatistics:
inferring the brightness of faint galaxies using the Chandra space
telescope, and finding unusual events within millions of astronomical
time series. These presented major inferential challenges in
radically different ways. Addressing them took a combination of
statistical modeling, scientific knowledge, and computational
finesse.
Finally, I will share some surprising connections between astrostatistics
and my work in biology. Biology appeared to lead astronomy in data
analysis for many years, but the fields are now coming full circle.
The newest forms of biological data share many features with modern
astronomical data; there is great potential for "methodological
arbitrage" here for graduate students willing to dive into
astrostatistics.
 Slides [.pdf]


Astro Projects for Statistics 20 Sep 2011 
 Projects, problems, and demos
 Doubt: How Do I Know if that is a Real Feature in My Image? (Alanna C)
 Timing analysis of grating data (Vinay K)
 Real time feature detection and classification (Pavlos P)
 Issues in modeling the X-ray data (Aneta S)
 Quasar clustering project (Brandon K)
 Simplicity: Bayesian Energy Quantiles, or Quick Nonparametric way(s) to incorporate Higher Dimensional Data (Alanna C)
 Source detection in 4D (Vinay K)
 Physics demos: Poisson, atomic lines, dispersion spectra (Alanna C)


Group 11-13 Oct 2011 
 Projects
 Tuesday 11 Oct
 9:30a - 10:15a: pyBLoCXS (at SciCen 706)
 10:15a - 11:15a: proposal
 11:30a - 12:15p: new projects
 12:15p - 1:00p: Bayes Factors
 2:00p - 3:00p: Full Bayes Calibration Uncertainties
 3:00p - 3:30p: 2D Cal Uncertainties and SCA
 Wednesday 12 Oct
 10:00a - 10:30a: SolarStat (at CfA Fishbowl)
 10:30a - 11:15a: Sunspot Classification
 11:15a - 11:45a: Sunspot Cycles
 1:00p - 2:00p: Timing analysis with grating data (at CfA M240)
 2:00p - 2:45p: Solar DEM features
 2:45p - 4:00p: computing
 Thursday 13 Oct
 10:00a - 11:00a: pySALC (at CfA M240)
 11:00a - 2:30p: proposal


Brandon Kelly (CfA/UCSB) 25 Oct 2011 
 Investigating Star Formation through Hierarchical Bayesian Modeling of Emission from Astronomical Dust
 Abstract: Astronomical dust plays an important role
in the formation of stars and planets. Recently launched observatories,
such as Herschel and Planck, are providing observations that place
important constraints on the properties of astronomical dust.
However, the traditional least-squares analysis used by astronomers
is highly inefficient for this problem, and leads to biases and
incorrect conclusions. In this talk I will discuss a hierarchical
Bayesian approach to deriving the physical parameters of astronomical
dust, as well as the distribution of these parameters. I will also
discuss an ancillarity-sufficiency interweaving strategy for boosting
the efficiency of the MCMC sampler. Finally, I will present results
from our model as applied to a nearby star-forming region. The
Bayesian approach leads to scientific conclusions opposite to those
obtained from the least-squares analysis: the Bayesian results are
consistent with astrophysical theories of dust formation, while the
least-squares results are not.
 Slides: [.pdf]  [.ppt]


Raffaele D'Abrusco (CfA) 1 Nov 2011 
 Knowledge Discovery workflows for the exploration of complex multiwavelength
astronomical datasets. Application to CSC+, a sample of AGNs built on the Chandra Source Catalog
 Abstract:
A complete understanding of astronomical sources requires a global
multiwavelength approach; at the same time, the availability of large
surveys of the sky in different spectral regions has propelled the
aggregation of massive and complex datasets. The traditional approach
to data analysis, involving well-informed testing of different models,
cannot do justice to the richness of these new datasets or, in some
sense, to the intrinsically novel type of knowledge they contain.
Knowledge Discovery (KD) techniques, while relatively new to astronomy,
have been successfully used in several other disciplines, from finance
to genomics, to uncover patterns, complex or simple but as yet unseen,
in large datasets.
In this talk I shall describe CLaSPS, a method for the characterization of
multidimensional astronomical sources based on KD unsupervised
clustering algorithms, which are used to determine the spontaneous
aggregations of sources in the high-dimensional space generated by their
observables. A data-driven criterion is then applied to pick the most
interesting clusterings in terms of the astronomical properties of the sample.
I will discuss the application of this method to a sample of optically
selected AGNs with X-ray observations in the Chandra Source Catalog and
other multiwavelength data, which is representative of the VO-powered,
inhomogeneous astronomical datasets that will become ever more common in the
future. The goals of this project are to test known correlations, possibly
determine new patterns, and establish diagnostics for an improved
classification of X-ray selected AGNs with multiwavelength observations.
As an example of such previously unknown low-dimensional patterns, I will
also briefly discuss a recent result on blazars that is a byproduct of the
application of CLaSPS to a sample of AGNs with multiwavelength data.
 Slides [.pdf]
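The clustering-plus-selection workflow sketched in the abstract can be caricatured in a few lines. This is only an illustrative sketch on mock data: the minimal k-means routine, the two-class mock "observables", and the purity score below are stand-ins for CLaSPS's actual clustering algorithms and data-driven criterion, which the abstract does not specify in detail.

```python
import numpy as np

rng = np.random.default_rng(6)

def kmeans(X, k, iters=50):
    """Minimal k-means (stand-in for the unsupervised clustering step)."""
    C = X[rng.choice(len(X), k, replace=False)]          # init from data points
    for _ in range(iters):
        lab = np.argmin(((X[:, None, :] - C[None])**2).sum(-1), axis=1)
        C = np.array([X[lab == j].mean(0) if np.any(lab == j) else C[j]
                      for j in range(k)])
    return lab

# Mock "observables" for two source classes (an illustrative stand-in for
# multiwavelength colors); the known labels play the role of the external
# astronomical property used by the data-driven selection criterion.
X = np.vstack([rng.normal(0, 1, (200, 3)), rng.normal(4, 1, (200, 3))])
truth = np.repeat([0, 1], 200)

def purity(lab, truth, k):
    """Score a clustering by how cleanly clusters map to the known property."""
    return sum(np.bincount(truth[lab == j]).max()
               for j in range(k) if np.any(lab == j)) / len(truth)

scores = {k: purity(kmeans(X, k), truth, k) for k in (2, 3, 4, 5)}
print(scores)
```

Scanning a range of cluster counts and scoring each clustering against a known property mirrors, in the simplest possible form, the "pick the most interesting clusterings" step.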


Ed Turner (Princeton) 15 Nov 2011 
 A Bayesian Analysis of the Astrobiological Implications of the Rapid
Emergence of Life on the Early Earth
 Abstract:
Life arose on Earth sometime in the first few hundred million years
after the young planet had cooled to the point that it could support
water-based organisms on its surface. The early emergence of life on
Earth has been taken as evidence that the probability of abiogenesis
is high, if starting from young-Earth-like conditions. This argument is
revisited quantitatively in a Bayesian statistical framework. Using
a simple model of the probability of abiogenesis, a Bayesian estimate of
its posterior probability is derived based on the datum that life emerged
fairly early in Earth's history and that, billions of years later,
sentient creatures noted this fact and considered its implications.
Given only this very limited empirical information, the choice of
Bayesian prior for the abiogenesis probability parameter has a very
strong influence on the computed posterior probability. In particular,
although life began on the Earth quite soon after it became habitable, that
fact is statistically consistent with an arbitrarily low intrinsic
probability of abiogenesis for plausible uninformative priors and,
therefore, with life being arbitrarily rare in the Universe. The
presentation will emphasize generic statistical properties of problems of
this general character, which occur in cosmology and many other areas
of science, as well as in the context of abiogenesis.
 Slides [.pdf]
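The prior-sensitivity point above can be illustrated with a toy calculation. Everything in the sketch is an assumption for illustration, not the paper's actual analysis: life arises as a Poisson process with rate lam per Gyr, the datum is "life emerged by t_e = 0.5 Gyr", and we condition on life emerging at all within a habitable window T = 4.5 Gyr (a crude stand-in for the anthropic selection discussed in the abstract).

```python
import numpy as np

t_e, T = 0.5, 4.5                       # illustrative times, in Gyr
lam = np.logspace(-6, 3, 4001)          # grid of abiogenesis rates

# P(emerged by t_e | emerged at all within T); expm1(-x) = e^{-x} - 1,
# so the ratio below equals (1 - e^{-lam*t_e}) / (1 - e^{-lam*T}).
like = np.expm1(-lam * t_e) / np.expm1(-lam * T)

def integrate(y, x):
    """Trapezoid rule on a (possibly nonuniform) grid."""
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

def p_life_rare(prior):
    """Posterior probability that lam < 0.01 ('life is rare')."""
    post = prior * like
    post /= integrate(post, lam)
    lo = lam < 0.01
    return integrate(post[lo], lam[lo])

flat_prior = np.ones_like(lam)   # uniform in lam
log_prior  = 1.0 / lam           # uniform in log(lam)

print(p_life_rare(flat_prior))   # tiny: the flat prior itself favors large lam
print(p_life_rare(log_prior))    # appreciable: rare life remains plausible
```

The same single datum leaves almost no posterior probability of rare life under the flat prior, yet an appreciable probability under the log-uniform prior: the choice of prior, not the data, drives the conclusion.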


Group 29 Nov 2011 
 20 Questions
 Wherein stats grad students ask questions of astronomers who,
if they can't answer, get to ask a statistics question
in return. Also, demos.


Xu Jin (UC Irvine) 7 Feb 2012 
 New Results of Fully Bayesian
 Slides [.pdf]


Tom Loredo (Cornell) 15 Feb 2012 3:15pm - 4:30pm Pratt Conference Room at CfA 
 Adaptive scheduling of exoplanet observations via Bayesian adaptive exploration
 Abstract:
I will describe ongoing work by a collaboration of astronomers and
statisticians developing a suite of Bayesian tools for the analysis and
adaptive scheduling of exoplanet host star reflex motion observations. In this
presentation I will focus on the most distinctive aspect of our work: adaptive
scheduling of observations using the principles of Bayesian experimental
design in a sequential data analysis setting. The idea is to iterate an
observation-inference-design cycle so as to gain information about an
exoplanet system more quickly than is possible with random or ad hoc
scheduling. I will introduce the core ideas (decision theory and
information measures) and highlight some of the computational challenges
that arise when implementing Bayesian design with nonlinear models.
Specializing to parameter estimation cases (e.g., measuring the orbit of
a planet known to be present), there is an important simplification that
enables relatively straightforward calculation of greedy designs via maximum
entropy sampling. We implement MaxEnt sampling using population-based MCMC
to provide posterior samples used in a nested Monte Carlo integration
algorithm. I will demonstrate the approach with a toy problem, and with a
reanalysis of existing exoplanet data supplemented by simulated optimal
data points.
 Presentation slides [.pdf]
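The greedy maximum-entropy-sampling step has a particularly simple form worth sketching: under Gaussian measurement noise, the candidate observation time with the largest predictive entropy is the one with the largest posterior predictive variance. Everything below is a toy stand-in (the sinusoidal reflex-velocity model, the mock posterior samples, the noise level), not the collaboration's code.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mock posterior samples for a circular-orbit radial-velocity model
# v(t) = K * sin(2*pi*t/P + phi); the parameter values are illustrative.
K   = rng.normal(10.0, 1.0, 500)      # semi-amplitude (m/s)
P   = rng.normal(30.0, 0.5, 500)      # period (days)
phi = rng.normal(1.0, 0.1, 500)       # phase (rad)

sigma2 = 4.0                          # known measurement noise variance

def predictive_entropy(t):
    """Entropy (up to constants) of the predictive distribution at time t.
    With Gaussian noise, maximizing entropy = maximizing predictive variance."""
    v = K * np.sin(2 * np.pi * t / P + phi)   # model velocity per sample
    return 0.5 * np.log(v.var() + sigma2)

candidates = np.linspace(0.0, 60.0, 241)
best = max(candidates, key=predictive_entropy)
print(f"observe next at t = {best:.2f} d")
```

In the real pipeline the posterior samples would come from population-based MCMC on actual radial-velocity data; the greedy selection rule itself takes this form.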


Group 16-17 Feb 2012 
 Solar-Statistics Mini-Workshop
 Thursday, Feb 16 (@ Pratt)
 2:00pm - 3:45pm: Stats Tutorial
 4:15pm - 6:00pm: Solar Tutorial
 Friday, Feb 17 (@ Phillips)
 9:00am - 10:30am: Feature Recognition
 11:00am - 12:30pm: Thermal Structure
 2:00pm - 3:30pm: Multi-D Joint Analysis
 4:00pm - 5:30pm: Massive Data Streams


Alex Blocker (Harvard) 21 Feb 2012 
 Discussion of Maximal Information Coefficient
 Abstract: The publication of Reshef et
al.'s work on the maximal information coefficient (MIC) in late
2011 created a great deal of buzz across many disciplines. Their
goal of identifying novel relationships in massive datasets, and their
low-assumption approach, resonated with many researchers, and the
method's publication in Science amplified its impact substantially.
However, the work has been less warmly received by the statistical
community, where many consider it lacking compared to existing
approaches. I will summarize the theory and application of MIC as
presented by Reshef et al. for scientists and statisticians, then
provide a statistical review of their approach. I will also discuss
the broader issues and lessons raised by this episode.
 Presentation slides: [.pdf]
 References and code for the talk from AB: [ab_20120221/]
 Supplement to the Reshef et al. paper (especially for the statisticians),
linked from thoughts-on-mic-reshef-et-al-2011
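To make the object of the discussion concrete, here is a deliberately simplified MIC-style statistic: mutual information computed over a family of grids, normalized by the log of the smaller grid dimension, and maximized over grid sizes. Reshef et al. additionally optimize the grid boundaries; this equal-width version is only a crude proxy for illustration.

```python
import numpy as np
from itertools import product

def mic_equiwidth(x, y, B=None):
    """Simplified MIC: max over equal-width nx-by-ny grids (nx*ny <= B) of
    normalized mutual information I(x;y) / log2(min(nx, ny))."""
    n = len(x)
    B = B or int(n ** 0.6)              # grid-size budget, as in Reshef et al.
    best = 0.0
    for nx, ny in product(range(2, 16), repeat=2):
        if nx * ny > B:
            continue
        # 2-D histogram -> joint distribution on the grid
        pxy, _, _ = np.histogram2d(x, y, bins=[nx, ny])
        pxy /= n
        px, py = pxy.sum(axis=1), pxy.sum(axis=0)
        nz = pxy > 0                    # avoid log(0) on empty cells
        mi = np.sum(pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz]))
        best = max(best, mi / np.log2(min(nx, ny)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 1000)
m_parab = mic_equiwidth(x, x**2)                      # functional relation
m_noise = mic_equiwidth(x, rng.uniform(-1, 1, 1000))  # independent noise
print(m_parab, m_noise)
```

A strong functional relationship scores well above the noise baseline, which is the behavior the paper advertises and the statistical critiques probe (e.g., power against alternatives, bias of the grid search).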


Paul Baines (UC Davis) 6 Mar 2012 [via Skype] 
 LogN-LogS: Model Selection and Model Checking
 The study of astrophysical source populations is often conducted
using the cumulative distribution of the number of sources detected
at a given sensitivity. The resulting log N(>S) vs. log S relationship
can be used to compare and evaluate theoretical models for source
populations and their evolution. In practice, however, inferring
properties of source populations from observational data is complicated
by detector-induced uncertainties, background contamination, and
missing data.
By investigating the connection between probabilistic and
theoretical assumptions in commonly used logN-logS methods, we
propose a new class of models with a more realistic physical
interpretation. Our Bayesian approach leads to efficient inference
for physical model parameters and the corrected log N(>S) vs. log S
distribution for source populations. Our method extends existing
work in allowing for both non-ignorable missing data and an unknown
number of unobserved sources. In this talk we will focus on model
selection issues and multivariate strategies for Bayesian model
checking.
This is joint work with Andreas Zezas, Vinay Kashyap and Irina Udaltsova.
 Presentation slides [.pdf]
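A minimal simulation shows the detector-induced distortion of the log N(>S) curve that motivates the modeling above. The power-law slope, the detection-probability curve, and all numbers below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy population: fluxes S from a power law dN/dS ~ S^(-alpha-1) above S_min.
alpha, S_min, N = 1.0, 1.0, 5000
S = S_min * (1 + rng.pareto(alpha, N))

# Toy detector: detection probability rising from 0 to 1 around S ~ 3,
# a stand-in for the incompleteness that complicates real inference.
p_det = 1.0 / (1.0 + np.exp(-(S - 3.0)))
detected = rng.random(N) < p_det

def log_n_gt_s(fluxes, grid):
    """Empirical log10 N(>S) evaluated on a grid of flux thresholds."""
    return np.log10([np.sum(fluxes > s) for s in grid])

grid = np.array([1.5, 3.0, 6.0, 12.0])
true_curve = log_n_gt_s(S, grid)
obs_curve  = log_n_gt_s(S[detected], grid)
print(true_curve)
print(obs_curve)
```

The observed curve falls below the true one at faint fluxes, exactly the incompleteness that a Bayesian model with non-ignorable missingness and an unknown number of undetected sources is designed to correct.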


Andreas Zezas (Crete) 20 Mar 2012 9am PDT / Noon EDT / 4pm GMT / 6pm EET [via Skype] 
 Adaptive Smoothing powwow
 Presentation slides [.pdf]
 The goal is to derive the ideal tool for quick astronomical analysis: a statistically principled, adaptively smoothing, flux-conserving, semiparametric tool that works in 2D, on Poisson data, and runs reasonably quickly. Some useful papers to read up on:
 ASMOOTH: A simple and efficient algorithm for adaptive kernel smoothing of two-dimensional imaging data, Ebeling, H., White, D.A., & Rangarajan, F.V.N., 2006, MNRAS, 368, 65 [arXiv:astro-ph/0601306]
 csmooth, CIAO ahelp page, cxc/ciao/ahelp/csmooth
 Multiple Testing of Local Maxima for Detection of Unimodal Peaks in 1D, Schwartzman, A., Gavrilov, Y., & Adler, R.J., 2011 [.pdf]
 Multiple Testing of Local Maxima for Detection of Peaks in ChIP-Seq Data, Schwartzman, A., Jaffe, A., Gavrilov, Y., & Meyer, C.A., 2011, HU Biostatistics Working Paper Series, 133 [.pdf]
 A Wavelet-Based Algorithm for the Spatial Analysis of Poisson Data, Freeman, P.E., Kashyap, V., Rosner, R., & Lamb, D.Q., 2002, ApJS, 138, 185 [.pdf]
 Low Assumptions, High Dimensions, Wasserman, L., 2011, RMM v2, 201, in Statistical Science and Philosophy of Science [.pdf]
 Multiscale Poisson Intensity and Density Estimation, Willett, R.M., & Nowak, R.D., 2007, IEEE Trans. on Inform. Theory, 53, 9 [.pdf]
 Multiscale Photon-limited Spectral Image Reconstruction, Krishnamurthy, K., Raginsky, M., & Willett, R., 2009, SIIMS [.pdf]
 Poisson Noise Reduction with Non-Local PCA, Salmon, J., Deledalle, C.-A., Willett, R., & Harmany, Z., 2012, ICASSP [.pdf]
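As a concrete reference point for the powwow, here is the adaptive-scale idea in its most stripped-down form: grow the kernel at each pixel until it encloses enough counts for a stable estimate. csmooth/asmooth use Gaussian kernels and enforce flux conservation; this top-hat sketch is only meant to fix ideas, with all settings illustrative.

```python
import numpy as np

def adaptive_smooth(img, min_counts=16, max_r=10):
    """Crude adaptive top-hat smoothing of a Poisson counts image:
    at each pixel, grow a circular kernel until it encloses at least
    min_counts events (or hits max_r), then assign the mean count
    rate inside that kernel."""
    ny, nx = img.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    out = np.zeros_like(img, dtype=float)
    for j in range(ny):
        for i in range(nx):
            for r in range(1, max_r + 1):
                mask = (yy - j)**2 + (xx - i)**2 <= r * r
                if img[mask].sum() >= min_counts or r == max_r:
                    out[j, i] = img[mask].mean()
                    break
    return out

rng = np.random.default_rng(3)
truth = np.ones((32, 32))
truth[12:20, 12:20] = 20.0                  # bright source on a flat background
smoothed = adaptive_smooth(rng.poisson(truth))
```

Faint regions get large kernels (heavy smoothing) while bright sources keep small ones; the papers above refine exactly this behavior with principled significance criteria and flux conservation.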


Min Shandong & Xu Jin (UCI) 03 Apr 2012 
 Bayes Factors (Shandong)
 Presentation slides [.pdf]
 Calibration (Jin)
 Presentation slides [.pdf]


Omiros Papaspiliopoulos (U Pompeu Fabra) 10 Apr 2012 
 SMC^{2}: an efficient algorithm for sequential analysis of
state-space models
 Nicolas Chopin, Pierre E. Jacob, Omiros Papaspiliopoulos

Abstract: We consider the generic problem of performing sequential Bayesian inference
in a state-space model with observation process y, state process x, and
fixed parameter theta. An idealized approach would be to apply the iterated
batch importance sampling (IBIS) algorithm of Chopin (2002). This is a
sequential Monte Carlo algorithm in the theta-dimension, which samples
values of theta, iteratively reweights these values using the likelihood
increments p(y_t | y_{1:t-1}, theta), and rejuvenates the theta-particles
through a resampling step and an MCMC update step. In state-space models
these likelihood increments are intractable in most cases, but they may be
unbiasedly estimated by a particle filter in the x-dimension, for any fixed
theta. This motivates the SMC^2 algorithm proposed in this article: a
sequential Monte Carlo algorithm, defined in the theta-dimension, which
propagates and resamples many particle filters in the x-dimension. The
filters in the x-dimension are an example of the random-weight particle
filter, as in Fearnhead et al. (2010). On the other hand, the particle
Markov chain Monte Carlo (PMCMC) framework developed in Andrieu et al.
(2010) allows us to design appropriate MCMC rejuvenation steps. Thus, the
theta-particles target the correct posterior distribution at each iteration
t, despite the intractability of the likelihood increments. We explore the
applicability of our algorithm in both sequential and non-sequential
applications and consider various degrees of freedom, for example
increasing dynamically the number of x-particles. We contrast our approach
to various competing methods, both conceptually and empirically through a
detailed simulation study, included here and in a supplement, and based on
particularly challenging examples.
 paper available from arxiv.org/abs/1101.1528
 Presentation slides [.pdf]
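The key fact the abstract leans on, that a bootstrap particle filter in the x-dimension yields an unbiased estimate of each likelihood increment p(y_t | y_{1:t-1}, theta), can be illustrated on a toy model. The linear-Gaussian model and all settings below are illustrative choices for the sketch, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)

def particle_loglik(y, theta, Nx=200):
    """Bootstrap particle filter for the toy state-space model
        x_t = theta * x_{t-1} + N(0,1),   y_t = x_t + N(0,1).
    Returns an estimate of log p(y_{1:T} | theta); each per-step average
    weight estimates the likelihood increment p(y_t | y_{1:t-1}, theta),
    the quantity SMC^2 plugs into the IBIS reweighting."""
    x = rng.normal(0.0, 1.0, Nx)                           # initial particles
    ll = 0.0
    for yt in y:
        x = theta * x + rng.normal(0.0, 1.0, Nx)           # propagate states
        w = np.exp(-0.5 * (yt - x)**2) / np.sqrt(2 * np.pi)  # weight by p(y|x)
        ll += np.log(w.mean())                             # likelihood increment
        x = rng.choice(x, size=Nx, p=w / w.sum())          # multinomial resample
    return ll

# Simulate data from theta = 0.7 and compare two parameter values.
theta_true, xs, ys = 0.7, 0.0, []
for _ in range(50):
    xs = theta_true * xs + rng.normal()
    ys.append(xs + rng.normal())
ll_true, ll_alt = particle_loglik(ys, 0.7), particle_loglik(ys, 0.0)
print(ll_true, ll_alt)   # the true theta typically scores higher
```

In SMC^2, one such filter is attached to every theta-particle; the resulting log-likelihood estimates drive the IBIS reweighting and the PMCMC rejuvenation moves.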


Lazhi Wang (Harvard) 15 May 2012 
 Luminosity Functions
 Abstract: The goal of source detection is often to obtain the
luminosity function, which specifies the relative number of sources
at each luminosity for a population. In this talk, I will first
explain a hierarchical Bayesian approach to inferring the distribution
of intensities (luminosities) of all the sources in a population,
given the background-contaminated photon counts at the locations
of the sources. The distribution of intensities is modeled as a
zero-inflated gamma distribution. The zero-inflated component, a
new idea in astronomical problems, models the
proportion of dark sources (sources which do not emit any photons).
I will then present some simulation results, including the joint
posterior distributions of the parameters, the best fit of the
zero-inflated gamma, and the associated uncertainty. Finally, I will
discuss different choices of priors for the hyperparameters and
the coverage percentages of the Bayesian model under different
simulation studies and with different priors.
 Presentation slides [.pdf]
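The generative side of the zero-inflated gamma model described above is easy to simulate; the sketch below uses made-up hyperparameter values purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Each source is "dark" with probability pi0; otherwise its intensity is
# drawn from a gamma distribution. Observed counts at each source location
# are Poisson(intensity + background). All numbers are illustrative.
n_src = 1000
pi0 = 0.3                      # zero-inflation: fraction of dark sources
shape, scale = 2.0, 5.0        # gamma component of the intensity distribution
bkg = 0.5                      # expected background counts per source region

dark = rng.random(n_src) < pi0
intensity = np.where(dark, 0.0, rng.gamma(shape, scale, n_src))
counts = rng.poisson(intensity + bkg)

# Background contamination gives even dark sources nonzero counts, so pi0
# cannot be read off the zero-count bin and must instead be inferred.
frac_dark_nonzero = (counts[dark] > 0).mean()
print(frac_dark_nonzero)
```

This is why the hierarchical model is needed: the zero-inflation fraction, the gamma hyperparameters, and the background all compete to explain the low-count sources, and only joint inference separates them.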

Tanmoy Laskar (CfA) 29 May 2012 
 Quantifying the Non-Existent: Radio, X-ray, and Optical Model Fitting with Non-Detects
 Abstract: Non-detects are as important as detections in hypothesis
testing and model fitting. While several statistical
tools have been developed in the biomedical and environmental
sciences for incorporating non-detects into robust analyses,
the percolation of these methods into astronomy has been slow. To bridge
this gap, I will discuss a project that involves the simultaneous
modeling of multiwavelength light curves (in the context of Gamma-Ray
Burst afterglows), from the radio through the X-rays, and seek to
understand the best statistical method for quantifying and incorporating
non-detects into the analysis.
 Presentation slides: [.pdf] ; [.odp]



