The AstroStat Slog » books

Erich Lehmann

hlee — Tue, 08 Dec 2009 04:46:34 +0000

He was one of the most frequently cited statisticians in this slog because of his influence on statistics. It is extremely difficult to avoid his textbooks and his establishment of theoretical statistics when one begins to comprehend and appreciate modern theoretical statistics. To me, Testing Statistical Hypotheses and Theory of Point Estimation are the two pillars of graduate statistical education. In addition, Elements of Large Sample Theory and Nonparametrics: Statistical Methods Based on Ranks are also eye openers.

Not long ago I read Reminiscences of a Statistician: The Company I Kept. I quoted this book and an arXiv paper here; see the posts. I am very grateful to him for his contributions to statistical science. I feel so sad to see his obituary, particularly now that I am soon going to have time to read his books more carefully.


[MADS] Kriging

hlee — Wed, 26 Aug 2009 02:19:26 +0000

Kriging is the first thing one learns in a spatial statistics course. On seeing its definition and applications, almost every astronomer will say, “Oh, I know this! It is like the 2pt correlation function!!” At least that was my impression when I first met kriging.

There are three distinct subjects in spatial statistics: geostatistics, lattice data analysis, and spatial point pattern analysis. Because of the resemblance between the spatial distribution of observations in coordinates and the notion of spatially random points, spatial statistics in astronomy has leaned more toward spatial point pattern analysis than toward the other subjects. In other fields, from immunology to forestry to geology, data are associated with the spatial coordinates of underlying geometric structures or are sampled on lattices; the observations depend on these spatial structures, and scientists enjoy various applications of geostatistics and lattice data analysis. In particular, kriging is the fundamental notion in geostatistics, with applications in many fields.
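Kriging usually starts from an estimate of spatial dependence, and in geostatistics the semivariogram plays that role. Under second-order stationarity it is tied to the covariance by gamma(h) = C(0) - C(h), which is one reason it looks familiar to anyone who has computed a two-point correlation function. Below is a minimal sketch of Matheron's classical (method-of-moments) estimator in Python with NumPy; the function name `empirical_semivariogram` and the distance binning are illustrative choices, not taken from any particular package.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Matheron's method-of-moments semivariogram estimator.

    For each distance bin, gamma = sum((z_i - z_j)**2) / (2 * N),
    summing over the N point pairs whose separation falls in that bin.
    Empty bins are returned as NaN.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    bin_edges = np.asarray(bin_edges, dtype=float)
    nbins = len(bin_edges) - 1

    # Every unordered pair (i < j): separation distance and squared difference.
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (values[i] - values[j]) ** 2

    # Bin index per pair; pairs outside [bin_edges[0], bin_edges[-1]) get
    # an index of -1 or nbins and are never selected below.
    which = np.digitize(dist, bin_edges) - 1

    gamma = np.full(nbins, np.nan)
    counts = np.zeros(nbins, dtype=int)
    for b in range(nbins):
        in_bin = which == b
        counts[b] = int(in_bin.sum())
        if counts[b] > 0:
            gamma[b] = sqdiff[in_bin].sum() / (2.0 * counts[b])
    return gamma, counts


# Tiny check: three collinear points with values 0, 1, 2.
gamma, counts = empirical_semivariogram([[0, 0], [1, 0], [2, 0]], [0, 1, 2], [0.5, 1.5, 2.5])
print(gamma, counts)  # gamma = [0.5, 2.0], counts = [2, 1]
```

In practice one fits a parametric model (such as exponential, spherical, or Matérn) to these binned values before using it for kriging.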

Until now, I had expected the term kriging to appear rather frequently in analyses of cosmic microwave background (CMB) data or of large extended sources, which are wide enough to support statistical models of the expected geometric structure and its uncertainty (or interpolation of observations via BLUP, the best linear unbiased prediction). Contrary to my expectation, only one refereed paper emerged from ADS:

Topography of the Galactic disk – Z-structure and large-scale star formation
by Alfaro, E. J., Cabrera-Cano, J., and Delgado (1991)
in ApJ, 378, pp. 106-118
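To make the BLUP idea concrete, here is a minimal sketch of ordinary kriging at a single target location in Python with NumPy. It assumes an exponential covariance C(h) = sill * exp(-h / length) with known parameters (in practice these would be estimated, for example from a fitted semivariogram), and the names `exp_cov` and `ordinary_kriging` are hypothetical. The weights minimize the prediction-error variance subject to summing to one, which makes the predictor unbiased under an unknown constant mean; a Lagrange multiplier handles the constraint.

```python
import numpy as np


def exp_cov(h, sill=1.0, length=1.0):
    """Assumed exponential covariance model: C(h) = sill * exp(-h / length)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / length)


def ordinary_kriging(coords, values, target, sill=1.0, length=1.0):
    """Ordinary-kriging prediction and kriging variance at one target point.

    Solves  [K  1] [w]   [k]
            [1' 0] [m] = [1]
    where K holds covariances among the observations and k the covariances
    between the observations and the target; sum(w) = 1 gives unbiasedness.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(values)

    # Pairwise distances among observations, and from each observation to the target.
    d_obs = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(coords - target, axis=1)
    K = exp_cov(d_obs, sill, length)
    k = exp_cov(d_tgt, sill, length)

    # Augmented (n+1) x (n+1) system with a Lagrange multiplier m.
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = K
    A[n, n] = 0.0
    b = np.append(k, 1.0)
    sol = np.linalg.solve(A, b)
    w, m = sol[:n], sol[n]

    prediction = w @ values
    # Prediction-error (kriging) variance: C(0) - w'k - m, with C(0) = sill.
    variance = sill - w @ k - m
    return prediction, variance, w


obs = [[0.0, 0.0], [2.0, 0.0], [0.0, 3.0], [4.0, 4.0]]
vals = [1.0, 3.0, -2.0, 0.5]
pred, var, w = ordinary_kriging(obs, vals, [1.0, 1.0], sill=1.0, length=2.0)
print(pred, var, w.sum())  # the weights sum to 1 up to rounding
```

Because kriging interpolates exactly, predicting at an observed location returns the observed value with zero kriging variance; away from the data the variance grows, which is the kind of uncertainty measure a kriging map carries along with its interpolated values.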

I attribute this scarcity of kriging applications in astronomy to missing data and to differential exposure time across the sky. Both require underlying models, either to fill the gaps or to be convolved with the observed data to compensate for the unequal sky coverage. Traditionally, kriging analysis has been applied only to localized geological areas where missing data and unequal coverage are not a concern. Because many survey and probing missions cover wide areas of the sky, we always see some gaps and selection biases in telescope pointing directions. So, once the characteristics of this missingness are understood and incorporated into spatial statistical models, I believe statistical methods for spatial data could reveal more about our Galaxy and the universe.

The good news for astronomers is that more statisticians and geoscientists are now working on spatial data, particularly data from satellites. These data differ little from traditional astronomical data except in the direction the satellite points (inward or outward). Therefore, these scientists' data share the typical properties of astronomical data: missing values, unequal coverage or exposure, and sparse but gigantic images. Thanks to increased computational power and developments in hierarchical modeling, geostatistical techniques are being developed to handle these massive but sparse images for statistical inference. They aim not only to denoise images but also to produce a measure of uncertainty associated with complex spatial data.

For those who are interested in what spatial statistics does, there are a few books I’d like to recommend.

Cressie, N. (1993) Statistics for Spatial Data
(the bible of spatial statistics)
Stein, M.L. (2002) Interpolation of Spatial Data: Some Theory for Kriging
(it is about kriging and was written by one of the scholarly pinnacles of spatial statistics)
Banerjee, Carlin, and Gelfand (2004) Hierarchical Modeling and Analysis for Spatial Data
(explains Bayesian hierarchical modeling; very pragmatic, but it could give the impression that it is somewhat limited for applications in astronomy)
Illian et al. (2008) Statistical Analysis and Modelling of Spatial Point Patterns
(Well, I still think spatial point pattern analysis is more dominant in astronomy than geostatistics, so I feel obliged to include a book on it. And if so, I must mention Peter Diggle’s books too.)
Diggle (2004) Statistical Analysis of Spatial Point Patterns
Diggle and Ribeiro (2007) Model-based Geostatistics


Books – a boring title

hlee — Fri, 25 Jan 2008 16:53:21 +0000

I have been observing some misconceptions about statistics, and an evolution of statistical nomenclature in astronomy, which I believe are attributable to the lack of references in the astronomical community. There are some textbooks designed for junior/senior science and engineering students that are probably unknown to astronomers; judging by their examples, however, these books are not suitable for astronomers, to my knowledge. Although I never expect astronomers to learn from standard graduate (mathematical) statistics textbooks, I do wish astronomers would go beyond Numerical Recipes (W. H. Press, S. A. Teukolsky, W. T. Vetterling, & B. P. Flannery) and Data Reduction and Error Analysis for the Physical Sciences (P. R. Bevington & D. K. Robinson). Here are some good ones written by astronomers, engineers, and statisticians:

The motivation for this post came from Vinay’s recommendation: Practical Statistics for Astronomers (J.V. Wall and C.R. Jenkins), which provides many statistical insights and caveats that astronomers tend to ignore. Without looking at the error distribution and the properties of the data, astronomers jump into chi-square and correlation. Readers of the book will become careful about adopting the statistics in common practice in astronomy, which were developed many decades ago and founded on strong assumptions that are not compatible with modern data sets. The book addresses many concerns about astronomers that have been growing in my mind and introduces various statistical methods applicable in astronomy.

Astronomers who have read this book thoroughly but have had no classroom education in statistics may see it differently from me. The book mentions unbiasedness, consistency, closedness, and robustness of statistics, which are normally neither discussed nor proved in astronomy papers. Therefore, those readers may miss the insights, caveats, and between-the-lines content of the book that I care about. To close that gap, for a quick and easy understanding of classical statistics, I recommend The Cartoon Guide to Statistics (Larry Gonick and Woollcott Smith) as a first step. This cartoon book teaches the fundamentals of statistics in a fun and friendly manner and covers everything that rudimentary textbooks offer.

For those who want to go beyond classical (so-called frequentist) statistics and learn about the popular Bayesian approach, astronomy professor Phil Gregory’s Bayesian Logical Data Analysis for the Physical Sciences is recommended. For a little more on modern frequentist and Bayesian statistics, All of Statistics (Larry Wasserman) is recommended. I realize that textbooks for non-statistics students are too thick to go through in a short time (the book I used for teaching senior engineering students at Penn State was Probability and Statistics for Engineering and the Sciences by Jay L. Devore, 4th and 5th editions, at about 600 pages; the current edition is 736 pages). One well-received textbook for graduate students in electrical engineering is Probability, Random Variables and Stochastic Processes (A. Papoulis & S.U. Pillai). I remember that the book offers a rather less abstract definition of measure and practical examples (personally, I found its material on Hermite polynomials useful).

For casual reading about statistics and its 20th-century history, The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century (D. Salsburg) is quite nice.

Statistics is not just for best-fit analysis and error bars. It is a wonderful telescope that extracts correct information when it is carefully pointed at the right target and operated according to the manual. It removes atmospheric and other blurring factors when statistics is understood correctly. Contrary to what many people think, it is neither a black box nor magic.

The era of treating everything as gaussian ended decades ago. Because of the central limit theorem and the delta method (a good example is the log-transformation), many statistics asymptotically follow the normal (gaussian) distribution, but there are various other families of distributions. Because of possible bias in the chi-square method, the error bar cannot guarantee the nominal coverage, such as 95%. There are also nonparametric statistics, known for their robustness; they may be less efficient than statistics that assume a distribution family, but they do not require a parametric model assumption. Also, Bayesian statistics works wonderfully if correct prior information, suitable likelihood models, and enough computing power for hierarchical models and numerical integration are available.
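The central limit theorem plus delta method argument can be checked numerically. The sketch below (Python with NumPy; the function name `delta_method_check` and the choice of an exponential parent distribution are illustrative assumptions) draws many samples from a skewed, clearly non-gaussian distribution and compares the spread of log(sample mean) with the delta-method approximation sd(log Xbar) ≈ sigma / (sqrt(n) * mu).

```python
import numpy as np


def delta_method_check(mu=2.0, n=200, reps=10000, seed=0):
    """Compare the simulated sd of log(sample mean) with the delta-method value."""
    rng = np.random.default_rng(seed)
    # reps independent samples of size n from an exponential distribution
    # with mean mu: skewed and clearly non-gaussian.
    samples = rng.exponential(scale=mu, size=(reps, n))
    log_means = np.log(samples.mean(axis=1))

    # Delta method with g(x) = log(x), g'(mu) = 1/mu:
    #   sd(g(Xbar)) ~ |g'(mu)| * sigma / sqrt(n).
    sigma = mu  # for the exponential distribution the sd equals the mean
    delta_sd = (1.0 / mu) * sigma / np.sqrt(n)
    return log_means.std(ddof=1), delta_sd


simulated, approx = delta_method_check()
print(f"simulated sd = {simulated:.4f}, delta-method sd = {approx:.4f}")
```

The two numbers should agree closely for n = 200; rerunning with a much smaller n shows the asymptotic approximation degrading.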

Before jumping into chi-square for fitting and testing at the same time, exploratory data analysis is needed, both to avoid introducing bias and to understand the data better and find a suitable statistic and its assumptions. Exploratory data analysis starts from simple scatter plots and box plots. A little statistical care for the data and a genuine interest in the truth behind statistical methods are all I am asking for. I do hope that these books help my wishes come true.
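A box plot is built from a handful of summary numbers. Here is a minimal sketch of Tukey-style box-plot statistics in Python with NumPy; it assumes NumPy's default (linear-interpolation) quantile rule, which is only one of several conventions, and the conventional whisker length of 1.5 times the interquartile range. The function name `boxplot_stats` is hypothetical.

```python
import numpy as np


def boxplot_stats(x, whis=1.5):
    """Tukey-style box-plot summary: quartiles, whisker ends, and outliers."""
    x = np.sort(np.asarray(x, dtype=float))
    # NumPy's default linear-interpolation quantiles (one of several conventions).
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr

    # Whiskers extend to the most extreme data points still inside the fences;
    # anything beyond the fences is flagged as an outlier.
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    outliers = x[(x < lo_fence) | (x > hi_fence)]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whisker_lo": inside.min(), "whisker_hi": inside.max(),
        "outliers": outliers,
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)  # quartiles 3, 5, 7; whiskers at 1 and 8; 100 flagged as an outlier
```

matplotlib's `boxplot` also defaults to a 1.5 IQR whisker rule, so a plot drawn there should show the same outliers as this summary when the quantile conventions agree.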

—————————————————————————-
[1.] Most of the links to books are from amazon.com, but I have no personal affiliation with the company.

[2.] In addition to the previous post on chi-square, “what is so special about chi square in astronomy”, I’d like to mention possible bias in chi-square fitting and testing. It is well known that using the same data set both for fitting (which yields the parameter estimates that astronomers call best-fit values, together with error bars) and for testing based on those estimates introduces bias: the best fit is biased away from the true parameter value, and the error bar does not achieve the intended coverage. See the problem in Aneta’s post, “an example of chi2 bias in fitting x-ray spectra”.

[3.] More book recommendations are welcome.
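Footnote [2.] concerns bias in chi-square fitting. One well-known mechanism, not necessarily the exact case in Aneta’s example, arises with low-count Poisson data. The sketch below (Python with NumPy; the simulation setup and the function name `constant_rate_fits` are illustrative assumptions) fits a constant rate to many simulated sets of Poisson counts by three criteria: Neyman’s chi-square (variances taken from the data, with empty bins assigned variance 1 as an ad hoc fix), Pearson’s chi-square (variances taken from the model), and the Poisson maximum-likelihood estimate. For a constant model each minimizer has a closed form, so no optimizer is needed.

```python
import numpy as np


def constant_rate_fits(mu=5.0, nbins=100, reps=2000, seed=1):
    """Average fitted constant rate over many simulated Poisson data sets."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(mu, size=(reps, nbins)).astype(float)

    # Neyman chi-square: sum (n_i - m)^2 / v_i with data-based variances v_i.
    # Empty bins get v_i = 1 (an ad hoc convention, assumed here).
    # Setting the derivative to zero gives m = sum(n_i / v_i) / sum(1 / v_i).
    v = np.maximum(counts, 1.0)
    neyman = (counts / v).sum(axis=1) / (1.0 / v).sum(axis=1)

    # Pearson chi-square: sum (n_i - m)^2 / m with model variances.
    # Setting the derivative to zero gives m^2 = mean(n_i^2).
    pearson = np.sqrt((counts ** 2).mean(axis=1))

    # Poisson maximum likelihood (equivalently, minimizing the Cash statistic)
    # for a constant model is the sample mean.
    ml = counts.mean(axis=1)

    return neyman.mean(), pearson.mean(), ml.mean()


ney, pea, ml = constant_rate_fits()
print(f"true 5.0 | Neyman {ney:.3f} | Pearson {pea:.3f} | ML {ml:.3f}")
```

With a true rate of 5 counts per bin, the Neyman fit comes out noticeably low and the Pearson fit noticeably high, while the maximum-likelihood estimate is unbiased for this model.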
