As part of introducing nonparametric statistics, I wanted to write about applications of computational geometry from the perspective of nonparametric two- or three-dimensional density estimation. The following article came along just as I began to collect statistical applications in astronomy (my [ArXiv] series). This [arXiv] paper, in fact, prompted me to investigate Voronoi tessellations in astronomy in general.
[arxiv/astro-ph:0707.2877]
Voronoi Tessellations and the Cosmic Web: Spatial Patterns and Clustering across the Universe
by Rien van de Weygaert
Since then, quite some time has passed. In the meantime, I found more publications in astronomy that specifically use tessellation as a main tool for nonparametric density estimation and data analysis. Nonetheless, topics in spatial statistics in general tend to be unrecognized or almost ignored in analyzing astronomical spatial data (I mean data points with coordinate information). Many seem to utilize statistics only partially or not at all. Some might want to know how often Voronoi tessellation is applied in astronomy. Here, I list the results of my ADS search, restricted to papers with tessellation among the title keywords:
Then the topic was forgotten for a while, until this recent [arXiv] paper reminded me of my old intention to introduce tessellation for density estimation and for understanding large-scale structures or clusters (astronomers&#8217; jargon, not the term used in machine or statistical learning).
[arxiv:stat.ME:0910.1473] Moment Analysis of the Delaunay Tessellation Field Estimator
by M.N.M van Lieshout
Looking into the plots of the papers by van de Weygaert or van Lieshout, one can immediately understand, without mathematical jargon and abstraction, what Voronoi and Delaunay tessellations are (a Delaunay tessellation is also called a Delaunay triangulation (wiki); perhaps you want to check out wiki:Delaunay Tessellation Field Estimator as well). Voronoi tessellations have been adopted in many scientific/engineering fields to describe spatial distributions, and astronomy is no exception. Voronoi tessellation has been used for field interpolation.
van de Weygaert described Voronoi tessellations as follows:
van Lieshout derived explicit expressions for the mean and variance of the Delaunay Tessellation Field Estimator (DTFE) and showed that, for stationary Poisson processes, the DTFE is asymptotically unbiased with a variance that is proportional to the squared intensity.
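To make the idea concrete, here is a minimal sketch of the DTFE in two dimensions: the density at each observed point is estimated as (d+1) divided by the area of its contiguous Voronoi region, i.e. the union of all Delaunay triangles incident to that point. This is an illustrative implementation built on SciPy's Delaunay triangulation, not the estimator code used in the papers above.

```python
# Minimal 2D DTFE sketch: density at a point = (d+1) / (total area of
# the Delaunay simplices incident to that point).
import numpy as np
from scipy.spatial import Delaunay

def dtfe_density(points):
    tri = Delaunay(points)
    d = points.shape[1]                      # spatial dimension (here 2)
    areas = np.zeros(len(points))
    for simplex in tri.simplices:
        v = points[simplex]
        # area of the triangle via the 2D cross product
        a = 0.5 * abs(np.cross(v[1] - v[0], v[2] - v[0]))
        areas[simplex] += a                  # accumulate onto each vertex
    return (d + 1) / areas

rng = np.random.default_rng(0)
pts = rng.uniform(size=(500, 2))             # homogeneous Poisson-like sample
rho = dtfe_density(pts)                      # adaptive local density estimates
```

For a homogeneous sample of 500 points in the unit square, the typical estimate should sit near 500 per unit area, while clustered regions would receive much higher values, which is exactly the adaptive behavior that makes the DTFE attractive.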
We have observed voids and filaments of cosmic matter, in patterns for which a theory has yet to be discovered. In general, those patterns are manifested via observed galaxies, both directly and indirectly. Individual observed objects, I believe, can be matched to the points that generate Voronoi polygons. Each point represents its polygon, and investigating its distributional properties helps us understand the formation rules and theories behind those patterns. For that matter, various topics in stochastic geometry, not just Voronoi tessellation, can probably be adopted.
There is a plethora of information available on Voronoi tessellation, such as the website of the International Symposium on Voronoi Diagrams in Science and Engineering. Two recent meeting websites are ISVD09 and ISVD08. Also, the following review paper is interesting:
Centroidal Voronoi Tessellations: Applications and Algorithms (1999) Du, Faber, and Gunzburger in SIAM Review, vol. 41(4), pp. 637-676
By the way, you may have noticed my preference for Voronoi tessellation over Delaunay, owing to the characteristic of the centroidal Voronoi tessellation that each observation is the center of its own Voronoi cell, as opposed to the property of the Delaunay triangulation that multiple simplices are associated with one observation/point. However, from the perspective of understanding the distribution of observations as a whole, both approaches offer summaries and insights in a nonparametric fashion, which is what I value most.
When I learned the &#8220;Kalman filter&#8221; for the first time, I was not sure how to distinguish it from the &#8220;Yule-Walker equations&#8221; (time series), the &#8220;Pade approximant&#8221; (unfortunately, the wiki page does not have its matrix form), the &#8220;Wiener filter&#8221; (signal processing), etc. Here are the publications that specifically mention the name Kalman filter in their abstracts, found via ADS.
My motivation for introducing the Kalman filter, although it is a very well known term, is the recent Fisher Lecture given by Noel Cressie at the JSM 2009. He is a leading expert in spatial statistics and the author of a very famous book on the subject. During his presentation, he described the challenges arising from satellite data and how the Kalman filter accelerated the computation of a gigantic covariance matrix in kriging. Satellite data from meteorology and the geosciences may not exactly match astronomical satellite data, but from a statistical modeling perspective the challenges are similar: massive data, streaming data, multiple dimensions, temporal structure, missing observations in certain areas, different exposure times, estimation and prediction, interpolation and extrapolation, large image sizes, and so on. The focus is not just on denoising/cleaning images. Statisticians want to find the driving force behind certain features by modeling, and to perform statistical inference. (They do not mind parametrizing an interesting metric/measure/quantity for modeling, or they approach the problem in a nonparametric fashion.) I understood the use of the Kalman filter as a fast solution to inverse problems for inference.
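For readers who only know the name, the predict/update cycle of the Kalman filter fits in a few lines. Below is a toy scalar example tracking a random walk observed with noise; the process and measurement variances `q` and `r` are illustrative values I chose, not anything from Cressie's lecture.

```python
# Toy scalar Kalman filter: track a slowly drifting level from noisy
# observations.  x is the state estimate, p its variance.
import numpy as np

def kalman_1d(obs, q=1e-3, r=0.09):
    x, p = obs[0], 1.0
    out = []
    for z in obs:
        p = p + q                 # predict: random walk adds process noise
        k = p / (p + r)           # Kalman gain balances model vs. data
        x = x + k * (z - x)       # update estimate toward observation
        p = (1 - k) * p           # shrink the posterior variance
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
truth = np.cumsum(rng.normal(0.0, 0.03, 200))   # hidden random walk
noisy = truth + rng.normal(0.0, 0.3, 200)       # what we observe
filtered = kalman_1d(noisy)                     # recursive estimates
```

The recursion touches each observation once, which is the point Cressie emphasized: for streaming or massive data, the filter updates sequentially instead of refactoring a gigantic covariance matrix at every step.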
There are three distinctive subjects in spatial statistics: geostatistics, lattice data analysis, and spatial point pattern analysis. Because of the resemblance between the spatial distribution of observations in coordinates and the notion of spatially random points, spatial statistics in astronomy has leaned more toward spatial point pattern analysis than toward the other subjects. In other fields, from immunology to forestry to geology, whose data are associated with the spatial coordinates of underlying geometric structures or were sampled from lattices, observations depend on these spatial structures, and scientists enjoy various applications from geostatistics and lattice data analysis. In particular, kriging is the fundamental notion in geostatistics, and its applications are found in many fields.
Hitherto, I had expected that the term kriging would appear rather frequently in analyses of cosmic microwave background (CMB) data or of large extended sources, wide enough to warrant statistical models for understanding the expected geometric structure and its uncertainty (or for interpolating observations via BLUP, best linear unbiased prediction). Against my anticipation, only one refereed paper emerged from ADS:
Topography of the Galactic disk – Z-structure and large-scale star formation
by Alfaro, E. J., Cabrera-Cano, J., and Delgado (1991)
in ApJ, 378, pp. 106-118
I attribute this shortage of kriging applications in astronomy to missing data and differential exposure times across the sky. Both require underlying modeling to fill the gaps, or to convolve with the observed data to compensate for the unequal sky coverage. Traditionally, kriging analysis has been applied only to localized geological areas where missingness and unequal coverage are of no concern. Since many surveys and probing missions cover wide areas of the sky, we always see gaps and selection biases in telescope pointing directions. So, once this characteristic of missingness is understood and incorporated into models of spatial statistics, I believe statistical methods for spatial data could reveal more information about our Galaxy and universe.
The good news for astronomers is that nowadays more statisticians and geoscientists are working on spatial data, particularly from satellites. These data are not much different from traditional astronomical data, except for the direction in which a satellite aims (inward or outward). Therefore, the data of these scientists have the typical properties of astronomical data: missingness, unequal sky coverage or exposure, and sparse but gigantic images. Thanks to the increase in computational power and developments in hierarchical modeling, techniques in geostatistics are being developed to handle these massive but sparse images for statistical inference. Beyond denoising images, they also aim to produce a measure of uncertainty associated with complex spatial data.
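Since kriging keeps coming up, a bare-bones sketch may help astronomers see what the BLUP actually computes. This is simple kriging (known mean) in one dimension with an exponential covariance model; the sill, range, and data values are all made up for illustration, and a real analysis would estimate the covariance from a variogram first.

```python
# Simple kriging sketch: solve C w = c0 for the weights, then predict
# z(x_new) = mean + w' (z_obs - mean).  Covariance model is exponential.
import numpy as np

def simple_krige(x_obs, z_obs, x_new, sill=1.0, scale=0.5, mean=0.0):
    cov = lambda h: sill * np.exp(-np.abs(h) / scale)
    C = cov(x_obs[:, None] - x_obs[None, :])     # obs-obs covariances
    c0 = cov(x_obs[:, None] - x_new[None, :])    # obs-prediction covariances
    # tiny jitter keeps the solve stable if two sites nearly coincide
    w = np.linalg.solve(C + 1e-10 * np.eye(len(x_obs)), c0)
    return mean + w.T @ (z_obs - mean)

x_obs = np.array([0.0, 1.0, 2.0, 3.0])           # illustrative sample sites
z_obs = np.sin(x_obs)                            # illustrative field values
pred = simple_krige(x_obs, z_obs, np.array([1.5]))
```

One property worth noting: kriging is an exact interpolator, so predicting at an observed site returns the observed value, while predictions elsewhere are shrunk toward the mean according to the covariance model, which is exactly what makes the gaps and unequal coverage discussed above so consequential for the result.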
For those who are interested in what spatial statistics does, there are a few books I’d like to recommend.
My personal lesson from two short discussions at the AAS was that more collaboration between statisticians and astronomers is needed to include measurement errors in classification or semi-supervised learning, particularly nowadays when we are enjoying a plethora of data sets and could move forward, with better aid from statisticians, toward testing/verifying the existence of clusters beyond fitting a straight line.
[astro-ph:0804.3044] J.M. Loh
Estimating Third-Order Moments for an Absorber Catalog
Instead of getting into the detailed contents, which are left to the readers, I&#8217;d rather cite a few key points without math symbols. The script K is denoted as the third-order K-function, from which the three-point and reduced three-point correlation functions are derived. The benefits of using the script K function over these correlation functions are given with regard to bin size and edge correction. Yet the author did not encourage using the script K function exclusively, but rather all available tools. The feasibility of computing third- or higher-order measures of clustering, thanks to larger data sets and advances in computing, is also mentioned. In the appendix, the unbiasedness of the estimator of the script K is proved.
The reason for bringing in this K-function comes from my early experience in learning statistics. My memory of learning the two-point correlation function in an undergraduate cosmology class is very vague, but the basic idea of modeling this function gave me an epiphany during a spatial statistics class several years ago, when Ripley&#8217;s K-function was introduced. I vividly remember that I set up my own project to use this K-function to characterize the spatial distribution of GRBs. My particular reasons for selecting GRBs instead of galaxies were: 1. I was able to find the data set on the internet on my own (the BATSE catalog: astronomers may think accessing data archives is easy, but statistics students are generally not exposed to the fact that astronomical data sets are available via the internet, and in terms of data sets they depend heavily on data providers, or clients), and 2. I recalled a paper by Professors Efron and Petrosian (1995, ApJ, 449:215-223, Testing Isotropy versus Clustering of Gamma-ray Bursts), who utilized the nearest-neighborhood approach. After a few weeks, I learned that people had found GRB redshifts and had begun to understand the cosmological origin of GRBs more deeply. In other words, 2D spatial statistics was not the way to find the origins of GRBs. Due to a few shortcomings, one of them being the latitude-dependent observation of BATSE (as a second-year graduate student, I hadn&#8217;t yet confronted the ideas of censoring and truncation), I discontinued my personal project, discouraged that I could not make any contribution (data themselves, like discovered distances, speak louder than statistical inferences without distances).
I was delighted to see the work by Prof. Loh on Ripley&#8217;s K function. Those curious about the K function may check the book by Martinez and Saar, Statistics of the Galaxy Distribution (Amazon link). Many statistical publications on spatial statistics and point processes also cover Ripley&#8217;s K function.
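For readers meeting the K-function for the first time, the naive estimator is simple enough to write down: count pairs of points within distance r, normalized by the squared intensity. Below is a sketch without the edge correction that Loh's paper and the spatial statistics literature treat carefully; the window, sample size, and radius are illustrative.

```python
# Naive Ripley's K estimator (no edge correction) in the unit square:
# K(r) = (area / n^2) * number of ordered pairs within distance r.
# Under complete spatial randomness, K(r) is approximately pi * r^2.
import numpy as np

def ripley_k(points, r, area=1.0):
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # all pairwise distances
    np.fill_diagonal(dist, np.inf)               # exclude self-pairs
    return area * (dist < r).sum() / n ** 2

rng = np.random.default_rng(2)
pts = rng.uniform(size=(400, 2))                 # CSR reference pattern
k = ripley_k(pts, 0.05)                          # compare against pi*0.05**2
```

Clustering shows up as K(r) exceeding pi*r^2, and regularity as falling below it; the missing edge correction biases this naive version low near the window boundary, which is precisely why the corrected estimators in the literature matter for survey geometries with gaps.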