The AstroStat Slog » Estimation

[ArXiv] 2nd week, Mar. 2008

hlee — Fri, 14 Mar 2008 19:44:34 +0000

Warning! The list is long this week, but diverse. Some papers are of obvious interest to CHASC.

[astro-ph:0803.0997] V. Smolcic et al.
A new method to separate star forming from AGN galaxies at intermediate redshift: The submillijansky radio population in the VLA-COSMOS survey
[astro-ph:0803.1048] T.A. Carroll and M. Kopf
Zeeman-Tomography of the Solar Photosphere — 3-Dimensional Surface Structures Retrieved from Hinode Observations
[astro-ph:0803.1066] M. Beasley et al.
A 2dF spectroscopic study of globular clusters in NGC 5128: Probing the formation history of the nearest giant Elliptical
[astro-ph:0803.1098] Z. Lorenzo
A new luminosity function for galaxies as given by the mass-luminosity relationship
[astro-ph:0803.1199] D. Coe et al.
LensPerfect: Gravitational Lens Massmap Reconstructions Yielding Exact Reproduction of All Multiple Images (could it be related to GREAT08 Challenge?)
[astro-ph:0803.1213] H. Y. Wang et al.
Reconstructing the cosmic density field with the distribution of dark matter halos
[astro-ph:0803.1420] E. Lantz et al.
Multi-imaging and Bayesian estimation for photon counting with EMCCD’s
[astro-ph:0803.1491] Wu, Rozo, & Wechsler
The Effect of Halo Assembly Bias on Self Calibration in Galaxy Cluster Surveys
[astro-ph:0803.1616] P. Mukherjee et al.
Planck priors for dark energy surveys (some CHASCians would like to check!)
[astro-ph:0803.1738] P. Mukherjee and A. R. Liddle
Planck and reionization history: a model selection view
[astro-ph:0803.1814] J. Cardoso et al.
Component separation with flexible models. Application to the separation of astrophysical emissions
[astro-ph:0803.1851] A. R. Marble et al.
The Flux Auto- and Cross-Correlation of the Lyman-alpha Forest. I. Spectroscopy of QSO Pairs with Arcminute Separations and Similar Redshifts
[astro-ph:0803.1857] R. Marble et al.
The Flux Auto- and Cross-Correlation of the Lyman-alpha Forest. II. Modelling Anisotropies with Cosmological Hydrodynamic Simulations

[ArXiv] 4th week, Jan. 2008

hlee — Fri, 25 Jan 2008 16:37:12 +0000

Only three papers this week. A few more dealt with chi-square fitting and its error bars, but I excluded them.

[astro-ph:0801.3346] M. Lombardi, C. Lada, & J. Alves
Hipparcos distances of Ophiuchus and Lupus cloud complexes (likelihoods and MCMC were used)
[astro-ph:0801.3543] K.N. Grankin et al.
Results of the ROTOR-program. II. The long-term photometric variability of weak-line T Tauri stars (discusses periodogram)
[astro-ph:0801.3822] M. Lima et al.
Estimating the Redshift Distribution of Faint Galaxy Samples

[ArXiv] Post Model Selection, Nov. 7, 2007

hlee — Wed, 07 Nov 2007 15:57:01 +0000

Today’s arxiv-stat email included papers by Poetscher and Leeb, who have been working on post-model-selection inference. Model selection is sometimes mistaken for a part of statistical inference. Put simply, it is better regarded as a step that precedes inference. How do you know whether your data come from a chi-square distribution or a gamma distribution? (This is a model selection problem with nested models.) Should I estimate the degrees of freedom k from the chi-square, or α and β from the gamma, to obtain the mean and its error? Will the errors of the mean be the same under both distributions?
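As a rough illustration of the last question, the sketch below fits both models to simulated gamma data and compares the standard error of the mean that each fitted model implies. It uses the method of moments, which is an assumption made here for simplicity (the post does not name an estimator), and the function names, sample size, and true parameters are all hypothetical.

```python
import math
import random
import statistics


def mean_errors_under_both_models(data):
    """Method-of-moments fits of a chi-square and a gamma model to `data`.

    Returns the estimated mean together with the standard error of the
    sample mean implied by each fitted model.
    """
    n = len(data)
    m = statistics.mean(data)
    v = statistics.variance(data)

    # Chi-square with k degrees of freedom: mean = k, variance = 2k.
    k_hat = m
    se_chi2 = math.sqrt(2.0 * k_hat / n)

    # Gamma with shape alpha and scale beta:
    # mean = alpha * beta, variance = alpha * beta**2.
    beta_hat = v / m
    alpha_hat = m / beta_hat
    se_gamma = math.sqrt(alpha_hat * beta_hat ** 2 / n)

    return {"mean": m, "k": k_hat, "alpha": alpha_hat, "beta": beta_hat,
            "se_chi2": se_chi2, "se_gamma": se_gamma}


def simulate_gamma(n, shape, scale, seed=0):
    """Draw n gamma variates (hypothetical helper for the demonstration)."""
    rng = random.Random(seed)
    return [rng.gammavariate(shape, scale) for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_gamma(5000, shape=3.0, scale=1.0)
    print(mean_errors_under_both_models(sample))
```

Both fits return the same estimate of the mean, but the chi-square model forces the variance to be twice the mean. When the data are gamma with a scale other than 2, as in this simulated sample, the two error estimates disagree, so the answer to the question depends on which model was chosen.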

Before estimating means and errors of parameters, one wishes to choose a model in which the parameters of interest are properly embedded. The problem is that the same data are used both to choose a model (e.g. choosing the model with the largest likelihood value or Bayes factor) and to perform statistical inference (estimating parameters, calculating confidence intervals, and testing hypotheses), which inevitably introduces bias. This bias has generally been neglected, because the model is assumed to be known a priori (e.g. the 2nd order polynomial is taken as the absolute truth and the residuals as realizations of the error term; by the way, how can one be sure that the error follows a normal distribution?). Asymptotically, this bias is O(n^m), where m is smaller than zero. Estimating this bias has been popular since Akaike introduced AIC (one of the best-known model selection criteria). Numerous works are found in the field of robust penalized likelihood. Variable selection has been a very hot topic in recent decades. Beyond my knowledge, there are surely more approaches that keep this bias from contaminating the inference results.
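The effect of reusing the data can be seen in a small simulation. The sketch below is only a toy illustration, not the analysis of Leeb and Pötscher. It assumes a linear model with two correlated predictors and no intercept, a known noise level, and selection by a z-test on the second coefficient; every name and parameter value is hypothetical. It reports how often the naive 95% interval for the first coefficient, computed as if the selected model had been fixed in advance, covers the true value.

```python
import math
import random


def simulate_coverage(beta2, n=100, rho=0.8, n_sims=2000, seed=0):
    """Coverage of the naive 95% interval for beta1 after model selection."""
    rng = random.Random(seed)
    beta1, sigma, z = 1.0, 1.0, 1.96

    # Fixed design: x2 is correlated with x1 (correlation about rho).
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [rho * a + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
          for a in x1]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 ** 2

    # Standard errors with known sigma, from sigma^2 * (X'X)^{-1}.
    se1_full = sigma * math.sqrt(s22 / det)
    se2_full = sigma * math.sqrt(s11 / det)
    se1_small = sigma / math.sqrt(s11)

    covered = 0
    for _ in range(n_sims):
        y = [beta1 * a + beta2 * b + rng.gauss(0.0, sigma)
             for a, b in zip(x1, x2)]
        t1 = sum(a * v for a, v in zip(x1, y))
        t2 = sum(b * v for b, v in zip(x2, y))
        b1_full = (s22 * t1 - s12 * t2) / det
        b2_full = (s11 * t2 - s12 * t1) / det

        # Model selection: keep x2 only if its coefficient is "significant".
        if abs(b2_full / se2_full) > z:
            estimate, se = b1_full, se1_full
        else:
            estimate, se = t1 / s11, se1_small

        # Naive interval that ignores the selection step.
        if abs(estimate - beta1) <= z * se:
            covered += 1
    return covered / n_sims


if __name__ == "__main__":
    for b2 in (0.0, 0.3, 2.0):
        print(b2, simulate_coverage(b2))
```

Under these assumptions, coverage stays close to the nominal level when the second coefficient is exactly zero or very large, and falls well below it when that coefficient is comparable to its own standard error, which is when the test drops a predictor that matters in a sizable fraction of the samples.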

The works by Professors Poetscher and Leeb looked unique to me in addressing the intrinsic bias that arises from inference after model selection. Instead of being listed in my weekly arxiv lists, their arxiv papers deserved a separate posting. I also included some more general references.

The list of papers from today’s arxiv:

[stat.TH:0702703] Can one estimate the conditional distribution of post-model-selection estimators? by H. Leeb and B. M. Pötscher
[stat.TH:0702781] The distribution of model averaging estimators and an impossibility result regarding its estimation by B. M. Pötscher
[stat.TH:0704.1466] Sparse Estimators and the Oracle Property, or the Return of Hodges’ Estimator by H. Leeb and B. M. Poetscher
[stat.TH:0711.0660] On the Distribution of Penalized Maximum Likelihood Estimators: The LASSO, SCAD, and Thresholding by B. M. Poetscher and H. Leeb
[stat.TH:0701781] Learning Trigonometric Polynomials from Random Samples and Exponential Inequalities for Eigenvalues of Random Matrices by K. Groechenig, B.M. Poetscher, and H. Rauhut

Other resources:

Prof. Leeb’s website has other published papers
Effects of Model Selection on Inference, B. M. Pötscher, Econometric Theory, Vol. 7, No. 2 (Jun., 1991), pp. 163-185
The Effect of Model Selection on Confidence Regions and Prediction Regions, P. Kabaila, Econometric Theory, Vol. 11, No. 3 (Aug., 1995), pp. 537-549
Model Selection and Multi-Model Inference: a book by Burnham and Anderson
modelselection.org: it’s a model selection website, but it looks like a pageant show website.

[Added on Nov. 8th] There were a few more relevant papers on arxiv.

[stat.AP:0711.0993] Upper bounds on the minimum coverage probability of confidence intervals in regression after variable selection by P. Kabaila and K. Giri
[stat.ME:0710.1036] Confidence Sets Based on Sparse Estimators Are Necessarily Large by B. M. Pötscher

Implement Bayesian inference using PHP

hlee — Fri, 05 Oct 2007 20:47:39 +0000

Not knowing much about Java and Java applets in software development and web publishing, I cannot comment on which is more efficient. Nevertheless, I thought that PHP could do a similar job in a simpler fashion, and the following may provide some ideas and solutions for publicizing statistical methods based on Bayesian inference through websites.

These three web pages come from developerWorks at IBM, and all the information available there seems to be open source. They explain how to solve particular problems with PHP based on Bayesian inference.

Implement Bayesian inference using PHP, part 1: Build intelligent Web applications through conditional probability
Implement Bayesian inference using PHP, part 2: Solving parameter estimation problems
Implement Bayesian inference using PHP, part 3: Solving classification problems
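The conditional-probability idea named in the title of part 1 can be sketched in a few lines. This is not code from the developerWorks articles. It is written in Python rather than PHP, and the hypothesis labels and probabilities are made up purely for illustration.

```python
def posterior(prior, likelihood):
    """Apply Bayes' theorem over a discrete set of hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(data | hypothesis)
    Returns a dict mapping hypothesis -> P(hypothesis | data).
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())  # P(data)
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: p / evidence for h, p in unnormalized.items()}


if __name__ == "__main__":
    # Hypothetical numbers: prior class fractions and the probability of
    # seeing some spectral feature in each class.
    prior = {"star_forming": 0.7, "agn": 0.3}
    likelihood = {"star_forming": 0.1, "agn": 0.6}
    print(posterior(prior, likelihood))
```

With these made-up inputs, observing the feature raises the probability of the "agn" label from 0.3 to about 0.72. The same function can be called again with the posterior as the new prior when further data arrive.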

[ArXiv] Spectroscopic Survey, June 29, 2007

hlee — Mon, 02 Jul 2007 22:07:39 +0000

From arXiv/astro-ph:0706.4484

Spectroscopic Surveys: Present by C. Yip gives an overview of recent spectroscopic sky surveys and of spectral analysis techniques in the context of Virtual Observatories (VO). Besides the number of spectroscopic redshift measurements growing like Moore’s law, the surveys tend to go deeper and aim for completeness. Elliptical galaxy formation has been the main subject of study, because ellipticals are more abundant than spirals, and the galactic bimodality in color-color or color-magnitude diagrams is attributed to gas-rich mergers of blue galaxies forming the red sequence. Principal component analysis has incorporated ratios of emission line strengths for classifying Type-II AGN and star forming galaxies. Lyα identifies high-z quasars, and other spectral patterns over z reveal the history of the early universe and the characteristics of quasars. The recent discovery of 10 satellites of the Milky Way is also mentioned.

Spectral analyses take two approaches: one is the model-based approach using theoretical templates, which is known for its flaws but allows straightforward extraction of physical parameters, and the other is the empirical approach, which is useful for making discoveries but whose results are difficult to interpret. Neither has a substantial advantage over the other. When it comes to fitting, chi-square minimization has been dominant, but new methodologies are under development. For spectral classification problems, principal component analysis (Karhunen-Loève transformation), artificial neural networks, and other machine learning techniques have been applied.
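To make the model-based route concrete, here is a minimal sketch of chi-square minimization against templates. It assumes the simplest possible case, a single free amplitude per template and independent Gaussian errors, for which the minimum has a closed form. The function names and the toy spectra are hypothetical and are not taken from the paper.

```python
import math


def fit_amplitude(flux, sigma, template):
    """Scale `template` to `flux` by minimizing chi-square in the amplitude.

    Returns (best-fit amplitude, its 1-sigma error, minimum chi-square).
    """
    weights = [1.0 / s ** 2 for s in sigma]
    num = sum(w * f * t for w, f, t in zip(weights, flux, template))
    den = sum(w * t * t for w, t in zip(weights, template))
    amplitude = num / den
    chi2 = sum(w * (f - amplitude * t) ** 2
               for w, f, t in zip(weights, flux, template))
    return amplitude, 1.0 / math.sqrt(den), chi2


def best_template(flux, sigma, templates):
    """Fit every template and return the name and fit with the lowest chi-square."""
    fits = {name: fit_amplitude(flux, sigma, t) for name, t in templates.items()}
    best = min(fits, key=lambda name: fits[name][2])
    return best, fits[best]


if __name__ == "__main__":
    # Toy data: three flux bins with equal errors and two candidate shapes.
    flux = [2.0, 4.0, 6.0]
    sigma = [0.5, 0.5, 0.5]
    templates = {"rising": [1.0, 2.0, 3.0], "flat": [1.0, 1.0, 1.0]}
    print(best_template(flux, sigma, templates))
```

The physical parameter (here only an amplitude) falls out of the fit directly, which is the attraction of templates. The weakness is equally visible: the result can be no better than the set of templates supplied.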

In the end, the author reports statistical and astrophysical challenges posed by the massive spectroscopic data of the present day: 1. modeling galaxies, 2. parameterizing star formation history, 3. modeling quasars, 4. multi-catalog based calibration (separating systematic and statistical errors), and 5. estimating parameters. Addressing these would benefit the VO, whose objective is the unification of data access.