#### [ArXiv] Correlation Studies, June 12, 2007

One of the arxiv/astro-ph preprints, arxiv/0706.1703v1, discusses the **correlation** between galactic HI and the cosmic microwave background (CMB) and reports no statistically significant correlation.

Beyond the astrophysical significance of the paper, when the word **correlation** appears in a scientific paper, people expect the paper to be about **statistics.** Are these correlation studies truly statistical science?

Statistical Challenges in Modern Astronomy III (2001) was the first conference at which I encountered astronomy after my subject of interest had changed from solar physics to statistics, a field of which I had only a very rudimentary knowledge at the time. Although I was a mere helper at the conference, I managed to listen in on some talks and discussions among the participants, and the word **correlation** came up frequently.

Consider a set of paired points uniformly distributed on a circle in 2D Euclidean space. The estimated correlation coefficient is close to zero, yet we understand that the two coordinates are strongly dependent. Depending on how correlation is defined for the data space, the measured degree of correlation can differ significantly. For this reason, I wondered for a long time what is so important about **correlation** in astronomy.
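The circle example can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `pearson_r` and `sample_circle`, the sample size, and the seed are illustrative choices and do not come from the paper.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def sample_circle(n, rng):
    """Draw n points uniformly in angle on the unit circle."""
    angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    return [math.cos(t) for t in angles], [math.sin(t) for t in angles]


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
xs, ys = sample_circle(10_000, rng)

r = pearson_r(xs, ys)
# Every point satisfies x^2 + y^2 = 1, so y is tied to x up to a sign.
max_radius_error = max(abs(x * x + y * y - 1.0) for x, y in zip(xs, ys))

print("Pearson r:", r)
print("largest deviation from x^2 + y^2 = 1:", max_radius_error)
```

The Pearson coefficient measures only linear association, so it comes out near zero here even though each point satisfies the circle equation to floating-point precision.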

After some years, I realized that **correlation** is important in astronomy, astrophysics, and cosmology, as in arxiv/0706.1703v1 and other papers, because the estimated correlation coefficient may indicate a physical correlation among the objects of interest. Correlation is applied blindly as a statistical tool that directly reveals the physical correlation. I have the impression that astronomers believe important physical correlation comes from a statistically significant correlation coefficient without investigating the foundation of statistical inference.

On the other hand, the nice part of arxiv/0706.1703v1 is the authors’ two caveats on correlation: 1. correlations inevitably appear through random fluctuation, so *a-posteriori* statistics should not be used; and 2. visual correlation is misleading, so quantitative methods, such as Monte Carlo methods for assessing significance, are required.
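As one illustration of what a Monte Carlo assessment of significance can look like, here is a minimal sketch of a permutation test for the Pearson coefficient. It is not the procedure used in arxiv/0706.1703v1. It assumes independent (x, y) pairs, which sky maps with spatially correlated pixels generally do not satisfy, and the function names, the number of shuffles, and the toy data are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_shuffles=2000, seed=0):
    """Two-sided Monte Carlo p-value under the null of no association.

    Shuffling ys breaks any pairing with xs, so the shuffled coefficients
    show how large |r| gets from random fluctuation alone.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            at_least_as_extreme += 1
    # The add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Toy data, assumed purely for illustration.
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
ys_dependent = [x + rng.gauss(0.0, 0.1) for x in xs]       # linear in x plus noise
ys_independent = [rng.gauss(0.0, 1.0) for _ in range(50)]  # unrelated to x

p_dependent = permutation_p_value(xs, ys_dependent)
p_independent = permutation_p_value(xs, ys_independent)

print("p-value, dependent pairs:  ", p_dependent)
print("p-value, independent pairs:", p_independent)
```

The test statistic and the number of shuffles must be fixed before looking at the data. Otherwise the procedure falls back into the *a-posteriori* statistics that the first caveat warns against.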

I hope that some astronomers will provide a good description of what makes estimating correlation so important and how a statistically significant correlation becomes a physically important correlation.

P.S. The paper states:

If one draws N numbers between 0 and 1, the probability that they will all be smaller than x is p=1-x^N.

I think this should be p=x^N.
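A quick simulation can compare the two candidate formulas. This is a minimal sketch assuming independent uniform draws on [0, 1); the function name `prob_all_below` and the values of x, N, and the trial count are arbitrary illustrative choices.

```python
import random


def prob_all_below(x, n, trials=100_000, seed=0):
    """Estimate the probability that n independent uniform draws are all < x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # all() stops at the first draw that is not below x.
        if all(rng.random() < x for _ in range(n)):
            hits += 1
    return hits / trials


x, n = 0.9, 5
estimate = prob_all_below(x, n)

print("simulated: ", estimate)
print("x^N:       ", x ** n)
print("1 - x^N:   ", 1 - x ** n)
```

Each draw falls below x with probability x, and the draws are independent, so the joint probability is the product x^N. The expression 1 - x^N is the complementary probability that at least one draw is not smaller than x.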

## vlk:

Re: “astronomers believe important physical correlation comes from a statistically significant correlation coefficient without investigating the foundation of statistical inference”

I think you have this backwards. Important physical correlation is supported by statistically significant evidence. The mere existence of a correlation is not taken as evidence of causation.

06-18-2007, 7:37 pm

## hlee:

After attending the Clay lecture and talks at CfA, I got the impression that correlation studies in astronomy are confined to the statistics of linear regression, which I believe can be modified or improved depending on the type of correlation (e.g. the appearance of scatter plots, covariates, and outliers). Compiling more data has been the usual way to illustrate correlations better (as evidence of correlation), but exploring statistical methods for estimating correlation (we can even include measurement uncertainty on both the x and y axes and assume distribution families beyond the normal) could reveal different aspects of (physical) correlations.

11-01-2007, 6:32 pm