The AstroStat Slog » Misc
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders

The Perseid Project [Announcement]
http://hea-www.harvard.edu/AstroStat/slog/2010/perseid-project/
Mon, 02 Aug 2010, vlk

There is an ambitious project afoot to build a 3D map of a meteor stream during the Perseids on Aug 11-12. I got this missive about it from the organizer, Chris Crawford:

This will be one of the better years for Perseids; the moon, which often interferes with the Perseids, will not be a problem this year. So I’m putting together something that’s never been done before: a spatial analysis of the Perseid meteor stream. We’ve had plenty of temporal analyses, but nobody has ever been able to get data over a wide area — because observations have always been localized to single observers. But what if we had hundreds or thousands of people all over North America and Europe observing Perseids and somebody collected and collated all their observations? This is crowd-sourcing applied to meteor astronomy. I’ve been working for some time on putting together just such a scheme. I’ve got a cute little Java applet that you can use on your laptop to record the times of fall of meteors you see, the spherical trig for analyzing the geometry (oh my aching head!) and a statistical scheme that I *think* will reveal the spatial patterns we’re most likely to see — IF such patterns exist. I’ve also got some web pages describing the whole shebang. They start here:

http://www.erasmatazz.com/page78/page128/PerseidProject/PerseidProject.html

I think I’ve gotten all the technical, scientific, and mathematical problems solved, but there remains the big one: publicizing it. It won’t work unless I get hundreds of observers. That’s where you come in. I’m asking a few things of you:

1. Any advice, criticism, or commentary on the project as presented in the web pages.
2. Publicizing it. If we can get that ol’ Web Magic going, we could get thousands of observers and end up with something truly remarkable. So, would you be willing to blog about this project on your blog?
3. I would be especially interested in your comments on the statistical technique I propose to use in analyzing the data. It is sketched out on the website here:

http://www.erasmatazz.com/page78/page128/PerseidProject/Statistics/Statistics.html

Given my primitive understanding of statistical analysis, I expect that your comments will be devastating, but if you’re willing to take the time to write them up, I’m certainly willing to grit my teeth and try hard to understand and implement them.

Thanks for any help you can find time to offer.

Chris Crawford

General comments
http://hea-www.harvard.edu/AstroStat/slog/2010/general-comments/
Wed, 16 Jun 2010, chasc

For general comments.

Do people use Fortran?
http://hea-www.harvard.edu/AstroStat/slog/2009/do-people-use-fortran/
Tue, 27 Oct 2009, hlee

I am quite sure that Fortran is one of the major scientific programming languages. Many functions, modules, and libraries are written in it, and, often without our being aware of it, these routines have been ported into many scripting languages. However, I have become curious whether Fortran is still the major force in astronomy or statistics that it was, say, 20 years ago (10 seems too few).

I recently placed my Numerical Recipes in Fortran in someone else's hands because I can access the electronic version of NR in C/C++. I have some manuals on Fortran 77 and 90/95, and on IMSL in Fortran, but I haven't laid my hands on them in recent years. I now feel that these manuals are on the verge of the recycling bin or deletion. Still, the question about the trend in scientific computing languages tugs at my sleeve. With a bit of shyness, I want to ask scientists with long experience in both fields for their opinions about Fortran. Do any experienced scientists still ask their students or post-docs to acquire knowledge of Fortran, while young people pursue Python, R, and other scripting languages thanks to the GNU GPL? (There are a few caveats in this transition, but I'll discuss those later.)

The chance that A has nukes is p%
http://hea-www.harvard.edu/AstroStat/slog/2009/the-chance-that-a-has-nukes-is-p/
Fri, 23 Oct 2009, hlee

I watched a movie in which one of the characters said, "country A has nukes with 80% chance" (perhaps it was not 80%, but it was a high percentage). Another statement in that episode was that people will stop eating lettuce if even a 1% chance of E. coli contamination is reported. Therefore, with such a high percentage of having nukes, it is right to send troops to A. This episode immediately brought to mind astronomers' null hypothesis probability and their ways of drawing conclusions from chi-square goodness of fit tests, likelihood ratio tests, or F-tests.

First of all, I'd like to ask how you would estimate the chance that a country has nukes. What does this 80% imply here? But before getting to that question, I'd like to discuss computing the chance of E. coli contamination first.

From the frequentist perspective, computing the chance of E. coli contamination means investigating a sample of lettuce and counting the heads that are contaminated: n is the number of contaminated heads and N is the total sample size. 1% means one among 100. Such percentage reports and their uncertainties are a familiar sight to everyone during election periods. From the Bayesian perspective, Pr(p|D) ∝ L(D|p)π(p): by properly choosing the likelihood and the prior, one can estimate the chance of E. coli contamination and its uncertainty. Understanding of the sampled population and prior knowledge help determine likelihoods and priors.

How about the chance that country A has nukes? Do we have replicates of country A, so that a committee could investigate each copy and count the ones with nukes to compute the chance? We cannot do that. The traditional frequentist approach, based on counting, does not work here. Either a fiducial likelihood approach or a Bayesian approach, i.e. carefully choosing an adequate likelihood function (priors are only for the Bayesian), allows one to compute such a probability of interest. In other words, the computed chance depends heavily on the choice of model and is very subjective.

So, here's my concern. It seems that astronomers want to know the chance that their spectral data are described by a model (A*B+C)*D (each letter stands for one of the models listed in, say, Sherpa Models). This is more like computing the chance that country A has nukes, not counting frequencies of an event's occurrence. On the other hand, the p-value from goodness of fit tests, LRTs, or F-tests is a number from the traditional frequentist counting approach. In other words, the p-value accounts for how many times one would observe the event (say, reduced chi^2 > 1.2) if the experiment were repeated N times, under the null hypothesis (the (A*B+C)*D model is the right choice, so the residuals are Gaussian). The problem is that we have only a single experiment and a single spectrum with which to verify that (A*B+C)*D is true. Goodness of fit or the LRT only tells us the goodness or badness of the model, not a statistically and objectively quantified chance.
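The counting interpretation of the p-value can be illustrated by simulation. This is a hedged sketch: the 50 degrees of freedom are a made-up number, and only the threshold of 1.2 echoes the text.

```python
import random

# Under the null (correct model, Gaussian residuals), how often would repeated
# experiments give reduced chi^2 > 1.2? A chi^2 with `dof` degrees of freedom
# is simulated as a sum of `dof` squared standard normals.
random.seed(42)

def reduced_chisq(dof):
    return sum(random.gauss(0, 1) ** 2 for _ in range(dof)) / dof

dof, n_sim, threshold = 50, 20000, 1.2
exceed = sum(reduced_chisq(dof) > threshold for _ in range(n_sim))
p_value = exceed / n_sim
print(p_value)  # the fraction of simulated experiments exceeding the threshold
```

This number is a statement about the long-run frequency of the event under the null hypothesis, not the chance that the model itself is true, which is exactly the distinction drawn above.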

In order to know the chance of the model (A*B+C)*D, in the way that A has nukes with p% chance, one should not rely on p-values. If you have multiple models, you can compute pairwise relative chances, i.e. odds ratios or Bayes factors. However, this does not provide the uncertainty of that chance (astronomers have a tendency to report uncertainties for any point estimate, even when the procedure is statistically meaningless and the quantified uncertainty is not a statistical uncertainty, as in using delta chi^2=1 to report 68% confidence intervals). There are various model selection criteria that cater to the various conditions embedded in data so as to make the right choice among candidate models. In addition, post-inference for astronomical models is still a very difficult problem.
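For fully specified models, a pairwise relative chance is just a ratio of likelihoods. This toy sketch uses made-up binomial models and made-up data, not an astronomical calculation:

```python
from math import comb

# Two fully specified models for a success rate: M1 says p = 0.1, M2 says
# p = 0.5. Data: n = 3 successes out of N = 20 trials (all numbers invented).
def binom_lik(p, n, N):
    return comb(N, n) * p**n * (1 - p)**(N - n)

n, N = 3, 20
bf = binom_lik(0.1, n, N) / binom_lik(0.5, n, N)  # Bayes factor for M1 vs M2
print(round(bf, 1))
```

A Bayes factor well above 1 favors M1 over M2, but, as noted above, this relative statement carries no uncertainty of its own and depends entirely on the two models being compared.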

Reporting a legitimate chance for (A*B+C)*D requires more elaborate statistical modeling, which always brings fierce discussions between frequentists and Bayesians over priors and likelihoods. Although it can be a very boring process, I want astronomers to leave the problem to statisticians instead of using inappropriate test statistics and interpreting the statistics creatively.

Please keep this question in mind when you report a probability: what kind of chance are you computing? The chance of E. coli contamination? Or the chance that A has nukes? Make sure you understand that p-values from data analysis packages do not tell you that the chance of the model (A*B+C)*D is (one minus the p-value)%. You don't want to report one minus the p-value from a chi-square test statistic as the chance that A has nukes.

data analysis system and its documentation
http://hea-www.harvard.edu/AstroStat/slog/2009/data-analysis-system-and-its-documentation/
Fri, 02 Oct 2009, hlee

So far, I haven't complained much about my "statistician learning astronomy" experience. Instead, I've been trying to emphasize how fascinating it is. I hope that more statisticians can join this adventure now that statisticians' insights are in demand more than ever. However, this positivity does not seem to be working. In the two years of this slog's life, there has been no posting by a statistician, except one about BEHR. Statisticians are busy and well distracted by other fields with more tangible data sets. Or perhaps, compared to other fields, too many obstacles and too high barriers exist in astronomy for statisticians to participate. I'd like to talk about these challenges from my end.[1]

The biggest challenge for a statistician using astronomical data is the lack of mercy toward nonspecialists in accessing the data, including its format, quantification, and qualification[2], and in the data analysis systems. IDL is costly, although it is used in many disciplines, and other tools in astronomy are hardly ever reused across different projects.[3] In that regard, I welcome astronomers using Python to break such exclusiveness in astronomical data analysis systems.

Even if the data and software issues are resolved, there's another barrier to climb: validation. If you have a catalog, you'll see measured variables and their errors, typically reflecting the size of the PSF and its convolution into those metrics. Where a Gaussian model assumption is applied in order to tabulate a power law index, a King, Petrosian, or de Vaucouleurs profile index, or numerous other metrics, I often fail to find any validation of the Gaussian assumptions, the Gaussian residuals, the spectral and profile models, the outliers, or the optimal binning. Even when a data set is publicly available, I also fail to find how to read in the raw data, what factors must be considered, and what can be discarded because of unexpected contamination such as cosmic rays and charge overflows. How would I validate that raw data read into a data analysis system are correctly processed to match the values in catalogs? How would I know all entries in a catalog are ready for further scientific data analysis? Are those sources real? Is the p-value appropriately computed?
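Validating a Gaussian-residual assumption can start with very simple checks. This is a minimal sketch on synthetic standardized residuals; for real catalog work one would feed in the actual fit residuals instead:

```python
import math
import random

# Sample skewness and excess kurtosis of standardized residuals; both are 0
# for a Gaussian, so large departures flag a violated assumption.
random.seed(0)
residuals = [random.gauss(0, 1) for _ in range(5000)]  # synthetic stand-in

n = len(residuals)
m = sum(residuals) / n
s = math.sqrt(sum((r - m) ** 2 for r in residuals) / n)
skew = sum(((r - m) / s) ** 3 for r in residuals) / n
kurt = sum(((r - m) / s) ** 4 for r in residuals) / n - 3
print(round(skew, 2), round(kurt, 2))
```

Such checks are crude, but even reporting them alongside catalog entries would go some way toward the validation that is missing.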

I posted an article about Chernoff faces applied to Capella observations from Chandra. Astronomers already processed the raw data and published a catalog of X-ray spectra, so I believed the information in the catalog was validated and ready to be used for scientific data analysis. I heard that the repeated Capella observations are for calibration. Generally speaking, in other fields, calibration targets are almost time invariant and exhibit consistency. If Capella is the same star over those 10 years, the faces in my post should look almost the same, within measurement error; but as you saw, they were not consistent at all. Those faces look as if the observations were made of different objects. So far I have failed to find any validation effort explaining why certain ObsIDs of Capella look different from the rest. Are they really Capella? Or can I use this inconsistent facial expression as evidence that the Chandra calibration at that time was inappropriate? Or can I conclude that Capella was a wrong choice for calibration?

Because of the lack of description of the quantification procedure from the raw data to the catalog, what I decided to do was access the raw data and do the processing on my own, to crosscheck the validity of the catalog entries. The benefit of this effort is that I can easily manipulate the data for further statistical inference. Although reading and processing raw data may sound easy, I came across another problem: the lack of documentation for nonspecialists to perform the task.

A while ago, I talked about read.table() in R. There are slightly different commands and options, but without much hurdle one can easily read in ascii data in various styles with read.table() for exploratory and confirmatory data analysis in R. In my understanding, statisticians do not spend much time on reading in data, nor on collecting them. We are interested in methodology for extracting information about the population based on a sample. While the focus is methodology, all the frustrations with astronomical data analysis software occur prior to investigating the best method. The level of frustration reached the point of terminating my eagerness for further investigation of inference tools.

In order to assess those Capella observations, thanks to its on-site help, I invoked ciao. Beforehand, I'd like to make a disclaimer: I use ciao as an example to illustrate the culture difference that I experienced as a statistician, to discuss why I think astronomical data analysis systems are short of documentation and why astronomical data processing procedures lack validation. I must say that I confronted very similar problems when I tried to learn astronomical packages such as IRAF and AIPS; ciao just happened to be at hand when writing this post.

In order to understand X-ray data, beyond the image data files, one also needs the effective area (arf), the redistribution matrix (rmf), and the point spread function (psf). These files are called calibration data files. If the package were developed for general users, I would expect, as with read.table(), a homogenized/centralized data reading function, covering calibration data, with options. Instead, there were various functions one could use to read in data, but the descriptions were not enough to know which one does what. What is the functionality of these commands? Which one only stores the names of the data files? Which one reconfigures the raw data to reflect the up-to-date calibration files? Not knowing the complete data structures and classes within ciao, and not getting the exact functionality of these data reading functions from ahelp, I was not sure whether the log likelihood I computed was appropriate or not.

For example, there are five different ways to associate an arf: read_arf(), load_arf(), set_arf(), get_arf(), and unpack_arf() in ciao. Except for unpack_arf(), I couldn't understand the differences among these functions for accessing an arf.[4] Other software that I use, including XSPEC, generally has a single function with options to execute different levels of reading in data. Ciao has extensive web documentation but no tutorial (see my post). So I read all the ahelp "commands" a few times, but I still couldn't decide which one to use to read in my arfs and rmfs (I happened to have many calibration data files).

         arf         rmf         psf         pha         data
get      get_arf     get_rmf     get_psf     get_pha     get_data
set      set_arf     set_rmf     set_psf     set_pha     set_data
unpack   unpack_arf  unpack_rmf  unpack_psf  unpack_pha  unpack_data
load     load_arf    load_rmf    load_psf    load_pha    load_data
read     read_arf    read_rmf    read_psf    read_pha    read_data

[Note that the above links may not work, since the ciao documentation website evolves quickly. Some might be routed to different pages, so please check this website for other data reading commands: cxc.harvard.edu/sherpa/ahelp/index_alphabet.html]

So, several months back, I decided to seek help through the CXC help desk. Their answers are very reliable and prompt. My question was "what are the differences among read_xxx(), load_xxx(), set_xxx(), get_xxx(), and unpack_xxx(), where xxx can be data, arf, rmf, or psf?" The answer to this question was:

You can find detailed explanations for these Sherpa commands in the “ahelp” pages of the Sherpa website:

http://cxc.harvard.edu/sherpa/ahelp/index_alphabet.html

This is a good answer but a big culture shock to a statistician. It's like answering "check http://www.r-project.org/search.html and http://cran.r-project.org/doc/FAQ/R-FAQ.html" when an IDL user asks for the difference between read.table() and scan(). Probably, for astronomers, all the various data reading commands above are self-explanatory, just as R has read.table(), read.csv(), and scan(). Disappointingly, this answer was not what I was looking for.

Well, thanks to this bewilderment, hesitation, and some skepticism, I couldn't move on to the next step of implementing fitting methods. At the beginning, I was optimistic when I found out that ciao 4.0 and up is Python compatible. I thought I could do things in more statistically rigorous ways, since I could fake spectra to validate my fitting methods. I was thinking about modifying the indispensable chi-square method, which is used twice, for point estimation and for hypothesis testing, and which thereby introduces bias (a link made to a posting). My goal was to make it less biased and more robust, less sensitive to iid Gaussian residual assumptions. Against my high expectations, I became frustrated at the first step: reading in and playing with the data to get a better sense of it and to develop a quick intuition. I couldn't take even a baby step toward my goal. I'm not sure if that is a good thing or not, but I haven't been completely discouraged. Also, time gradually helps to overcome this culture difference, the lack of documentation.

What happens in general is that if a predecessor says, use "set_arf()," then the apprentice will use "set_arf()" without doubt. If you begin learning on your own, relying purely on documentation, I guess at some point you have to make a choice. One can make a lucky guess and move forward quickly. Sometimes, one lands in a miserable situation because one is not sure about the choice and cannot trust the features that appear after the processing. I guess it is natural to feel curious about what each of these commands is doing to your data and what information is carried over to the next command in an analysis procedure. It seems right to want to know which command is best for the particular data processing and statistical inference at hand. What I found is that such comparisons across commands are missing from the documentation. This is why I thought astronomical data analysis systems are short of mercy for nonspecialists.

Another thing I observed is that there seems to be no documentation, nor a standard procedure, for creating repeatable data analysis results. My observation of astronomers tells me that, with the same raw data, the results of scientists A and B are different (even beyond statistical margins). There are experts who have the knowledge to explain why results differ on the same raw data. However, not everyone can have the luxury of consulting those few experts. I cannot understand such exclusiveness instead of standardizing the procedures through validation. I have even seen that the data A analyzed some years back can differ from this year's when he or she writes a new proposal. I think the time spent recreating the data processing and inference procedure, to explain/justify/validate the different results or to cover/narrow the gap, would not be wasted if there were standard procedures and documentation for them. This is purely a statistician's thought; as a comment in "where is ciao X?" noted[5], not every data analysis system has to have similar designs and goals.

Getting lost while figuring out the basics (handling arf, rmf, psf, and numerous case-by-case corrections) prior to applying any simple statistics has been my biggest obstacle in learning astronomy. The lack of documented validation methods often brings me frustration. I wonder if there are any astronomers who got lost learning statistics via R, minitab, SAS, MATLAB, Python, etc. As discussed in "where is ciao X?", I wish there were a centralized tutorial that offered the basics: how to read in data; how to manipulate data vectors and matrices; how to do arithmetic and error propagation adequately without violating statistical assumptions (I don't like the fact that the point estimate of the background level is subtracted from the observed counts, a random variable, when the distribution does not permit such a scale parameter shift); how to handle and manipulate FITS format files from Chandra for various statistical analyses; how to do basic image analysis; how to do basic spectral analysis; and so on, with references.[6]
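As an illustration of the error propagation point above, here is the standard Gaussian propagation for background subtraction, the very practice questioned in the parenthetical remark. The counts are made-up numbers:

```python
import math

# Net counts as total minus background, with independent Poisson counts
# approximated as Gaussian; variances then add for the difference.
total, background = 250.0, 40.0   # invented example counts
net = total - background
net_err = math.sqrt(total + background)  # sqrt(Var[total] + Var[background])
print(net, round(net_err, 2))
```

This propagation is only adequate when the Gaussian approximation holds; for low counts the subtraction can even go negative, which is exactly why treating the background point estimate as a fixed shift is questionable.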

  1. This is quite an overdue posting. Links and associated content can be outdated.
  2. For classification purposes, a training data set, i.e. data with a clear distinction between response and predictor variables, must be given. However, I often fail to get processed data sets for statistical analysis. I first spend time reading the data and asking what is an outlier, a bias, or garbage. I'm not sure how to clean and extract numbers for statistical analysis, and every sub-field in astronomy has its own way of cleaning data to be fed into statistics and scatter plots. For example, image processing is still executed case by case via the trained eyes of astronomers. In medical imaging, on the other hand, diagnostic specialists offer training sets with which computer vision scientists develop classification algorithms. Such collaboration yields accelerated, automatic, but preliminary diagnosis tools. A small fraction of the results from these preliminary methods can still be ambiguous, i.e. false positive or false negative. Yet, when such ambiguous cancerous cell images at the decision boundaries occur, specialists (like trained astronomers) scrutinize those images to make a final decision. As medical imaging and its classification algorithms resolve the shortage of experts under an overflow of images, I wish astronomers would adopt their strategies to confront massive streaming images and to assist the sparse ranks of trained astronomers.
  3. Something I would like to see is statistical handling of the background in high energy astrophysics. When simulating a source, the background could be simulated as well, via Markov random fields, kriging, or other spatial statistics methods. In reality, the background is subtracted once in measurement space, and the random nature of the background is not interactively reflected. Regardless of the available statistical methodology for reflecting the nature of the background, it is difficult to implement for trial and validation because those tools are not amenable to adding statistical modules and packages.
  4. A Sherpa expert told me there is an FAQ (which I had failed to locate) on this matter. However, from a data analysis perspective, like the distinction among data.frame, vector, matrix, list, and other data types in R, the description is not sufficient for someone who wants to learn ciao and perform scientific (both deterministic and stochastic) data analysis via scripting, i.e. handling objects appropriately. You might want to read "comparing commands in Sherpa" from the Sherpa FAQ.
  5. I know there is ciaox. Apart from the space between ciao and X, there is another difference that astronomers care about much less than statisticians do: the difference between X and x. Typically, the capital letter is for a random variable and the lower case letter for an observation or value.
  6. By the way, there are ciao workshop materials available that could function as tutorials. Please, locate them if needed.
To Become a Good Astronomer
http://hea-www.harvard.edu/AstroStat/slog/2009/to-become-a-good-astronomer/
Fri, 25 Sep 2009, hlee

By accident, a piece of paper was found in my old textbook. I have no idea who wrote it, nor how old it is. Too old to be obsolete? But it gives a general description of how to become a good person and scientist.

To Become a Good Astronomer

  1. Work on scientific problems is driven by many things, but the most important is a passion for science.
  2. Learn lots of physics. Always work on problems that are a bit harder than you think you can do. Also, hone your skills at mathematics, computing, and hardware.
  3. Know yourself very well. For people who belong to groups which are routinely discriminated against, it’s very important when one encounters difficulties (and successes, too) to be able to do a reasonable job of disentangling one’s own distribution from those of others.
  4. Behave decently, honestly, and with civility. You’ll encounter a lot of the opposite. Think very carefully about what you’re doing before you do it. (Some) aggressiveness and self-confidence are good things to have, and you should for instance not hesitate to insist that promises which have been made to you be kept. However, be careful not to let these qualities spill over into hostility and negative attitudes toward your colleagues.

A missing word in the second item is statistics: hone your skills at math., stat., comp., and hardware. Perhaps there is a version for a good statistician, but I doubt that astronomy would belong to its list of skills.

2010 SBSS STUDENT PAPER COMPETITION
http://hea-www.harvard.edu/AstroStat/slog/2009/sbss-2010-competition/
Fri, 21 Aug 2009, chasc

The Section on Bayesian Statistical Science (SBSS) of the American Statistical Association (ASA) would like to announce its 2010 student paper competition. Winners of the competition will receive partial support for attending the 2010 Joint Statistical Meetings (JSM) in Vancouver, BC.

Eligibility

The candidate must be a member of SBSS (URL: www.amstat.org/membership/chapsection.pdf) or ISBA (International Society for Bayesian Analysis). Those candidates who have previously received travel support from SBSS are not eligible to participate. In addition, the candidate must be a full-time student (undergraduate, Masters, or Ph.D.) on or after September 1, 2009.

A manuscript, suitable for journal submission, is required for entry. The candidate must be the lead author on the paper, and hold the primary responsibility for the research and write-up.

The candidate must have separately submitted an abstract for JSM 2010 through the regular abstract submission process,  to present applied, computational, or theoretical Bayesian work. Papers should be submitted for presentation at the JSM as topic contributed or invited papers. Those papers not already a part of a session should be submitted online using the following settings:

(at URL: www.amstat.org/meetings/jsm/2010/index.cfm?fuseaction=abstracts):

* Abstract Type: Topic contributed
* Sub Type: Papers
* Sponsor: Section on Bayesian Statistical Science
* Organizer:  Alyson Wilson
* Organizer e-mail: agw -at- iastate.edu

Application Process

The deadline for application is Feb. 1 (same as the JSM 2010 abstract submission deadline). A formal application including the following materials should be emailed to Prof. Vanja Dukic (vanja -at- uchicago.edu):

a)      CV
b)      Abstract number (from the ASA JSM 2010 abstract submission)
c)      Letter from the major professor (advisor) or faculty co-author, verifying the student status of the candidate, and briefly describing the candidate’s role in the research and writing of the paper
d)      The manuscript, suitable for journal submission, in .pdf format.

Selection of Winners

Papers will be reviewed by a committee determined by the officers of the SBSS. Criteria for selection will include, but are not limited to, significance and potential impact of the research.  Decisions of the committee are final, and will be announced in the Spring before the JSM.

Prizes

Prizes will consist of a certificate to be presented at the SBSS section meeting and partial support (up to $1000) for attending the JSM.  Please note that the awards may be unable to cover the entirety of any winner’s travel, so winning candidates may need to supplement the SBSS award with other funds. To receive a monetary prize, the winner will need to provide proof of membership and submit travel receipts to the SBSS treasurer after the JSM.

Magic Crystal
http://hea-www.harvard.edu/AstroStat/slog/2009/magic-crystal/
Thu, 13 Aug 2009, hlee

Over the past few years, at the heart of astronomical research, I have seen astronomers treat statistics like a magic crystal.

Like a magic crystal, statistics

  • is an easy way to tell (future) results; astronomers do not need to know how it works.
  • can give a wrong prediction. Well, what can we expect from a magic crystal (statistics)? One would rather not trust it with 100% confidence. Despite this unexpectedly inconsistent result, the magic crystal had been performing decent jobs previously; as one expects, a magic crystal says wrong things from time to time.
  • does not take long to tell the future. Waiting in line for the fortune teller, or deciding which fortune teller to visit, requires more time.
  • Instead of modifying and redesigning the magic crystal in order to get statistically valid results, astronomers often attempt testing various crystals with different questions (brute force Monte Carlo, and various types of binning for the chi-square method, for example).
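The binning sensitivity of the chi-square method, mentioned in the last item, is easy to see directly. This is a toy sketch with synthetic uniform data; the sample size and the bin counts of 5 and 20 are arbitrary choices:

```python
import random

# The same sample, tested for uniformity with Pearson's chi-square, gives
# different reduced statistics depending on how it is binned.
random.seed(1)
sample = [random.random() for _ in range(1000)]  # synthetic uniform data

def reduced_chisq(sample, nbins):
    counts = [0] * nbins
    for x in sample:
        counts[min(int(x * nbins), nbins - 1)] += 1
    expected = len(sample) / nbins
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat / (nbins - 1)  # reduced by degrees of freedom

print(round(reduced_chisq(sample, 5), 2), round(reduced_chisq(sample, 20), 2))
```

The two reduced statistics generally differ, so a conclusion drawn from one binning need not survive another, which is exactly the kind of crystal-swapping described above.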

Some astronomers know statistics is not a magic crystal but a science covering broad needs. They understand that statistics requires data matching its assumptions to be fully utilized. Just as astronomers spend years building telescopes and processing data, statisticians also need time to brood over mathematical theories and modeling procedures. I hope the other astronomers, those who think of statistics as a magic crystal, consider consulting statisticians and the literature of other sciences with strong statistical methodology (like meteorology, where spatio-temporal models and stochastic processes are common). Often statisticians have to develop new methodology because astronomical data are quite different from those of the biological and medical sciences. It takes a long time until a mutual understanding of the data between statisticians and astronomers is firmly built. Just as astronomers are not astrologers (although I have met people who cannot tell the difference), statisticians are not fortune tellers; think about economists. I hope astronomers will listen even when statisticians offer a ready-built but never-used crystal, although the protocols of designed experiments may not match astronomers' data and their requests at the beginning (the time series data that I mentioned, for example; I believe the data were later presented in different ways for more statistically rigorous analysis).

I don’t want to fall into any generalization fallacy, such as an astronomer’s comment that Bayesian methods are robust but frequentist ones are not[1], by saying that “statistics is not a magic crystal.” Nonetheless, I want astronomers to investigate what is really inside those magic crystals they often use. I want them to try to break the crystal open to learn what it is, and to change the composition of the crystal to please the clients (astronomers themselves). Gladly, there are some who put in extra effort for that purpose. Otherwise, be patient with your fortune teller/psychologist while they design the right crystal/tools for you, instead of treating statistics like a magic crystal.

  1. Bayesian methods can be more robust than frequentist methods because of the flexibility of hierarchical modeling, but that is not a sufficient condition for robustness. Such statements hurt me a lot, and I suddenly felt sympathy for a fortune teller who relies on his or her crystal when, all of a sudden, the client expresses a great deal of mistrust because the magic crystal is so mystic and uninspectable.
]]>
http://hea-www.harvard.edu/AstroStat/slog/2009/magic-crystal/feed/ 0
Yes we can http://hea-www.harvard.edu/AstroStat/slog/2009/yes-we-can/ http://hea-www.harvard.edu/AstroStat/slog/2009/yes-we-can/#comments Fri, 07 Aug 2009 19:29:54 +0000 vlk http://hea-www.harvard.edu/AstroStat/slog/?p=3335 From a poem submitted to the Chinese National Bureau of Statistics:

因为有了统计
我可以把天上的星星重新梳理

Because of statistics
I can rearrange the stars in the skies above

Indeed. Especially so when the PSF is broad and the stars overlap.
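The quip about broad PSFs is really about statistical source separation: two stars whose point spread functions overlap cannot be told apart by eye, but a fitted multi-component model can still recover their positions. A minimal sketch (all names and numbers here are my own illustration, not from the post), using a Gaussian as a stand-in PSF and a brute-force least-squares fit in one dimension:

```python
import math

def psf(x, center, width=2.0):
    # Gaussian stand-in for a broad point spread function
    return math.exp(-0.5 * ((x - center) / width) ** 2)

grid = [i * 0.1 for i in range(200)]   # 1-D pixel coordinates, 0..19.9
true_centers = (8.0, 11.0)             # separated by only 1.5 PSF widths
blend = [psf(x, true_centers[0]) + psf(x, true_centers[1]) for x in grid]

# brute-force least-squares search over candidate center pairs
candidates = [i * 0.5 for i in range(2, 39)]   # 1.0, 1.5, ..., 19.0
best, best_err = None, float("inf")
for c1 in candidates:
    for c2 in candidates:
        if c2 <= c1:
            continue
        err = sum((b - psf(x, c1) - psf(x, c2)) ** 2
                  for x, b in zip(grid, blend))
        if err < best_err:
            best, best_err = (c1, c2), err

print(best)  # recovers (8.0, 11.0) from the blended profile
```

Real deblending uses noise models and proper optimizers rather than a grid search, but the principle — rearranging overlapped stars by fitting — is the same.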

(via)

]]>
http://hea-www.harvard.edu/AstroStat/slog/2009/yes-we-can/feed/ 1
Where is ciao X ? http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/ http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/#comments Thu, 30 Jul 2009 06:57:00 +0000 hlee http://hea-www.harvard.edu/AstroStat/slog/?p=3260 X={ primer, tutorial, cookbook, Introduction, guidebook, 101, for dummies, …}

I’ve heard many times about the lack of documentation for this extensive data analysis system, ciao. I saw people still using ciao 3.4 even though the new version 4 has been available for many months. Although ciao is not the only tool for Chandra data analysis, it was specifically designed for it, so I expected it to be used frequently and to be popular. But the reality is against my expectation. Whatever (fierce) discussion I’ve heard has been irrelevant to me, because ciao is not intended for statistical analysis. Then, all of a sudden, after many months, a realization hit me: ciao is different from other data analysis systems and software. This difference has hampered the introduction of ciao outside the Chandra scientist community and its gaining of popularity. This difference was the reason I often got lost searching for suitable documentation.

http://cxc.harvard.edu/ciao/ is the website to visit when you start using ciao, and manuals are listed there under manuals and memos. The aforementioned difference is that I am used to seeing an Introduction, Primer, Tutorial, or Guide for Beginners on the front page or the manual pages, but not on the ciao websites. From such introductory documentation, I can stretch out to other specific topics, modules, tool boxes, packages, libraries, plug-ins, add-ons, applications, etc. Tutorials provide the inertia for learning and utilizing data analysis systems. However, the layout of the ciao manual pages does not seem intended for beginners. It was hard to find the basics when some specific task with ciao and its tools got stuck. The pages might be useful as references for Chandra scientists who have used ciao for a long time, but not beyond that. They could be handy for experts instructing novices side by side, so that they can give better hands-on instruction.

I’ll contrast ciao with other popular data analysis systems and software.

  • When I began to use R, I started with the R manual page containing the pdf file Introduction to R. Based on this introductory documentation, I could easily learn specific task-oriented packages and build more of my own data analysis tools.
  • When I began to use Matlab, I was told to get the Matlab Primer. Although the current edition is commercial, free copies of old editions are available via search engines or course websites. Other tutorials are available as well. After cracking the basics of Matlab, it was not difficult to find the right tool boxes for topic-specific data analysis and to write scripts for particular needs.
  • When I began to use SAS (Statistical Analysis System), people in the business said to get The Little SAS Book, which gives the basics of this gigantic system, from which I was able to expand its usage for particular statistical projects.
  • Recently, I began to learn Python in order to use the many astronomical and statistical data analysis modules developed by various scientists. Python has its own tutorials, to which I can turn for the basics needed to fully utilize those task-specific modules and my own scripting.
  • Commercial software often comes with its own beginners’ guides and demos that a user can follow easily. By acquiring the basics from these tutorials, further applications can be well directed. On the other hand, non-commercial software may lack extensive, centralized tutorials, unlike Python and R. Nonetheless, tutorials for teaching are easy to acquire, and these unlicensed materials are very handy whenever problems come up in various task-specific projects.
  • I used to have IDL tutorials on which I relied a lot to use some astronomy user libraries and CHIANTI (an atomic database). I guess the resources for tutorials have changed dramatically since then.

Even though I have navigated the ciao website and its voluminous threads many times, I only now realize that there is no beginner’s guide — nothing that could be called a ciao cookbook, ciao tutorial, ciao primer, ciao for dummies, or introduction to ciao — in a visible location.

This is a cultural difference. My personal thought is that this tradition prevents non-Chandra scientists from using data in the Chandra archive. The good news is that there have been ciao workshops, and materials from those workshops are still available. I believe that compiling these materials in the fashion of other beginners’ guides to data analysis systems would be a good starting point for writing a front-page-worthy tutorial. The existence of such introductory material could bring more people to use and explore Chandra X-ray data. I hope the tutorials for other software and data analysis systems (primers, cookbooks, introductions, tutorials, or “for dummies” guides) can serve as good guidelines for composing a full ciao primer.

]]>
http://hea-www.harvard.edu/AstroStat/slog/2009/where-is-ciao-x/feed/ 1