The AstroStat Slog » data

An Instructive Challenge

vlk — Tue, 15 Jun 2010 18:38:56 +0000

This question came to the CfA Public Affairs office, and I am sharing it with y’all because I think the solution is instructive.

A student had to figure out the name of a stellar object as part of an assignment. He was given the following information about it:

apparent [V] magnitude = 5.76
B-V = 0.02
E(B-V) = 0.00
parallax = 0.0478 arcsec
radial velocity = -18 km/s
redshift = 0 km/s

He looked in all the stellar databases but was unable to locate it, so he asked the CfA for help.

Just to help you out, here are a couple of places where you can find comprehensive online catalogs:

See if you can find it!

Answer next ~~week~~ month.

Update (2010-aug-02):
The short answer is, I could find no such star in any commonly available catalog. But that is not the end of the story. There does exist a star in the Hipparcos catalog, HIP 103389, that has approximately the right distance (21 pc), radial velocity (-16.1 km/s), and V magnitude (5.70). It doesn’t match exactly, and the B-V is completely off, but that is the moral of the story.

The thing is, catalogs are not perfect. The same object often has very different numerical entries in different catalogs. This could be due to a variety of reasons, such as different calibrations, different analysers, or even intrinsic variations in the source. And you can bet your bottom dollar that the quoted statistical uncertainties in the quantities do not account for the observed variance. Take the B-V value, for instance. It is 0.5 for HIP 103389, but the initial problem stated that it was 0.02, which would make it an A-type star. But if it were an A-type star at 21 pc, it should have had a magnitude of V~1.5, much brighter than the stated 5.76!
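The consistency check described above can be sketched in a few lines of Python. This is only an illustration, not the calculation actually used for the post: the function names are made up, extinction is ignored because E(B-V) = 0.00 was given, and the absolute magnitude adopted for an A-type star (M_V near 0) is an assumption chosen for illustration; tabulated values for A dwarfs vary by a magnitude or so.

```python
import math


def distance_pc(parallax_arcsec):
    """Distance in parsecs from a trigonometric parallax in arcseconds."""
    return 1.0 / parallax_arcsec


def distance_modulus(d_pc):
    """m - M for a source at d_pc parsecs, with no extinction."""
    return 5.0 * math.log10(d_pc / 10.0)


def apparent_magnitude(abs_mag, d_pc):
    """Apparent magnitude of a source of absolute magnitude abs_mag at d_pc."""
    return abs_mag + distance_modulus(d_pc)


# Values quoted in the problem statement.
V_APPARENT = 5.76
PARALLAX_ARCSEC = 0.0478

# ASSUMPTION (not from the post): a rough absolute V magnitude for an A-type star.
ASSUMED_M_V_A_STAR = 0.0

d = distance_pc(PARALLAX_ARCSEC)
mu = distance_modulus(d)

# Absolute magnitude the star must have if the quoted V and parallax are both right.
implied_M_V = V_APPARENT - mu

# Apparent magnitude an A-type star (under the assumption above) would show at d.
expected_V_if_A_star = apparent_magnitude(ASSUMED_M_V_A_STAR, d)

print(f"distance              : {d:.1f} pc")
print(f"distance modulus      : {mu:.2f} mag")
print(f"implied M_V           : {implied_M_V:.2f}")
print(f"V expected for A star : {expected_V_if_A_star:.2f}")
```

With the quoted numbers the distance comes out at about 20.9 pc and the implied absolute magnitude at about 4.2, several magnitudes fainter than the assumed A-type value. The quoted B-V and the quoted V cannot both be right for a star at that parallax, which is exactly the kind of mismatch a consistency check is meant to catch.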

I think this illustrates one of the fundamental tenets of science as it is practiced, versus how it is taught. The first thing that a practicing scientist does (especially one not of the theoretical persuasion) is to try to see where the data might be wrong or misleading. It should be included in an analysis only after it passes various consistency checks and is deemed valid. The moral of the story is, don’t trust data blindly just because it is a “number”.


Datums

vlk — Mon, 04 May 2009 17:36:17 +0000

For someone who doesn’t know any grammar, I can be a bit of a Grammar nazi sometimes. And one of my pet peeves is when people use the word data in the singular. No! Data are!

Or so I used to believe.

But recently I came across possibly the most sensible compromise between the “is” and the “are” crowd, articulated by the Grammar Girl, Mignon Fogarty.

The compromise hinges on the hair-splitting difference between the so-called count nouns and mass nouns. Count nouns are those which you can count (e.g., I have five books, you have 75 CDs, he got 50 million votes). You have more or fewer of them, and you may say you have many of them. Mass nouns are those which cannot be made plural and cannot be used to count (e.g., you don’t ask for two cups of coffees, or two pieces of chalks). You have more or less of it, and you use it in a sentence with much, as in “how much coffee would you like”. (Of course, English being what it is, you can also say “how many coffees are you ordering”, which actually is shorthand for “how many cups of coffee …” and is thus implicitly pointing to a count noun.)

An easy way to tell these two types of nouns apart is to ask yourself how many or how much. If it makes sense to ask how many there are of a noun, as in how many cars or how many people, then it’s a count noun. If, however, it makes more sense to ask how much there is of a noun, as in how much butter or how much rain, then it’s a mass noun.

The use of many and much parallels the use of fewer and less: many and fewer are used with count nouns (like items in a grocery cart) and much and less are used with mass nouns, like tea or bacon.

The trick now is to realize that data can be both a count noun and a mass noun. If you use it as a count noun, it is always plural, and you are using it in lieu of the word “facts”; the literal translation of datum from the original Latin is “a thing given”, hence data is “things given”, and it refers to a quantity. If instead you are using it in lieu of the word “information”, that makes it a mass noun, and it becomes singular. “The data are compelling” (read: the facts are compelling) uses it as a count noun, but “the data is compelling” (read: the information is compelling) uses it as a mass noun.

The count noun datum and its plural data, meaning “a given fact or assumption,” were adopted from Latin into English by the seventeenth century (2); however, it wasn’t till the late nineteenth century that data took on the modern sense of facts and figures. This shift in meaning also led some to start treating data as a mass noun.

She goes on to give some good advice –

So if data is correct as both a count noun and as a mass noun, which should you use? It comes down to style and personal preference. Many academic and scientific fields, as well as many publishers and newspapers, still insist on the plural count noun use of data …

Just be aware that if you do write or edit for a publisher or in a discipline that insists on plural data, you should make sure the surrounding words properly reflect the plural treatment of the word data. Even if you don’t have a style guide insisting on the plural usage but you decide to use it anyway because you like Latin plurals, be sure to do it consistently throughout the document — in other words, don’t mix up your datas, using it as a count noun in one place and as a mass noun in another.

I, for one, am willing to accept a ceasefire in the data wars.


model vs model

vlk — Fri, 05 Oct 2007 17:38:23 +0000

As Alanna pointed out, astronomers and statisticians mean different things when they say “model”. To complicate matters, we have also started to use another term called “data model”.

First, there is the physical model, which could mean either our understanding of what processes operate on a source (the physics part, usually involving PDEs), or the mathematical function that describes the emission as a function of observables like location, time, or energy (the astronomy part, usually the shape of the spectrum, or the time evolution in a light curve, etc.).

The data model on the other hand describes the organization of the observation. It is this which tells us that there is a fundamental difference between an effective area and a response matrix, and conversely, that the point spread function and the line response function are the same beast. This kind of thing, which I suppose is a computer science oriented view of the contents of a file, is crucial for implementing and running something like the Virtual Observatory.
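One possible reading of that distinction, written as a toy Python sketch: an effective area is a one-dimensional scaling (one number per true-energy bin), while a response matrix, a point spread function, and a line response function are all two-dimensional kernels that redistribute events from a true bin to a measured bin. The class names, fields, and numbers below are hypothetical and do not follow any actual Virtual Observatory or mission file format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EffectiveArea:
    """1-D object: one sensitivity value (cm^2) per true-energy bin."""
    area_cm2: List[float]

    def apply(self, flux: List[float]) -> List[float]:
        # Element-wise scaling; nothing moves between bins.
        return [a * f for a, f in zip(self.area_cm2, flux)]


@dataclass
class RedistributionKernel:
    """2-D object: kernel[i][j] is the probability that an event in
    true bin i is recorded in measured bin j along the named axis."""
    axis: str
    kernel: List[List[float]]

    def apply(self, true_values: List[float]) -> List[float]:
        n_out = len(self.kernel[0])
        measured = [0.0] * n_out
        for i, value in enumerate(true_values):
            for j in range(n_out):
                measured[j] += value * self.kernel[i][j]
        return measured


# Made-up two-bin example values, for illustration only.
flux = [10.0, 20.0]  # photons / cm^2 per true-energy bin
area = EffectiveArea(area_cm2=[2.0, 0.5])
response = RedistributionKernel(axis="energy", kernel=[[0.9, 0.1], [0.2, 0.8]])

# The same class describes a PSF or an LRF; only the axis label differs.
psf = RedistributionKernel(axis="position", kernel=[[0.7, 0.3], [0.3, 0.7]])
lrf = RedistributionKernel(axis="wavelength", kernel=[[0.6, 0.4], [0.4, 0.6]])

collected = area.apply(flux)          # scaled, still in true-energy bins
detected = response.apply(collected)  # redistributed into measured channels
print(collected, detected)
```

In this sketch the "fundamental difference" shows up as a difference in type: `EffectiveArea` never moves events between bins, whereas every `RedistributionKernel` does, whatever axis it acts on.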
