The AstroStat Slog » grammar
http://hea-www.harvard.edu/AstroStat/slog
Weaving together Astronomy+Statistics+Computer Science+Engineering+Instrumentation, far beyond the growing borders
Fri, 09 Sep 2011 17:05:33 +0000

Datums
http://hea-www.harvard.edu/AstroStat/slog/2009/datums/
Mon, 04 May 2009 17:36:17 +0000, vlk

For someone who doesn’t know any grammar, I can be a bit of a Grammar nazi sometimes. And one of my pet peeves is when people use the word data in the singular. No! Data are!

Or so I used to believe.

But recently I came across possibly the most sensible compromise between the “is” and the “are” crowds, articulated by the Grammar Girl, Mignon Fogarty.

The compromise hinges on the hair-splitting difference between the so-called count nouns and mass nouns. Count nouns are those which you can count (e.g., I have five books, you have 75 CDs, he got 50 million votes). You have more or fewer of them, and you may say you have many of them. Mass nouns are those which cannot be made plural and cannot be used to count (e.g., you don’t ask for two cups of coffees, or two pieces of chalks). You have more or less of it, and you use it in a sentence with much, as in “how much coffee would you like”. (Of course, English being what it is, you can also say “how many coffees are you ordering”, which actually is shorthand for “how many cups of coffee …” and is thus implicitly pointing to a count noun.)

An easy way to tell these two types of nouns apart is to ask yourself how many or how much. If it makes sense to ask how many there are of a noun, as in how many cars or how many people, then it’s a count noun. If, however, it makes more sense to ask how much there is of a noun, as in how much butter or how much rain, then it’s a mass noun.

The use of many and much parallels the use of fewer and less: many and fewer are used with count nouns (like items in a grocery cart) and much and less are used with mass nouns, like tea or bacon.

The trick now is to realize that data can be both a count noun and a mass noun. If you use it as a count noun, it is always plural, and you are using it in lieu of the word “facts”; the literal translation of datum from the original Latin is that it is “a thing given”, hence data is “things given” — it refers to a quantity. If instead you are using it in lieu of the word “information”, that makes it a mass noun, and it becomes singular. The facts are compelling uses it as a count noun, but the information is compelling uses it as a mass noun.

The count noun datum and its plural data, meaning “a given fact or assumption,” were adopted from Latin into English by the seventeenth century (2); however, it wasn’t till the late nineteenth century that data took on the modern sense of facts and figures. This shift in meaning also led some to start treating data as a mass noun.

She goes on to give some good advice –

So if data is correct as both a count noun and as a mass noun, which should you use? It comes down to style and personal preference. Many academic and scientific fields, as well as many publishers and newspapers, still insist on the plural count noun use of data.

Just be aware that if you do write or edit for a publisher or in a discipline that insists on plural data, you should make sure the surrounding words properly reflect the plural treatment of the word data. Even if you don’t have a style guide insisting on the plural usage but you decide to use it anyway because you like Latin plurals, be sure to do it consistently throughout the document — in other words, don’t mix up your datas, using it as a count noun in one place and as a mass noun in another.

I, for one, am willing to accept a ceasefire in the data wars.

[Book] The Grammar of Graphics
http://hea-www.harvard.edu/AstroStat/slog/2008/book-the-grammar-of-graphics/
Wed, 08 Oct 2008 23:55:37 +0000, hlee

All of a sudden, partly owing to a thought-provoking talk about visualization by Felice Frankel at IIC, I recollected a book, The Grammar of Graphics by Leland Wilkinson (2nd ed.; I partially read the 1st ed. several years ago and found little of use in it, because there seemed to be no link to the visualization of data from astronomy).

Both good and bad reviews exist, but I don’t believe there is another book that covers the grammar of graphics this extensively. Not many statisticians handle images, compared with computer vision engineers, but at some point all engineers and scientists must present their work in graphs and tables. By the same token, tongues are different even though alphabets are shared. Oftentimes, plots from scientist A cannot talk to scientist B (A ≠ B). This communication gap seems prevalent between astronomy and statistics.

Almost all chapters begin with the Greek or Latin origins of their names, to reflect the common origins of the lexicon of graphics regardless of subject. Some chapters, by contrast, illuminate the different practices/perspectives/interests in graphics of astronomers and statisticians:

  • Chap. 6 [Scale]: In statistics, scaling by log transformation is meant to stabilize errors (Box-Cox transformation); in astronomy, by contrast, it is meant to impose a linear relationship between predictor and response, which shows up better on a log scale.
  • Chap. 7 [Statistics]: Discussion of error bars, bins, and histograms; the graphical tools are the same, but the objectives seem different (statistics: optimal binning; astronomy: enhancing signals in each bin).
  • Chap. 15 [Uncertainty]: Concepts of uncertainty; many words are associated with uncertainty, for example, variability, noise, incompleteness, indeterminacy, bias, error, accuracy, precision, reliability, validity, quality, and integrity.
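
The two uses of the log transformation noted under Chap. 6 can be sketched in a few lines of Python. This is only an illustration, not something taken from the book: the function names `box_cox` and `loglog_slope` and the power-law numbers are made up for the example. The first function is the standard one-parameter Box-Cox transform (which reduces to the natural log at lambda = 0); the second fits a straight line in log-log space, the way a power law is often linearized in astronomy.

```python
import math


def box_cox(y, lam):
    """One-parameter Box-Cox transform of a positive value.

    lam = 0 is defined as the natural log, the limit of (y**lam - 1) / lam.
    """
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def loglog_slope(xs, ys):
    """Least-squares slope of log10(y) against log10(x)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx


# Hypothetical noise-free power law y = 3 * x**-1.5 (illustrative values only).
xs = [1.0, 2.0, 5.0, 10.0, 50.0, 100.0]
ys = [3.0 * x ** -1.5 for x in xs]

# In log-log space the power law is a straight line whose slope is the exponent.
slope = loglog_slope(xs, ys)
print("fitted log-log slope:", slope)
```

The same log scale thus serves two purposes: as one member of the Box-Cox family it is a candidate error-stabilizing transform, and as an axis scaling it turns a power law into a line whose slope can be read off directly.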

Overall, I would urge that these ideas be adapted into astronomical data analysis packages for visualizing analysis products. Perhaps they may inspire some astronomers to change the way they visualize. For instance, in my opinion, box plots, qq-plots, and scatter plots would convey more information than histograms while remaining simple; yet, apart from scatter plots, these summary plots are not commonly used in astronomy. A benefit of box plots and qq-plots is that Gaussianity can be checked without losing information to binning. However, there is no golden rule about which type or grammar of graphics is correct and should be used. There is only user preference.
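
As a rough sketch of the point about checking Gaussianity without binning, the Python below computes the numbers behind a box plot and a normal qq-plot from the raw values, using only the standard library. The helper names (`box_plot_summary`, `qq_points`, `qq_correlation`), the plotting positions (i + 0.5)/n, and the two simulated samples are assumptions made for illustration; they come neither from the book nor from any astronomy package.

```python
import random
from statistics import NormalDist, quantiles


def box_plot_summary(values):
    """Five numbers a box plot draws: min, Q1, median, Q3, max."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


def qq_points(values):
    """(theoretical normal quantile, observed value) pairs for a normal qq-plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(ordered)]


def qq_correlation(values):
    """Pearson correlation of the qq-plot points; near 1 means a nearly straight line."""
    points = qq_points(values)
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    cov = sum((t - mean_t) * (x - mean_x) for t, x in points)
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    var_x = sum((x - mean_x) ** 2 for _, x in points)
    return cov / (var_t * var_x) ** 0.5


# Simulated samples (illustrative only): one Gaussian, one strongly skewed.
rng = random.Random(0)
gaussian_sample = [rng.gauss(10.0, 2.0) for _ in range(500)]
skewed_sample = [rng.expovariate(1.0) for _ in range(500)]

for name, sample in [("gaussian", gaussian_sample), ("skewed", skewed_sample)]:
    print(name, box_plot_summary(sample), qq_correlation(sample))
```

Every observation contributes one point to the qq-plot, so no bin width has to be chosen; a sample that is close to Gaussian gives points near a straight line, while a skewed sample bends away from it and shows an off-center median in the box plot summary.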

Different disciplines maintain their own ways of presenting graphics and expect that they can talk to viewers from other disciplines. Disappointingly, no one has fully reached that point. Extensive discussion and persuasion are required to convey the stories behind graphics to others.

As Felice Frankel pointed out, the way information is visualized can enhance recognition and understanding when that information is delivered deliberately. To that end, a few interesting quotes from the book replace the conclusion of this post.

  • The first ed. of this book, and Part 1 of the current ed., explicitly cautioned that the grammar of graphics is not a visualization system.
  • We are surprised, nevertheless, to discover how little some visualization researchers in various fields know about the origins of many of the techniques that are routinely applied in visualization.
  • The grammar of graphics determined how algebra, geometry, aesthetics, statistics, scales, and coordinates interact. In the world of statistical graphics, we cannot confuse aesthetics with geometry by picking a tree graphic to represent a continuous flow of migrating insects across a geographic field simply because we like the impression it conveys.
  • If we must choose a single word to characterize the focus of modern statistics, it would be uncertainty (Stigler, 1983).
  • … decision-makers need statistical tools to formalize the scenarios they encounter and they need graphical aids to keep them from making irrational decisions. The use of graphics for decision-making under uncertainty is a relatively recent field. We need to go beyond the use of error bars to incorporate other aesthetics in the representation of error. And we need research to assess the effectiveness of decision-making based on these graphics using a Bayesian yardstick.

