The Stata listserver

st: RE: RE: RE: RE: generating a new variable with the egen command

From   "Nick Cox" <>
To   <>
Subject   st: RE: RE: RE: RE: generating a new variable with the egen command
Date   Mon, 28 Nov 2005 17:51:44 -0000

I agree with the main point, very strongly. Indeed Aristotle and Gauss
said the same, so Maarten stands in a good line. But these discussions
can, secondarily, be muddied, and muddled, by terminology. Maarten
starts out by talking about accuracy, but then he changes terminology to
precision, so it is not clear whether he means the same thing or
something else. 

I'd distinguish four concepts, and others might want to go further. 

1. A first question is how data are recorded or reported, which does not
necessarily involve any claim on accuracy or precision.  Thus a common
convention in meteorology is to record temperatures to 0.1 deg C or F. A
common convention in demography is to report a census population with no
rounding, so that up to 10 digits may be given. (I know that a census in
practice is usually another kind of estimate, not the main point here.)
In the first case, the tacit view is that we could use better technology
to get more digits, but that would usually be not only too expensive but
rather silly, as any intervention changes the temperature anyway and the
temperature 1 metre away is different. In the second case I presume that
no demographer knowing about the conduct of censuses expects the census
figure to be accurate to anything like the number of digits reported,
but for all sorts of other reasons rounding of census results appears
taboo. I like to think of this as the "resolution" of the data, but I
doubt that is a standard term. 

2. Accuracy, at least in the physical sciences, I think, implies closeness
to some 'true' or 'real' or 'correct' value, a notion which is variously
(a) philosophically problematic to many and (b) practically difficult in
the (usual) absence of any notion of what that value is, or even of a
"gold standard" measurement, that is, a measurement produced by the best
method available (a term less common in economics these days than in
medicine?). It seems that when many social scientists talk about
validity, they mean this, or something close to it. 

3. Precision in the statistical sense implies uncertainty as indicated,
ideally, by variability of repetitions (unless you are a Bayesian). It
seems that when many social scientists talk about reliability, they mean
this, or something close to it. 

4. Precision in the computing sense refers to how a number is held
internally, and to how results depend on the details of calculations.
Thus binary-based machines, and I don't use any other, have to struggle
with holding 0.1. In a strict sense, they can't do it! 
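
To make point 4 concrete, here is a minimal sketch in Python rather than Stata. It assumes that Stata's float and double storage types behave like IEEE 754 single and double precision; the helper name to_single is hypothetical and exists only for this illustration. It shows the value actually stored when you ask for 0.1 in each type.

```python
import struct
from decimal import Decimal

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single
    and convert it back, so the single-precision value can be inspected.
    Hypothetical helper, for illustration only."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

tenth_double = 0.1             # nearest double to 0.1
tenth_single = to_single(0.1)  # nearest single to 0.1

# Decimal(float) converts exactly, so it reveals what is really held.
print(Decimal(tenth_double))
print(Decimal(tenth_single))

# Neither stored value is exactly 0.1, and the two differ from each other.
print(Decimal(tenth_double) == Decimal("0.1"))  # False
print(tenth_single == tenth_double)             # False
```

Under the same assumption, this mismatch is why a comparison of a single-precision variable against the literal 0.1 can fail unexpectedly.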

These seem to be four different senses, but they are nevertheless often
confused. I was brought up on the analogy of repeatedly aiming at a bull's
eye target (e.g. by firing several arrows or bullets), accuracy being how
close you are to the target on average and precision being how tightly
your hits cluster,
but every year when I "remind" students of what I hope they already know
I get many blank stares. Be that as it may, in most sciences we don't
know where the bull's eye is. 
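
The target analogy can be sketched numerically. The following Python fragment is only an illustration under strong assumptions: the hits are one-dimensional, the position of the bull's eye is known (which, as noted, is rarely true in practice), and the numbers are invented for the example. The function name accuracy_and_precision is hypothetical. It takes accuracy to be the distance of the mean hit from the target (bias) and precision to be the sample standard deviation of the hits (spread).

```python
import statistics

def accuracy_and_precision(hits, target):
    """Return (bias, spread) for one-dimensional hits.
    bias   = mean hit minus target  (small |bias| means accurate)
    spread = sample standard deviation (small spread means precise)
    Hypothetical helper; assumes the target is known."""
    bias = statistics.mean(hits) - target
    spread = statistics.stdev(hits)
    return bias, spread

target = 0.0

# Invented data: tightly clustered, but centred away from the target.
tight_but_off = [2.0, 2.1, 1.9, 2.0]
# Invented data: centred on the target on average, but widely scattered.
loose_but_centred = [-3.0, 3.0, -1.0, 1.0]

bias_tight, spread_tight = accuracy_and_precision(tight_but_off, target)
bias_loose, spread_loose = accuracy_and_precision(loose_but_centred, target)

print(f"precise, not accurate: bias={bias_tight:.3f}, spread={spread_tight:.3f}")
print(f"accurate, not precise: bias={bias_loose:.3f}, spread={spread_loose:.3f}")
```

Without a known target only the spread can be computed, which is the difficulty with accuracy described above.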


P.S. You probably know the story of someone firing at a wall and then
painting a target around their hits. This is a good little joke, except
that quite a lot of science is like this, in my own field too. 

Maarten Buis wrote:
> Since precision came up again, I would like to add a comment:
> While it is good practice to try to minimize rounding errors 
> during computation (e.g. during computing new variables that 
> are sums), you should keep in mind how precise your 
> measurement on that variable actually is. For instance, I 
> teach introductory statistics to first year social science 
> students. In the Netherlands grades run from 0 (didn't even 
> spell their own name right) to 10 (brilliant). Each year at 
> least one of them asks whether I would want to give them 
> grades with two decimal points accuracy. I think I make good 
> exams, but they cannot distinguish between a student with a 
> statistics capability worth a 6.01 and worth a 6.02. 
> Eight digits accurate (float) should be more than enough for 
> most measurements; in most real data I would consider sixteen 
> digits (double) overkill.
