
RE: Age groups [was: st: RE: RE: Question about tabsort ... ]


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <n.j.cox@durham.ac.uk>, <statalist@hsphsun2.harvard.edu>
Subject   RE: Age groups [was: st: RE: RE: Question about tabsort ... ]
Date   Fri, 12 Jul 2002 18:42:05 +0100

[Apologies for previous premature reply.]

> "Rodrigo Briceño" <rbriceno@sanigest.com>
>
> 2.  Some months ago I asked the list how to generate age groups.
> I used that help to get a frequency of hospital discharges
> by age group. The problem here is that my variable age
> turned out to be a str8 variable, so the options for generating
> a new variable were restricted. I tried typing:
>
> gen str8 rank_edad=1 if inrange(edad,0,0)
>
> to construct my age groups (less than 1 year, between 1 and 4,
> between 5 and 9, 10-19, 20-29, 30-39, 40-49, 50-59 and the last
> one 60 or more)

That's not going to work as you typed it if only because
strings need to be in " " and you can't apply -inrange()-
to a string variable.
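
If -edad- holds ages typed as digits (an assumption; I have not seen
the data), one way around this is to convert to a numeric variable
first and then group that. A minimal sketch, in which -edad_num- is
a made-up name and the codes 1 to 9 are just one possible numbering
of the groups described above:

```stata
* Assumes -edad- contains text such as "0", "17", "63".
* -force- turns anything non-numeric into missing, so check the result.
destring edad, generate(edad_num) force

* Numeric codes for the groups; missing ages stay missing in -rank_edad-
generate byte rank_edad = 1 if edad_num < 1
replace rank_edad = 2 if inrange(edad_num, 1, 4)
replace rank_edad = 3 if inrange(edad_num, 5, 9)
replace rank_edad = 4 if inrange(edad_num, 10, 19)
replace rank_edad = 5 if inrange(edad_num, 20, 29)
replace rank_edad = 6 if inrange(edad_num, 30, 39)
replace rank_edad = 7 if inrange(edad_num, 40, 49)
replace rank_edad = 8 if inrange(edad_num, 50, 59)
replace rank_edad = 9 if edad_num >= 60 & edad_num < .
```

The last condition includes -edad_num < .- because Stata treats
numeric missing as larger than any number, so -edad_num >= 60-
alone would also pick up missing ages.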

> This did not work for me, so I tried:
>
> encode edad, gen (edad2)
>
> like I learned from my net course on Stata.
> The things appear to be ok, because when I type
>
> tab edad2
>
> I see all the ages correctly. The problem here is that
> I'm building three groups (the first three) that don't have data
> in them.  But Stata is calculating something for them, I don't know why.
>
> gen byte rank_edad=1 if range(edad2,0,0)
> replace rank_edad=2 if inrange (edad2,1,4)
> replace rank_edad=3 if inrange (edad2,5,9)
> replace rank_edad=4 if inrange (edad2,10,19)
> and so on.....
>
> tab  rank_edad
>
>   rank_edad |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    1-4_anos |         90        1.73        1.73
>    5-9_anos |       1190       22.88       24.61
>  10-19_anos |       2114       40.64       65.24
>  20-29_anos |        892       17.15       82.39
>  30-39_anos |        304        5.84       88.24
>  40-49_anos |        202        3.88       92.12
>  50-59_anos |        162        3.11       95.23
> Mas_60_anos |        248        4.77      100.00
> ------------+-----------------------------------
>       Total |       5202      100.00
>
> Do you know, or can you guess, why Stata is putting data in the first
> three groups, which are supposed to be empty? (I build those groups
> because I'm making a do-file that I can apply to other databases.)

I am far from clear about everything you have done:
for example, how -rank_edad- got its value labels.

One puzzle is that you are generating -rank_edad-
from the _encoded_ variable -edad2- which is _not_
age but age categories.

tab edad2, nola

will show you, I think, that you just have age categories 1, 2, 3,
etc.
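
To see exactly which string was given which code, you can also list
the value label that -encode- created. A small sketch, assuming
-encode- was run without its -label()- option, in which case the
value label takes the name of the new variable (here -edad2-):

```stata
* Show the mapping from integer codes to the original strings
label list edad2

* Show the stored integer codes rather than their labels
tabulate edad2, nolabel
```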

The other detail that may help is that Stata's default
encoding follows alphanumeric order. As it happens,
something that I am writing at the moment discusses
sorting of age intervals and, possibly, one of your
problems.

If you give Stata these string values to -sort-

"1-4_anos"
"5-9_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"50-59_anos"
"Mas_60_anos"

you will get

"1-4_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"5-9_anos"
"50-59_anos"
"Mas_60_anos"

because -sort- of strings is on dictionary principles
and characters are put in ASCII order with no
reference to their meaning. By default -encode-
will use this order to assign categories.
That is, "1-4_anos" is -encode-d as 1, etc.
Something like this may help to explain some
of your results. The remedy is to define your
own value labels and to insist that -encode- use them.
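
A minimal sketch of that remedy, using the eight strings above as toy
data. The names -edad_txt-, -grupo_default-, -grupo- and -grupos- are
made up for illustration, and I am assuming your string variable holds
exactly these category texts:

```stata
clear
input str12 edad_txt
"1-4_anos"
"5-9_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"50-59_anos"
"Mas_60_anos"
end

* Default behaviour: codes follow the sorted order of the strings
encode edad_txt, generate(grupo_default)

* Define the labels in the intended order ...
label define grupos 1 "1-4_anos" 2 "5-9_anos" 3 "10-19_anos" 4 "20-29_anos"
label define grupos 5 "30-39_anos" 6 "40-49_anos" 7 "50-59_anos", add
label define grupos 8 "Mas_60_anos", add

* ... and make -encode- use that mapping
encode edad_txt, generate(grupo) label(grupos)

* Compare the two sets of integer codes side by side
list edad_txt grupo_default grupo, nolabel
```

Given the sort order shown above, "5-9_anos" should come out as 6 in
-grupo_default- but as 2 in -grupo-. Note that -encode- adds any
string not already in -grupos- as a new code at the end; its
-noextend- option makes it refuse to encode in that case, which may
be safer in a do-file meant for several datasets.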

Nick
n.j.cox@durham.ac.uk
