
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <n.j.cox@durham.ac.uk>, <statalist@hsphsun2.harvard.edu>
Subject: RE: Age groups [was: st: RE: RE: Question about tabsort ... ]
Date: Fri, 12 Jul 2002 18:42:05 +0100

[Apologies for previous premature reply.]

> "Rodrigo Briceño" <rbriceno@sanigest.com>
>
> 2. Months ago I asked to the list how can I generate some age groups.
> I use that help in order to have a frequency of the hospital discharges
> by age groups. The problem here is that I found that my variable age
> was a string8 variable. Then the options to generate a new variable
> was restricted: I tried typing:
>
> gen str8 rank_edad=1 if inrange(edad,0,0)
>
> to construct my age groups (less than 1 year, between 1 and 4,
> between 5 and 9, 10-19, 20-29, 30-39, 40-49, 50-59 and the last
> one 60 or more)

That's not going to work as you typed it, if only because
strings need to be in " " and you can't apply -inrange()-
to a string variable.

> This not work for me so I try:
>
> encode edad, gen (edad2)
>
> like I learned from my net course on Stata.
> The things appear to be ok, because when I type
>
> tab edad2
>
> I see all the ages correctly. The problem here is that
> I'm building three groups (the first three) that don't have data
> on it. But Stata is calculating something on it, I don't know why.
>
> gen byte rank_edad=1 if range(edad2,0,0)
> replace rank_edad=2 if inrange (edad2,1,4)
> replace rank_edad=3 if inrange (edad2,5,9)
> replace rank_edad=4 if inrange (edad2,10,19)
> and so on.....
>
> tab rank_edad
>
>   rank_edad |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>    1-4_anos |         90        1.73        1.73
>    5-9_anos |       1190       22.88       24.61
>  10-19_anos |       2114       40.64       65.24
>  20-29_anos |        892       17.15       82.39
>  30-39_anos |        304        5.84       88.24
>  40-49_anos |        202        3.88       92.12
>  50-59_anos |        162        3.11       95.23
> Mas_60_anos |        248        4.77      100.00
> ------------+-----------------------------------
>       Total |       5202      100.00
>
> Do you know or guess why Stata is putting data on the first three
> groups that is supposed to be empty (I build those groups because
> I'm making a do file, that I can apply to other databases).
I am far from clear about everything you have done,
for example, on how -rank_edad- got its value labels.

One puzzle is that you are generating -rank_edad-
from the _encoded_ variable -edad2-, which is _not_ age
but age categories.

tab edad2, nola

will show you, I think, that you just have age
categories 1, 2, 3, etc.

The other detail which may help is to note that
Stata's default encoding is on alphanumeric order.

Amazingly, something that I am writing at the moment
discusses sorting of age intervals and, possibly, one of
your problems.

If you give Stata these string values to -sort-

"1-4_anos"
"5-9_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"50-59_anos"
"Mas_60_anos"

you will get

"1-4_anos"
"10-19_anos"
"20-29_anos"
"30-39_anos"
"40-49_anos"
"5-9_anos"
"50-59_anos"
"Mas_60_anos"

because -sort- of strings is on dictionary principles,
and characters are put in ASCII order with no reference
to their meaning.

By default -encode- will use this order to assign
categories. That is, "1-4_anos" is -encode-d as 1, etc.
Something like this may help to explain some of your
results.

The remedy is to define your own value labels and to
insist that -encode- use them.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
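The remedy of defining your own value labels and making -encode- use them might look something like the sketch below. It assumes that -edad- is a string variable holding exactly the eight category strings listed in the message, and that -edad2- does not yet exist. The label name -edadlbl- is invented for illustration and does not come from the original posts.

```stata
* Hypothetical sketch: -edadlbl- is a made-up label name, and -edad- is
* assumed to contain only the eight strings below.
* Define the labels in age order, not dictionary order.
label define edadlbl 1 "1-4_anos" 2 "5-9_anos" 3 "10-19_anos" 4 "20-29_anos"
label define edadlbl 5 "30-39_anos" 6 "40-49_anos" 7 "50-59_anos", add
label define edadlbl 8 "Mas_60_anos", add

* Tell -encode- to use those labels. With -noextend-, -encode- refuses to
* run if -edad- contains a string that is not defined in edadlbl.
encode edad, generate(edad2) label(edadlbl) noextend

* Check the result: first with labels, then the underlying integer codes.
tabulate edad2
tabulate edad2, nolabel
```

With the labels defined first, "5-9_anos" should be coded 2 rather than the 6 it would get under the default dictionary ordering, so tabulations follow age order.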

