
st: RE: reshape question


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: reshape question
Date   Fri, 15 Nov 2002 17:38:47 -0000

Traci A Schlesinger
>
> I have census data that is organized so that there is an observation for each
> race-sex-year category in a state.  In other words, there is one observation for
> white men in Alabama in 1981, one for white women in Alabama 1981, etc., etc.
> Further, there are separate variables for several age groups.  The data looks like
> this.
>
> Fips     Year     Race     Sex     age04      age59 . . .   age8084     age8500
> 1        1        1        1       101151     102545        12032       12032
>
> where fips are state fips codes, year is the last digit in the year (the data only
> spans 1981 to 1989), race is a categorical variable, sex a dummy, and the number
> in age04 is the number of (in this case) white boys aged 0 - 4 in Alabama in 1981
> (the number in age8500 is the number of white men over 85 in Alabama in 1981).
>
> What I want is to reshape the age long, so that I have an observation for each
> individual in the sample.  Thus, I would have 101151 observations of white men in
> Alabama in 1981.
>
> I tried:
>
> reshape long age, i( fips year race sex)
>
> but this does not work.  It creates an age variable that has the values that were
> in each age variable, rather than an observation for each of the individuals
> counted in each age group.  Of course, this means the race and sex counts are also
> not correct.  How do I get what I am looking for?  Do I need to generate a
> different age variable first?  Any advice would be appreciated!
>

You're most of the way there.

First, when I tried this, I had to go

. l

           Fips        Year        Race         Sex       age04       age59     age8084     age8500
  1.          1           1           1           1      101151      102545       12032       12032

. reshape long age, i( Fips Year Race Sex) string

because of a problem documented at
http://www.stata.com/support/faqs/data/reshape3.html
namely

"On occasion, people use numeric suffixes with leading zeros,
such as 01, 02, and so forth. -reshape- will understand these
properly only if they are declared as string."

Anyway, the result is

. l

           Fips        Year        Race         Sex         _j         age
  1.          1           1           1           1         04      101151
  2.          1           1           1           1         59      102545
  3.          1           1           1           1       8084       12032
  4.          1           1           1           1       8500       12032
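
For anyone who wants to try the -reshape- step on something small, here is a minimal sketch. The counts are made up, and the names -agegrp- and -freq- are my own choices for illustration, not anything in the original data. It assumes a Stata in which -reshape long- accepts j() together with the string option.

```stata
* Toy data: one race-sex-year cell with four age-group counts (made-up numbers)
clear
input Fips Year Race Sex age04 age59 age8084 age8500
1 1 1 1 5 4 2 1
end

* string keeps suffixes such as "04" as text, so the leading zero survives;
* j(agegrp) names the new identifier directly instead of the default _j
reshape long age, i(Fips Year Race Sex) j(agegrp) string

* what -reshape- called age is really a frequency, so name it that way
rename age freq

list, noobs
```

Naming the j() variable up front and renaming the stub afterwards avoids the confusion over which variable is the age group and which is the count.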

The problem is just one of names: as you say, -age- is
really a frequency, and _j is really -age-.

To get to where you want to be, it is now an -expand- problem.
-reshape- worked as advertised, and had no way of knowing
that you also wanted to -expand-.

. expand age
. drop age
. rename _j age
< clean up age>

except that wait a moment! Why do you need e.g. 102,545
observations which are all the same? Only if you need to
run a command which does not accept weights, I suggest.
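
To make the comparison concrete, here is a rough sketch of both routes on toy data. The counts are made up, and -agegrp- and -freq- are hypothetical names standing in for the renamed _j and age. It uses -tabulate- only as an example of a command that does accept frequency weights.

```stata
* Toy long-format data: one row per age group with its count (made-up numbers)
clear
input str4 agegrp freq
"04"   5
"59"   4
"8084" 2
"8500" 1
end

* Route 1: keep one row per cell and pass the counts as frequency weights
tabulate agegrp [fweight = freq]

* Route 2: one observation per individual, for commands that take no weights.
* -expand- keeps rows with zero or missing counts as a single copy, so drop
* those first or they would be counted as one person each
drop if missing(freq) | freq < 1
expand freq
drop freq
tabulate agegrp
```

Both -tabulate- calls should report the same frequencies; the second just needs 12 observations rather than 4 to do it, and with the real counts the difference would be many millions of rows.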

Nick
n.j.cox@durham.ac.uk



