Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: how to create average income of same cohorts?


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to create average income of same cohorts?
Date   Tue, 15 May 2007 16:27:30 -0500

Mirko <mirko.moro@gmail.com> writes, 

> I have prepared the following example.
> Given this data:
> 
>        id  region     educ       age   income
>        1        2        1        25        5
>        1        2        1        26        5
>        2        2        1        29        8
>        2        2        1        30        8
>        3        2        1        32        11
>        3        2        1        33        11
>        4        1        1        40        5
>        4        1        1        41        5
>        5        1        2        37        8
>        5        1        2        38        8
>        6        1        2        42        9
>        6        1        2        43        9
> 
> I need to create the variable yref, that is the average income of a
> reference group defined by the same level of education, the same
> region and the same cohort (i.e., people who are 5 years younger and 5
> years older).

I think I have a solution.  Mirko gave answers for this small dataset
and, in two places, my solution does not give the same results as Mirko 
does, but I'm hoping Mirko did the pencil-and-paper calculation wrong.
If I made a mistake, finding it should not be too difficult.

My approach is to create ref.dta containing region, educ, age, and yref.
I said to myself, if I had that dataset, then getting what Mirko wants 
would be easy:

        . use master                 // Mirko's original dataset 
        . sort region educ age
        . merge region educ age using ref, nokeep

So now let's think about creating ref.dta.  

Every observation in Mirko's original dataset appears in 11 cohort 
groups, age-5, age-4, ..., age, age+1, age+2, ..., age+5.  
So let's start with the original dataset and make a dataset eleven times 
bigger (each original observation appears in 11 different cohorts).  Then we
can collapse the big dataset into averages.

Here is my solution to the make-the-giant-dataset problem:

        . use master, clear 
        . rename age cohort 
        . drop id
        . drop in f/l
        . save sofar, emptyok replace

        . forvalues i= -5(1)5 { 
        .         use master, clear 
        .         drop id
        .         replace age = age + `i'
        .         rename age cohort 
        .         append using sofar
        .         save sofar, replace
        . }

I'm sure sofar.dta could have been made more efficiently.
I suspect someone is going to point out that we could -expand- the data, 
then -sort-, and then do some very clever -replace-.  Above is what occurred
to me, however.

Now that I have sofar.dta, I can make ref.dta from it:

        . sort region educ cohort 
        . by region educ cohort: gen yref = sum(income)/_n
        . by region educ cohort: keep if _n==_N
        . drop income 
        . rename cohort age
        . sort region educ age
        . save ref, replace

and now I'm read to do the last step:

        . use master
        . sort region educ age
        . merge region educ age using ref, nokeep

I have a do-file below my signature.  Here's the output it produced:

        . list 

             +----------------------------------------------+
             | id   region   educ   age   income       yref |
             |----------------------------------------------|
          1. |  1        2      1    25        5        6.5 |
          2. |  1        2      1    26        5        6.5 |
          3. |  2        2      1    29        8          8 |
          4. |  2        2      1    30        8          8 |
          5. |  3        2      1    32       11        9.5 |
             |----------------------------------------------|
          6. |  3        2      1    33       11        9.5 |
          7. |  4        1      1    40        5          5 |
          8. |  4        1      1    41        5          5 |
          9. |  5        1      2    37        8   8.333333 |   <--
         10. |  5        1      2    38        8        8.5 |
             |----------------------------------------------|
         11. |  6        1      2    42        9        8.5 |
         12. |  6        1      2    43        9   8.666667 |   <--
             +----------------------------------------------+

My program differs from Mirkos expectation in two observations.

-- Bill 
wgould@stata.com

------------------------------------------------------------------------------
clear

input id region educ age income
1       2       1       25      5
1       2       1       26      5
2       2       1       29      8
2       2       1       30      8
3       2       1       32      11
3       2       1       33      11
4       1       1       40      5
4       1       1       41      5
5       1       2       37      8
5       1       2       38      8
6       1       2       42      9
6       1       2       43      9
end
sort region educ age
save master, replace

use master, clear 
rename age cohort 
drop id
drop in f/l
describe
save sofar, emptyok replace

forvalues i= -5(1)5 { 
        use master, clear 
        drop id
        replace age = age + `i'
        rename age cohort 
        append using sofar
        save sofar, replace
}
sort region educ cohort 
by region educ cohort: gen yref = sum(income)/_n
by region educ cohort: keep if _n==_N
drop income 
list
rename cohort age
sort region educ age
save ref, replace

use master, clear 
merge region educ age using ref, nokeep
sort id educ age 
list
------------------------------------------------------------------------------

<end>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index