Statalist The Stata Listserver



Re: st: how to create average income of same cohorts?


From   Mirko <mirko.moro@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how to create average income of same cohorts?
Date   Tue, 15 May 2007 23:00:16 +0100

William,
thanks a lot for all the work you have done on this.
Yes, I actually made a stupid mistake in the datasets I reported
as examples.
(in practice the first matrix of data has different age values than
the second one...pity on me!...
So the income averages yref I use in the example are based on the
first age values, not on the second).
Thank you also for understanding perfectly what I meant despite
that faulty example.
So I really think your do-file is going to work. I am going to try it.
Cheers,
Mirko

On 5/15/07, William Gould, StataCorp LP <wgould@stata.com> wrote:
Mirko <mirko.moro@gmail.com> writes,

> I have prepared the following example.
> Given this data:
>
>        id  region     educ       age   income
>        1        2        1        25        5
>        1        2        1        26        5
>        2        2        1        29        8
>        2        2        1        30        8
>        3        2        1        32        11
>        3        2        1        33        11
>        4        1        1        40        5
>        4        1        1        41        5
>        5        1        2        37        8
>        5        1        2        38        8
>        6        1        2        42        9
>        6        1        2        43        9
>
> I need to create the variable yref, that is the average income of a
> reference group defined by the same level of education, the same
> region and the same cohort (i.e., people who are 5 years younger and 5
> years older).

I think I have a solution.  Mirko gave answers for this small dataset
and, in two places, my solution does not give the same results as Mirko
does, but I'm hoping Mirko did the pencil-and-paper calculation wrong.
If I made a mistake, finding it should not be too difficult.

My approach is to create ref.dta containing region, educ, age, and yref.
I said to myself, if I had that dataset, then getting what Mirko wants
would be easy:

        . use master                 // Mirko's original dataset
        . sort region educ age
        . merge region educ age using ref, nokeep

So now let's think about creating ref.dta.

Every observation in Mirko's original dataset appears in 11 cohort
groups, age-5, age-4, ..., age, age+1, age+2, ..., age+5.
So let's start with the original dataset and make a dataset eleven times
bigger (each original observation appears in 11 different cohorts).  Then we
can collapse the big dataset into averages.

Here is my solution to the make-the-giant-dataset problem:

        . use master, clear
        . rename age cohort
        . drop id
        . drop in f/l
        . save sofar, emptyok replace

        . forvalues i= -5(1)5 {
        .         use master, clear
        .         drop id
        .         replace age = age + `i'
        .         rename age cohort
        .         append using sofar
        .         save sofar, replace
        . }

I'm sure sofar.dta could have been made more efficiently.
I suspect someone is going to point out that we could -expand- the data,
then -sort-, and then do some very clever -replace-.  Above is what occurred
to me, however.
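
The -expand- route mentioned above might look something like the following.
This is only a minimal sketch, not part of the original do-file; the variable
obs is a hypothetical helper introduced here to tell the copies of each
original observation apart.

```stata
use master, clear
drop id
gen long obs = _n                        // hypothetical helper: tags each original row
expand 11                                // every row now appears 11 times
bysort obs: gen cohort = age + _n - 6    // _n runs 1..11, so cohort runs age-5 ... age+5
drop obs age
save sofar, replace
```

The result should hold the same variables (region, educ, income, cohort) as
the sofar.dta built by the loop, without the repeated -use- and -append-.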

Now that I have sofar.dta, I can make ref.dta from it:

        . sort region educ cohort
        . by region educ cohort: gen yref = sum(income)/_n
        . by region educ cohort: keep if _n==_N
        . drop income
        . rename cohort age
        . sort region educ age
        . save ref, replace
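
The running-sum trick above computes a group mean and keeps the last
observation of each group. Assuming sofar.dta has been built as described,
-collapse- is another way to get the same averages; this is offered as an
alternative sketch, not as the code that produced the output below.

```stata
use sofar, clear
collapse (mean) yref = income, by(region educ cohort)   // one row per region/educ/cohort
rename cohort age
sort region educ age
save ref, replace
```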

and now I'm ready to do the last step:

        . use master
        . sort region educ age
        . merge region educ age using ref, nokeep
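
The -merge- line above uses the syntax current when this was written. As a
hedged sketch for later releases (assuming Stata 11 or newer, where -merge-
requires a match type), the same step could be written as follows. Here
keep(master match) plays the role of nokeep, and nogenerate is an optional
extra that suppresses the _merge variable.

```stata
use master, clear
merge m:1 region educ age using ref, keep(master match) nogenerate
sort id educ age
list
```

The m:1 form also covers larger datasets where several people share the same
region, educ, and age, and it does not need the data to be sorted beforehand.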

I have a do-file below my signature.  Here's the output it produced:

        . list

             +----------------------------------------------+
             | id   region   educ   age   income       yref |
             |----------------------------------------------|
          1. |  1        2      1    25        5        6.5 |
          2. |  1        2      1    26        5        6.5 |
          3. |  2        2      1    29        8          8 |
          4. |  2        2      1    30        8          8 |
          5. |  3        2      1    32       11        9.5 |
             |----------------------------------------------|
          6. |  3        2      1    33       11        9.5 |
          7. |  4        1      1    40        5          5 |
          8. |  4        1      1    41        5          5 |
          9. |  5        1      2    37        8   8.333333 |   <--
         10. |  5        1      2    38        8        8.5 |
             |----------------------------------------------|
         11. |  6        1      2    42        9        8.5 |
         12. |  6        1      2    43        9   8.666667 |   <--
             +----------------------------------------------+

My program differs from Mirko's expectation in the two observations
marked <-- above.
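
One way to check the two flagged values independently is a brute-force pass
that, for each observation, averages income over everyone in the same region
and educ whose age lies within 5 years. The sketch below assumes master.dta
as created in the do-file; yref_check is a hypothetical variable name, and
the /// continuation means it has to be run from a do-file.

```stata
use master, clear
gen yref_check = .                       // hypothetical name for the cross-check variable
forvalues j = 1/`=_N' {
    // mean income over row j's region/educ cell, ages within 5 years of row j
    quietly summarize income if region == region[`j'] & educ == educ[`j'] ///
        & inrange(age, age[`j'] - 5, age[`j'] + 5)
    quietly replace yref_check = r(mean) in `j'
}
list id region educ age income yref_check
```

For observation 9 (age 37) the window is ages 32 to 42, which holds the
incomes 8, 8, and 9, so the mean is 25/3, about 8.33. For observation 12
(age 43) the window is ages 38 to 48, which holds 8, 9, and 9, so the mean is
26/3, about 8.67. Both agree with the listing above. This check takes time
proportional to the square of the number of observations, so it only suits
small datasets.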

-- Bill
wgould@stata.com

------------------------------------------------------------------------------
clear

input id region educ age income
1       2       1       25      5
1       2       1       26      5
2       2       1       29      8
2       2       1       30      8
3       2       1       32      11
3       2       1       33      11
4       1       1       40      5
4       1       1       41      5
5       1       2       37      8
5       1       2       38      8
6       1       2       42      9
6       1       2       43      9
end
sort region educ age
save master, replace

use master, clear
rename age cohort
drop id
drop in f/l
describe
save sofar, emptyok replace

forvalues i= -5(1)5 {
        use master, clear
        drop id
        replace age = age + `i'
        rename age cohort
        append using sofar
        save sofar, replace
}
sort region educ cohort
by region educ cohort: gen yref = sum(income)/_n
by region educ cohort: keep if _n==_N
drop income
list
rename cohort age
sort region educ age
save ref, replace

use master, clear
merge region educ age using ref, nokeep
sort id educ age
list
------------------------------------------------------------------------------

<end>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



