Statalist



st: RE: take the age of one observation and attach it to its matching observation by id with events accruing over time intervals


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: take the age of one observation and attach it to its matching observation by id with events accruing over time intervals
Date   Sun, 18 Oct 2009 17:44:18 +0100

Two contributions by Jeph Herrin and Scott Merryman to this thread
centred on using a -merge- to merge a subset of the data with the data
as a whole. As Eric did not point out that he was using Stata 10.1, the
solutions both specified a syntax of -merge- for 11, but he evidently
managed to translate back to the old syntax. 
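
Neither of those posts is reproduced here. As a rough sketch of the
general idea only, in the -merge- syntax of Stata 11 and later, one
might save a copy of the data keyed on spouse and year and merge it back
in. The toy data are a few rows from Eric's example; the tempfile name
lookup and the variable name spouse_age are choices made for this
illustration, not taken from the thread.

```stata
clear
input year id spouse_id age
2000 1 4 40
2001 1 4 41
2002 1 5 42
2000 4 1 19
2001 4 1 20
2002 5 1 40
end

* Build a lookup file: each person's age, keyed as somebody's spouse
preserve
keep id year age
rename id spouse_id
rename age spouse_age
tempfile lookup
save `lookup'
restore

* Attach the spouse's age for the same year; m:1 stops with an error
* if the lookup holds more than one record per spouse_id and year
merge m:1 spouse_id year using `lookup', keep(master match) nogenerate

list
```

Observations whose spouse has no record in that year are kept with
spouse_age missing, because of keep(master match).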

That's a good solution, but it left up in the air how Eric could have
solved his problem using the methods in the sources he quoted. 

Here's a solution in that style: 

gen goal = . 
qui forval i = 1/`=_N' { 
	su age if id == spouse_id[`i'] & year == year[`i'], meanonly 
	if r(min) == r(max) replace goal = r(min) in `i' 
	else replace goal = -999 in `i' 
}

This code does the look-up that is wanted. That is, you want 

(a) the age of the person who is the spouse of the person in this
observation 

age if id == spouse_id[`i'] 

but (b) we must look it up for the same year, as spouses come and go

& year == year[`i'] 

There is some built-in paranoia to do some checking. -summarize,
meanonly- will, it is evidently hoped, find one observation and thus one
valid value for each spouse in each year. In that case, the minimum and
maximum returned will be identical. If not, then there is some problem
with the data and the code puts -999 in each such observation. (Thus
check that there are no -999 values afterwards.) 
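
As a small illustration of that check, here is a sketch on made-up data
in which person 2 deliberately has two conflicting records for 2000.
The data are invented for this example; the loop is the one above.

```stata
clear
input year id spouse_id age
2000 1 2 40
2000 2 1 38
2000 2 1 39
end

gen goal = .
qui forval i = 1/`=_N' {
	su age if id == spouse_id[`i'] & year == year[`i'], meanonly
	if r(min) == r(max) replace goal = r(min) in `i'
	else replace goal = -999 in `i'
}

* How many observations were flagged, and which ones?
count if goal == -999
list if goal == -999

* The usual culprit: more than one record per person per year
duplicates report id year
```

Observations whose spouse has no record at all in that year are not
flagged: they keep a missing value of goal, so -count if missing(goal)-
is worth a look too.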

A loop over observations is often disparaged, by myself too, as
inefficient, but in problems like this it is close to a relatively
transparent way of thinking about the problem. 

No-one addressed the last part of Eric's post, but marriage durations
yield easily to either -tsspell- from SSC or the techniques discussed in


SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification
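
A minimal sketch of the by-group approach, assuming one record per
person per year and treating each run of consecutive records with the
same spouse as one marriage. The variable names newspell, spell and
duration are invented for this illustration, and the toy data are a few
rows from Eric's example.

```stata
clear
input year id spouse_id
2000 3 8
2001 3 8
2002 3 11
2003 3 4
2000 1 4
2001 1 4
2002 1 5
end

* A new spell starts whenever the spouse differs from the previous record
* (the first record for each id always starts one)
bysort id (year): gen byte newspell = spouse_id != spouse_id[_n-1]

* A running count of spell starts numbers each person's marriages 1, 2, ...
by id: gen spell = sum(newspell)

* Duration in years, counting both the first and the last year observed
bysort id spell (year): gen duration = year[_N] - year[1] + 1

list id year spouse_id spell duration, sepby(id)
```

Gaps in a person's records are not handled here: a missing year in the
middle of a run would be counted as part of the marriage.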

Nick 
n.j.cox@durham.ac.uk 

Eric Fail

After hours of reading and fiddling around I allow myself to write to
Statalist in the hope that someone out there will take the time to help
me.

What I want to do seems quite simple, but I just can't figure it out.

I simply want to take the age of one spouse and attach it to the
matching spouse by id, so that I get a spouse_age at each observation.

I have read this thread
http://www.stata.com/statalist/archive/2002-07/msg00124.html where the
case is families, but as I don't have continuous ids those tips don't
seem to work in my case. Furthermore, I have read Nicholas J. Cox's
Stata tip 51: Events in intervals, but I must admit that I couldn't
figure it out from that description either. So now I try my luck here
on Statalist.

I have a dataset like the one below, except for 'goal_spouse_age',
which is the variable I want to create.


clear
input str17   date            year     id   spouse_id age goal_spouse_age
             "01/01/2000"     2000      1         4    40    19
             "01/01/2001"     2001      1         4    41    20
             "01/01/2002"     2002      1         5    42    40
             "01/01/2000"     2000      2         6    24    24
             "01/01/2001"     2001      2         7    25    40
             "01/01/2000"     2000      3         8    20    16
             "01/01/2001"     2001      3         8    21    17
             "01/01/2002"     2002      3        11    22    44
             "01/01/2003"     2003      3         4    23    22
             "01/01/2000"     2000      4         1    19    40
             "01/01/2001"     2001      4         1    20    41
             "01/01/2002"     2002      5         1    40    42
             "01/01/2000"     2000      6         2    24    24
             "01/01/2001"     2001      7         2    40    25
             "01/01/2000"     2000      8         3    16    20
             "01/01/2001"     2001      8         3    17    21
             "01/01/2002"     2002     11         3    44    22
             "01/01/2003"     2003      4         3    22    23
end

Can anyone tell me what to do or what I should read to figure this out?

I have managed to count the number of spouses each observation has, by
using the loop below, thanks to Nicholas J. Cox's description in the
thread mentioned above.

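* count must exist before -replace- can fill it in
gen count = .
local N = _N
forvalues i = 1/`N' {
	* tag one observation per distinct spouse of this observation's id
	egen tag = tag(spouse_id) if id == id[`i']
	count if tag
	replace count = r(N) in `i'
	drop tag
}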

The next thing I need to do is to measure the length of the first two or
three marriages the observations have had. I mention this, even though I
haven't worked very much on this part, in the hope that one of you out
there has had a similar case or can direct me to somewhere I can read
more about the specific case of events accruing over a time span.

