Statalist



Re: st: creating differences when time periods are missing


From   "Scott Cunningham" <scunning@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: creating differences when time periods are missing
Date   Thu, 1 Nov 2007 07:07:27 -0500

Thank you, Nick.  I knew of this FAQ, but wasn't sure how to difference
within an identifier after filling the missing cells.  This looks like
what I need.
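
For the archive, here is a minimal sketch of the suggestion quoted below,
run on a toy dataset.  The id 27 rows are copied from my listing; id 99
is made up purely to show a two-year gap.  I assume the data are tsset
on id and year, and I leave off the "by id :" prefix on the last line
because d. already respects panels once the data are tsset.  Treat it as
an illustration rather than a tested recipe for the full dataset.

```stata
* Toy data: id 27 comes from the listing in this thread; id 99 is hypothetical
clear
input id year amo
27 1997 189
27 1998 .
27 1999 226
27 2000 237
99 1997 200
99 1998 .
99 1999 .
99 2000 236
end

* Declare the panel so that d. knows what "previous year" means
tsset id year

* Carry the last observed age forward into the gaps;
* replace works down the observations in order, so runs of missings fill too
gen amo2 = amo
bysort id (year): replace amo2 = amo2[_n-1] if missing(amo)

* Difference the filled series, keeping dt2 missing where there was no interview
gen dt2 = cond(missing(amo), ., d.amo2)

list id year amo amo2 dt2, sepby(id)
```

With these rows, dt2 should come out as 37 (= 226 - 189) for id 27 in
1999 and 36 (= 236 - 200) for id 99 in 2000, and it stays missing in the
years with no interview.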

On 11/1/07, n j cox <n.j.cox@durham.ac.uk> wrote:
> This problem is a twist away from one discussed in an FAQ:
>
> How can I replace missing values with previous or following nonmissing
> values or within sequences?
> http://www.stata.com/support/faqs/data/missing.html
>
> Scott only needs to fill in gaps of missings with the previous value,
> then all is plain sailing.
>
> gen amo2 = amo
> bysort id (year) : replace amo2 = amo2[_n-1] if missing(amo)
>
> Then
>
> by id : gen dt2 = cond(amo == ., ., d.amo2)
>
> I would make no claims about efficiency except that this should beat
>
> 1. any loop
> 2. fixing by hand
>
> This should also fix gaps longer than one year.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Scott Cunningham
> --------------------------------------------------------------------------------
>
> My data is a longitudinal dataset of individuals who were interviewed
> from 1997 to 2004.  I have data on individual ages (measured as months
> from birth month).  Because individuals were not always interviewed
> exactly 12 months after their last interview, I have been trying to
> control for differences in time since the last interview by
> differencing their ages like so:
>
> . gen dt=d.amo
>
> where "amo" is "age in months."  I notice that this works so long as I
> have values of amo in both the current and previous year.  But there
> are some people who disappear from the survey only to return a year
> later.  They look like this:
>
>         +----------------------------------+
>         |  id   rp   age   amo   dt   year |
>         |----------------------------------|
>     56. |  27    0    15   189   12   1997 |
>     57. |  27    .     .     .    .   1998 |
>     58. |  27    4    18   226    .   1999 |
>     59. |  27    3    19   237   11   2000 |
>     60. |  27    8    20   247   10   2001 |
>     61. |  27    6    21   259   12   2002 |
>     62. |  27    4    22   273   14   2003 |
>         |----------------------------------|
>     63. |  27    1    23   283   10   2004 |
>         +----------------------------------+
>
> The relevant variables are:  id (indicating this is the same person),
> amo (age in months on day of interview), dt (time since last
> interview), and year.  Ignore the "rp" variable, but note that this
> variable measures something which depends on "dt" since it is a
> measure of something done since the date of the last interview.
>
> So, the problem is that "dt" is missing twice: once when all values
> are missing because the person was not interviewed, and again when he
> comes back in.  Ideally, I would like to know how to create
> differenced values for dt equal to (226-189) = 37, since the
> respondent is 226 months old on the day of the interview and was 189
> the last time interviewed.  What's the most efficient code to do this?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


-- 
"A man must be orthodox on most things, or he will never have time
able to practice his own particular heresy." - GK Chesterton


