
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Labeling different kinds of missing observations


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Labeling different kinds of missing observations
Date   Fri, 20 Apr 2012 18:16:31 +0100

Yes, yes, yes. One of the nicest things about this list is being able
to suggest something really simple that should solve your problem.
First, if you -reshape long- you can make your missings explicit by

. fillin ID year

See -help fillin-. Also:

SJ-5-1  dm0011  . . . . . . . . . . . . . .  Stata tip 17: Filling in the gaps
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/05   SJ 5(1):135--136                                 (no commands)
        tips for using fillin to fill in gaps in a rectangular
        data structure
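To make the effect of -fillin- concrete, here is a minimal sketch on a made-up toy panel. The variable -y- and its values are hypothetical and purely illustrative. Panel 1 lacks 2007 and panel 2 has only 2007. -fillin- adds each absent ID-year combination with -y- set to missing and flags the added observations with -_fillin- = 1.

```stata
* Toy unbalanced panel (hypothetical data, for illustration only)
clear
input ID year y
1 2006 5
1 2008 7
2 2007 3
end

* Add every missing combination of ID and year;
* fillin marks the added observations with _fillin == 1
fillin ID year
sort ID year
list, sepby(ID)
```

The result has 2 x 3 = 6 observations, 3 of them new. Only values of ID and year already present in the data are combined, so no new years or panels appear.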

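Second, in long form indicators are also easy. For example, for a generic
panel identifier -panelid-, time variable -time- and response variable -y-,
you can get the time of the first non-missing value and then the associated
indicators. The trick is that -time / !missing(y)- equals -time- when -y- is
present but is missing (division by zero) when -y- is missing, and -min()-
ignores missing values: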

bysort panelid : egen first = min(time / !missing(y))

gen missbefore = time < first

gen missafter = missing(y) & (time > first)
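Applied to the long data Rituparna posted, the recipe might look like the sketch below. It assumes the variables are named -ID-, -Year- and -Presence-, with -Presence- coded 1 (the original shows only "x") and years written as 2006 rather than 06. ID 4 only enters in 2009, so its 2006-2008 rows get -missbefore- = 1. The gaps and dropouts of IDs 1 to 3 get -missafter- = 1.

```stata
* Rituparna's unbalanced long data; Presence coded 1 (an assumption:
* the original shows only "x"); Year written as 2006 rather than 06
clear
input ID Year Presence
1 2006 1
1 2007 1
1 2008 1
2 2006 1
2 2008 1
3 2006 1
3 2007 1
3 2008 1
3 2009 1
4 2009 1
4 2010 1
4 2011 1
end

* Step 1: make the implicit missings explicit (full ID x Year rectangle)
fillin ID Year

* Step 2: first year with a non-missing response in each panel;
* Year / 0 is missing, so min() skips years where Presence is missing
bysort ID: egen first = min(Year / !missing(Presence))

* Step 3: indicators for missing before and missing after that year
gen missbefore = Year < first
gen missafter = missing(Presence) & (Year > first)

sort ID Year
list ID Year Presence first missbefore missafter, sepby(ID)
```

Every observation added by -fillin- falls into exactly one of the two categories, so -missbefore- + -missafter- reproduces -_fillin- here. Note that, as defined, -missafter- counts both interior gaps and dropout at the end of the panel.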

What's typical is that loops to do something across the variables of each
observation (row) in wide form become one- or two-liners to do something
within panels in long form.

As before, I am not happy about replacing missings with indicators.
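One way to keep the two kinds of missing apart without turning them into fake 0s and 1s is to use extended missing values, as suggested earlier in the thread. The sketch below uses hypothetical data and a hypothetical label name -misslbl-. It codes missing before the first observed year as .a and missing after it as .b. Because .a and .b still count as missing, commands such as -summarize- ignore them automatically.

```stata
* Hypothetical long data with some missing responses
clear
input ID Year y
1 2006 .
1 2007 5
1 2008 .
1 2009 6
2 2006 4
2 2007 .
end

* First year with a non-missing response in each panel
bysort ID: egen first = min(Year / !missing(y))

* Code the two kinds of missing as different extended missing values
replace y = .a if missing(y) & Year < first
replace y = .b if missing(y) & Year > first

* Value labels can be attached to extended missing values
label define misslbl .a "before first observation" .b "after first observation"
label values y misslbl

tabulate y, missing
summarize y
```

-summarize- reports 3 observations: the .a and .b codes are kept and labelled but stay out of the calculations.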


On Fri, Apr 20, 2012 at 5:28 PM, Rituparna Basu <[email protected]> wrote:
> Thank you, Nick, once again. I will try out the code and let you know.
>
> The reason I am not using long form is the following:
>
> My original data is in long form and looks like:
>
> ID  Year  Presence
> 1   06    x
> 1   07    x
> 1   08    x
> 2   06    x
> 2   08    x
> 3   06    x
> 3   07    x
> 3   08    x
> 3   09    x
> 4   09    x
> 4   10    x
> 4   11    x
>
> That is, it is unbalanced panel data covering the years 2006 to 2011.
>
> If I reshape it wide, this is (roughly) how it looks:
>
> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>  1   .   .   x   x   x   x   x
>  2   .   x   .   x   x   x   x
>  3   .   .   x   x   .   x   x
>  4   .   x   .   x   x   .   x
>  5   x   x   .   x   x   x   .
>
> The IMPORTANT thing is that the missing values here mean something: either the subject had not yet begun the study or came back after a gap of one, two or more years. I hope this makes sense and answers your concern. So, do you think it is possible to do a similar iteration using LONG form?
>
> Thanks again!
>
> Regards,
>
> RB
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Friday, April 20, 2012 1:09 AM
> To: [email protected]
> Subject: Re: st: Labeling different kinds of missing observations
>
> You are correct. When you asked for code to generate a variable, I did not understand that you wanted to replace one.
>
> Also, there is an error in the code I posted (_not_ a macro in Stata terms). References to -`first'- were spurious: they should have been just -first-. Sorry about that.
>
> But the -first- variable it creates is still relevant.
>
> clear
>
> input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
>  1   .   .   2   3   4   5   6
>  2   .   7   .   8   9  10  11
>  3   .   .  12  13   .  14  15
>  4   .  16   .  17  18   .  19
>  5  20  21   .  22  23  24   .
> end
>
> gen missbefore = 0
> gen missafter = 0
> gen first = .
>
> * first = number of the first non-missing Y variable in each row
> * (the closing brace of a loop must be on a line by itself)
> qui forval J = 1/7 {
>      replace missbefore = 1 if missing(Y`J') & `J' < first
>      replace first = `J' if missing(first) & !missing(Y`J')
>      replace missafter = 1 if missing(Y`J') & `J' > first
> }
>
> drop miss*
>
> * missing before the first non-missing value -> 0, missing after it -> 1
> qui forval J = 1/7 {
>      replace Y`J' = cond(`J' < first, 0, 1) if missing(Y`J')
> }
>
> list
>
> That said, this sounds like a bad idea.
>
> 1. If 1 and 0 are in principle possible non-missing values, it is a very bad idea.
>
> 2. Even if not, you need to remember to exclude the 0s and 1s from many, if not most, calculations with these variables.
>
> 3. Extended missing values (.a, .b, etc.) sound like what you really need here.
>
> My question "Why not -reshape long-?" still stands.
>
> Nick
>
> On Fri, Apr 20, 2012 at 7:52 AM, Rituparna Basu <[email protected]> wrote:
>> Hi Nick,
>>
>> Thank you so much for the resources and the code.
>> I did run the macro but it said 'invalid syntax'.
>>
>> I don't think I stated my question properly. I would like to transform the following data:
>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>  1   .   .   x   x   x   x   x
>>  2   .   x   .   x   x   x   x
>>  3   .   .   x   x   .   x   x
>>  4   .   x   .   x   x   .   x
>>  5   x   x   .   x   x   x   .
>>
>> Transform to:
>>
>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>  1   0   0   x   x   x   x   x
>>  2   0   x   1   x   x   x   x
>>  3   0   0   x   x   1   x   x
>>  4   0   x   1   x   x   1   x
>>  5   x   x   1   x   x   x   1
>>
>> Basically, I want to replace the missing values of the Y* variables (missing observations before and after the first observation, as you can see) rather than create a new variable.
>> I apologize for the confusion, but any help is greatly appreciated!
>>
>> Thank you!
>>
>> Regards,
>> RB
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Thursday, April 19, 2012 12:15 PM
>> To: [email protected]
>> Subject: Re: st: Labeling different kinds of missing observations
>>
>> This sounds as if you want indicators:
>>
>> missbefore  1 if any missing before the first non-missing value, 0 otherwise
>>
>> missafter   1 if any missing after the first non-missing value, 0 otherwise
>>
>> Here's a sketch. Code not tested.
>>
>> gen missbefore = 0
>> gen missafter = 0
>> gen first = .
>>
>> qui forval J = 1/7 {
>>       replace missbefore = 1 if missing(Y`J') & `J' < `first'
>>       replace first = `J' if missing(first) & !missing(Y`J')
>>       replace missafter = 1 if missing(Y`J') & `J' > `first'
>> }
>>
>> I think of these problems in this way.
>>
>> 1. I need to initialise an indicator. Sometimes the initial value does not matter; sometimes it does. You have to think it through for each problem.
>>
>> 2. I need to loop over the variables.
>>
>> 3. The first key then is "when do I change my mind?"
>>
>> 4. The second key is "if I change my mind, is the indicator then fixed, or may I need to update it?"
>>
>> But why not -reshape long-?
>>
>> See also
>>
>> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
>>        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
>>        Q1/09   SJ 9(1):137--157
>>        shows how to exploit functions, egen functions, and Mata
>>        for working rowwise; rowsort and rowranks are introduced
>>
>> Nick
>>
>> On Thu, Apr 19, 2012 at 7:10 PM, Rituparna Basu <[email protected]> wrote:
>>
>>> I am trying to generate a variable that will indicate missing BEFORE FIRST YEAR of OBSERVATION and missing AFTER FIRST YEAR of OBSERVATION.
>>> Here is a sample of the data:
>>>
>>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>>  1   .   .   x   x   x   x   x
>>>  2   .   x   .   x   x   x   x
>>>  3   .   .   x   x   .   x   x
>>>  4   .   x   .   x   x   .   x
>>>  5   x   x   .   x   x   x   .
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

