Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: Labeling different kinds of missing observations

From: Rituparna Basu
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: st: Labeling different kinds of missing observations
Date: Fri, 20 Apr 2012 19:07:20 +0000

Thanks a lot, Nick! I was not aware that -fillin- could do wonders. Both commands worked!
Yes, you are right to be wary of replacing missing values with indicators, but sometimes it is the research question that drives you to fill in the missing values.

Regards,

RB

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, April 20, 2012 10:17 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Labeling different kinds of missing observations

Yes, yes, yes. One of the nicest things about this list is being able to suggest something really simple that should solve your problem.
First, if you -reshape long- you can make your missings explicit by

. fillin ID year

See -help fillin-. Also see:

SJ-5-1  dm0011  Stata tip 17: Filling in the gaps  (N. J. Cox)
        Q1/05   SJ 5(1):135--136   (no commands)
        tips for using fillin to fill in gaps in a rectangular
        data structure

Second, in long form indicators are also easy. For example, with a generic panel identifier -panelid-, time variable -time-, and response variable -y-, you can get the time of the first non-missing value and then the associated indicators:

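* time of the first non-missing y: time / 0 is missing, and egen's min() ignores missings
bysort panelid : egen first = min(time / !missing(y))

* 1 for every time before the first non-missing value
gen missbefore = time < first

* 1 for missing values after the first non-missing value
gen missafter = missing(y) & (time > first)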
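Applied to the long data in the message quoted further down, the two steps (-fillin-, then the indicators) might look like the untested sketch here. It assumes the "x" presence marker is stored as a numeric 1 in a variable called -y- and that years are coded 6 to 11; both are stand-ins, not the poster's actual coding.

```stata
* Sketch: long data from the thread; y = 1 is an assumed stand-in for "x"
clear
input ID year y
1  6 1
1  7 1
1  8 1
2  6 1
2  8 1
3  6 1
3  7 1
3  8 1
3  9 1
4  9 1
4 10 1
4 11 1
end

* make every ID x year combination explicit; new rows get y = . and _fillin = 1
fillin ID year

* year of the first non-missing y in each panel (year / 0 is missing)
bysort ID : egen first = min(year / !missing(y))

* 1 for years before the panel's first observation
gen missbefore = year < first

* 1 for gaps or drop-outs after the first observation
gen missafter = missing(y) & (year > first)

list ID year y missbefore missafter, sepby(ID)
```

With this data, -fillin- should add 12 rows (24 in all), -missbefore- should flag three rows (ID 4, years 6 to 8), and -missafter- nine rows spread over IDs 1 to 3.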

What's typical is that loops to do something across observations (rows) become one- or two-liners to do something in panels.

As before, I am not happy about replacing missings with indicators.
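If the aim is to keep the "before entry" versus "after entry" distinction without overwriting missings with 0 and 1, the extended missing values (.a, .b, etc.) suggested in the earlier reply quoted below are one option. A minimal sketch, assuming the toy wide data posted in the thread and a hypothetical value label name -Ycodes-: reshape long, code the two kinds of missing, then reshape back to the wide layout that was asked for.

```stata
* Sketch: toy wide data from the thread
clear
input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1  .  .  2  3  4  5  6
2  .  7  .  8  9 10 11
3  .  . 12 13  . 14 15
4  . 16  . 17 18  . 19
5 20 21  . 22 23 24  .
end

* one observation per ID and time point
reshape long Y, i(ID) j(time)

* time of the first non-missing Y in each panel (time / 0 is missing)
bysort ID : egen first = min(time / !missing(Y))

* .a = not yet in the study, .b = missing after entry
replace Y = .a if missing(Y) & time < first
replace Y = .b if missing(Y) & time > first
drop first

* back to one row per ID, as in the requested layout
reshape wide Y, i(ID) j(time)

* hypothetical label name; value labels can be attached to extended missings
label define Ycodes .a "before entry" .b "after entry"
label values Y* Ycodes
list
```

Because .a and .b still count as missing, -summarize-, -regress-, and similar commands leave them out, so they cannot be confused with genuine values, while -tabulate Y1, missing- shows each kind separately.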

On Fri, Apr 20, 2012 at 5:28 PM, Rituparna Basu <basur@pamfri.org> wrote:
> Thank you once again, Nick. I will try out the code and let you know.
>
> The reason I am not using long form is the following:
>
> My original data is in long form and looks like:
>
> ID  Year  Presence
> 1   06    x
> 1   07    x
> 1   08    x
> 2   06    x
> 2   08    x
> 3   06    x
> 3   07    x
> 3   08    x
> 3   09    x
> 4   09    x
> 4   10    x
> 4   11    x
>
> That is, it is an unbalanced panel covering the years 2006 to 2011.
>
> If I reshape it to wide form, it looks (roughly) like this:
>
> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>  1   .   .   x   x   x   x   x
>  2   .   x   .   x   x   x   x
>  3   .   .   x   x   .   x   x
>  4   .   x   .   x   x   .   x
>  5   x   x   .   x   x   x   .
>
> The IMPORTANT thing here is that the missing values mean something: either the person had not yet entered the study, or they came back after a gap of one, two, or more years. I hope this makes sense and answers your concern. So, given this, do you think it is possible to do a similar iteration in LONG form?
>
> Thanks again!
>
> Regards,
>
> RB
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Friday, April 20, 2012 1:09 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Labeling different kinds of missing observations
>
> You are correct. When you asked for code to generate a variable, I did not understand that you wanted to replace a variable.
>
> Also, there is an error in the code I posted (which is _not_ a macro in Stata terms). References to -`first'- were spurious; they should have been just -first-. Sorry about that.
>
> But the -first- variable it creates is still relevant.
>
> clear
>
> input  ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>  1    .    .   2   3   4   5   6
>  2    .    7   .   8   9  10  11
>  3    .    .  12  13   .  14  15
>  4    .   16   .  17  18   .  19
>  5   20   21   .  22  23  24   .
> end
>
> gen missbefore = 0
> gen missafter = 0
> gen first = .
>
> * pass 1: record in -first- the position of the first non-missing Y
> qui forval J = 1/7 {
>      replace missbefore = 1 if missing(Y`J') & `J' < first
>      replace first = `J' if missing(first) & !missing(Y`J')
>      replace missafter = 1 if missing(Y`J') & `J' > first
> }
>
> drop miss*
>
> * pass 2: missings before -first- become 0, later missings become 1
> qui forval J = 1/7 {
>      replace Y`J' = cond(`J' < first, 0, 1) if missing(Y`J')
> }
>
> list
>
> That said, this sounds like a bad idea.
>
> 1. If 1 and 0 are in principle possible non-missing values, it is a very bad idea.
>
> 2. Even if not, you need to remember to exclude the 0s and 1s from many, if not most, calculations with these variables.
>
> 3. Extended missing values (.a, .b, etc.) sound like what you really need here.
>
> My question "Why not -reshape long-?" still stands.
>
> Nick
>
> On Fri, Apr 20, 2012 at 7:52 AM, Rituparna Basu <basur@pamfri.org> wrote:
>> Hi Nick,
>>
>> Thank you so much for the resources and the code.
>> I did run the macro but it said 'invalid syntax'.
>>
>> I think I did not state my question properly. I would like to transform the following data:
>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>  1   .   .   x   x   x   x   x
>>  2   .   x   .   x   x   x   x
>>  3   .   .   x   x   .   x   x
>>  4   .   x   .   x   x   .   x
>>  5   x   x   .   x   x   x   .
>>
>> Transform to:
>>
>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>  1   0   0   x   x   x   x   x
>>  2   0   x   1   x   x   x   x
>>  3   0   0   x   x   1   x   x
>>  4   0   x   1   x   x   1   x
>>  5   x   x   1   x   x   x   1
>>
>> Basically, I want to replace the missing values of the Y* variables (missing observations before and after the first observation, as you can see) rather than create a new variable.
>> I apologize for the confusion; any help is greatly appreciated!
>>
>> Thank you!
>>
>> Regards,
>> RB
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Thursday, April 19, 2012 12:15 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Labeling different kinds of missing observations
>>
>> This sounds as if you want indicators:
>>
>> missbefore: 1 if any value is missing before the first non-missing value, and 0 otherwise
>>
>> missafter: 1 if any value is missing after the first non-missing value, and 0 otherwise
>>
>> Here's a sketch. Code not tested.
>>
>> gen missbefore = 0
>> gen missafter = 0
>> gen first = .
>>
>> qui forval J = 1/7 {
>>       replace missbefore = 1 if missing(Y`J') & `J' < `first'
>>       replace first = `J' if missing(first) & !missing(Y`J')
>>       replace missafter = 1 if missing(Y`J') & `J' > `first'
>> }
>>
>> I think of these problems in this way.
>>
>> 1. I need to initialise an indicator. Sometimes the initial value does not matter; sometimes it does. You have to think it through for each problem.
>>
>> 2. I need to loop over the variables.
>>
>> 3. The first key then is "when do I change my mind?"
>>
>> 4. The second key is "if I change my mind, is the indicator then fixed, or may I need to update it?"
>>
>> But why not -reshape long-?
>>
>>
>> SJ-9-1  pr0046  Speaking Stata: Rowwise  (N. J. Cox)
>>         (help rowsort, rowranks if installed)
>>         Q1/09   SJ 9(1):137--157
>>         shows how to exploit functions, egen functions, and Mata
>>         for working rowwise; rowsort and rowranks are introduced
>> Nick
>>
>> On Thu, Apr 19, 2012 at 7:10 PM, Rituparna Basu <basur@pamfri.org> wrote:
>>
>>> I am trying to generate a variable that will indicate missing BEFORE FIRST YEAR of OBSERVATION and missing AFTER FIRST YEAR of OBSERVATION.
>>> Here is a sample of the data:
>>>
>>> ID  Y1  Y2  Y3  Y4  Y5  Y6  Y7
>>>  1   .   .   x   x   x   x   x
>>>  2   .   x   .   x   x   x   x
>>>  3   .   .   x   x   .   x   x
>>>  4   .   x   .   x   x   .   x
>>>  5   x   x   .   x   x   x   .
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
