From: Rituparna Basu
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Labeling different kinds of missing observations
Date: Fri, 20 Apr 2012 16:28:43 +0000

Thank you Nick once again. I will try out the code and let you know how it goes.

The reason I am not using long form is the following. My original data is in long form and looks like this:

ID Year Presence
1 06 x
1 07 x
1 08 x
2 06 x
2 08 x
3 06 x
3 07 x
3 08 x
3 09 x
4 09 x
4 10 x
4 11 x

That is, it is an unbalanced panel covering the years 2006 to 2011. If I reshape it, this is how it will look (sort of):

ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1 . . x x x x x
2 . x . x x x x
3 . . x x . x x
4 . x . x x . x
5 x x . x x x .

The IMPORTANT thing is that a missing value here means something: either the person had not yet begun the study, or they came back after a gap of 1-2 or more years. I hope that makes sense and answers your concern.

Having said this, do you think it is possible to do a similar iteration using LONG form?

Thanks again!

Regards,
RB

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, April 20, 2012 1:09 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Labeling different kinds of missing observations

You are correct. When you asked for code to generate a variable, I did not understand that you want to replace a variable.

Also, there is an error in the code I posted (_not_ a macro in Stata terms). References to -`first'- were spurious: should have been just -first-. Sorry about that.

But the -first- variable it creates is still relevant.

clear
input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1 . . 2 3 4 5 6
2 . 7 . 8 9 10 11
3 . . 12 13 . 14 15
4 . 16 . 17 18 . 19
5 20 21 . 22 23 24 .
end

gen missbefore = 0
gen missafter = 0
gen first = .

qui forval J = 1/7 {
    replace missbefore = 1 if missing(Y`J') & `J' < first
    replace first = `J' if missing(first) & !missing(Y`J')
    replace missafter = 1 if missing(Y`J') & `J' > first
}

drop miss*

qui forval J = 1/7 {
    replace Y`J' = cond(`J' < first, 0, 1) if missing(Y`J')
}

list

That said, this sounds like a bad idea.

1. If 1 and 0 are in principle possible non-missing values it is a very bad idea.

2. Even if not, you need to remember to exclude the 0s and 1s from many, if not most, calculations with these variables.

3. Extended missing values (.a, .b, etc.) sound like what you really need here.

My question "Why not -reshape long-?" still stands.

Nick

On Fri, Apr 20, 2012 at 7:52 AM, Rituparna Basu <basur@pamfri.org> wrote:
> Hi Nick,
>
> Thank you so much for the resources and the code.
> I did run the macro but it said 'invalid syntax'.
>
> I think I did not state my question properly. I would like to transform the following data:
> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
> 1 . . x x x x x
> 2 . x . x x x x
> 3 . . x x . x x
> 4 . x . x x . x
> 5 x x . x x x .
>
> Transform to:
>
> ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
> 1 0 0 x x x x x
> 2 0 x 1 x x x x
> 3 0 0 x x 1 x x
> 4 0 x 1 x x 1 x
> 5 x x 1 x x x 1
>
> Basically, replace the missing values of var Y* (0 for missing obs before the first obs, 1 for missing obs after it, as you can see) and not create a new variable.
> I apologize for the confusion but any help is greatly appreciated!
>
> Thank you!
>
> Regards,
> RB
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Thursday, April 19, 2012 12:15 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Labeling different kinds of missing observations
>
> This sounds as if you want indicators
>
> missbefore 1 if any missing before first non-missing and 0 otherwise
>
> missafter 1 if any missing after etc.
>
> Here's a sketch. Code not tested.
>
> gen missbefore = 0
> gen missafter = 0
> gen first = .
>
> qui forval J = 1/7 {
>     replace missbefore = 1 if missing(Y`J') & `J' < `first'
>     replace first = `J' if missing(first) & !missing(Y`J')
>     replace missafter = 1 if missing(Y`J') & `J' > `first'
> }
>
> I think of these problems in this way.
>
> 1. I need to initialise an indicator. Sometimes the initial value does not matter; sometimes it does. You have to think it through for each problem.
>
> 2. I need to loop over the variables.
>
> 3. The first key then is "when do I change my mind?"
>
> 4. The second key is "if I change my mind, is the indicator then fixed, or may I need to update it?"
>
> But why not -reshape long-?
>
> See also
>
> SJ-9-1 pr0046 . . . . . . . . . . . . . . . . . . . Speaking Stata: Rowwise
> (help rowsort, rowranks if installed) . . . . . . . . . . . N. J.

I am trying to generate a variable that will indicate missing BEFORE FIRST YEAR of OBSERVATION and missing AFTER FIRST YEAR of OBSERVATION.
Here is a sample of the data:

ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1 . . x x x x x
2 . x . x x x x
3 . . x x . x x
4 . x . x x . x
5 x x . x x x .
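
For readers following the thread, the two suggestions made in it (-reshape long- and extended missing values) can be combined. What follows is only a minimal sketch under stated assumptions, not code posted by anyone in the thread: the numeric values stand in for the "x" entries, and the names -wave-, -first- and -misskind-, the label text, and the choice of .a for "before" and .b for "after" are all illustrative.

```stata
* Sketch only: toy numeric values stand in for the "x" entries.
clear
input ID Y1 Y2 Y3 Y4 Y5 Y6 Y7
1  .  .  2  3  4  5  6
2  .  7  .  8  9 10 11
3  .  . 12 13  . 14 15
4  . 16  . 17 18  . 19
5 20 21  . 22 23 24  .
end

* One row per ID and wave; -wave- is an assumed name for the 1..7 index.
reshape long Y, i(ID) j(wave)

* First wave with a non-missing value, computed separately for each ID.
* cond() returns missing for waves without data, and min() ignores missings.
bysort ID: egen first = min(cond(!missing(Y), wave, .))

* .a = missing before the first observation, .b = missing after it.
* Note: an ID with no data at all has first == ., so every wave gets .a.
replace Y = .a if missing(Y) & wave < first
replace Y = .b if missing(Y) & wave > first

* Assumed label text, so the two kinds of missing are readable in output.
label define misskind .a "before first observation" .b "gap after first observation"
label values Y misskind

list, sepby(ID)
```

Because .a and .b are still missing values, commands such as -summarize- and -regress- ignore them automatically, which avoids the problem of having to exclude 0s and 1s by hand. If the wide layout is needed afterwards, -reshape wide Y, i(ID) j(wave)- should restore it (after dropping or keeping -first- as appropriate, since it is constant within ID).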

