Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Lisa Wang <lhwang0925@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: drop variables in panel data with loop |
Date | Tue, 24 Jul 2012 23:02:46 +1000 |
No problems, I understand. Thank you for your suggestions thus far anyway. I appreciate it.

Can anyone else suggest code for how I should approach this?

Thanks,
Lisa

On Tue, Jul 24, 2012 at 5:45 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> I'm travelling and can't write long replies. If you don't get further help,
> contact Stata tech support.
>
> Begin forwarded message:
>
> From: Lisa Wang <lhwang0925@gmail.com>
> Date: 23 July 2012 23:26:11 GMT+01:00
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: drop variables in panel data with loop
> Reply-To: statalist@hsphsun2.harvard.edu
>
> Thank you for your example. That made it so much clearer! I should
> have done something similar at the start. Now I know how to word my
> question better next time. Thank you.
>
> I now understand why the code you kindly suggested to me may have
> dropped all of my observations.
>
> For each i, I will definitely have missing r's somewhere in the panel,
> so Stata recognises this and drops everything.
>
> Using your example below (with observation 9 modified to be missing
> as well), both i==2 and i==3 would be dropped. Let's say, however, that I
> want Stata to drop a panel only if there are missing r's
> between t = -3 and 4 (i.e. i==2 would be dropped entirely but i==3 would
> remain in my dataset). I don't want i==3 to be dropped, as that
> won't cause a problem for my further analysis.
>
>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 2   1   42 |
>   6. | 2   2   42 |
>   7. | 2   3    . |
>   8. | 2   4   42 |
>      |------------|
>   9. | 3   1    . |
>  10. | 3   2   42 |
>  11. | 3   3   42 |
>  12. | 3   4   42 |
>      +------------+
>
> I would also like Stata to shift up, so that once i==2 is dropped
> i==3 takes its place as i==2; would this be possible?
>
> Best regards,
> Lisa
>
> P.S.
> I now realised that I am receiving answers from the Nick Cox
> mentioned in many of the help files. Sorry, my question might seem so
> basic to you!
>
> On Mon, Jul 23, 2012 at 11:56 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>
> What you observe is nothing to do with whether data have been declared as
> panel data.
>
> Consider this (in which no use is made of -tsset- or -xtset-):
>
> . clear
>
> . set obs 12
> obs was 0, now 12
>
> . egen i = seq(), block(4)
>
> . egen t = seq(), to(4)
>
> . gen r = cond(_n == 7, ., 42)
> (1 missing value generated)
>
> . l, sep(4)
>
>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 2   1   42 |
>   6. | 2   2   42 |
>   7. | 2   3    . |
>   8. | 2   4   42 |
>      |------------|
>   9. | 3   1   42 |
>  10. | 3   2   42 |
>  11. | 3   3   42 |
>  12. | 3   4   42 |
>      +------------+
>
> . bysort i (r) : drop if missing(r[_N])
> (4 observations deleted)
>
> . sort i t
>
> . l, sep(4)
>
>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 3   1   42 |
>   6. | 3   2   42 |
>   7. | 3   3   42 |
>   8. | 3   4   42 |
>      +------------+
>
> This code -drop-s entire panels if and only if there are any missing values
> in a panel. Isn't that what you want?
>
> It may be that you should also look here:
>
> FAQ . . . . . . . . . . . . . . . . . . Dropping spells of missing values
> . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
> 3/07 How can I drop spells of missing values at the
>      beginning and end of panel data?
>      http://www.stata.com/support/faqs/data/dropmiss.html
>
> FAQ . . . . . . Identifying runs of consecutive observations in panel data
> . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
> 8/05 How do I identify runs of consecutive observations
>      in panel data?
>      http://www.stata.com/support/faqs/data/panel.html
>
> Please don't send me any datafiles, if only because I am travelling over the
> next few weeks.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Lisa Wang
>
> Hi Nick,
>
> Thank you for looking into my problem. Much appreciated.
>
> I tried your suggestion of using -bysort i (r) : drop if
> missing(r[_N])- but it dropped all my observations, so I am left with zero
> observations after that line is run by Stata.
>
> I don't know why. Could it be that the data are not declared as panel data?
> I do have -xtset i t- at the start already.
>
> Would you mind if I send you part of my data set and do-file to
> have a look? I would like to know how to solve this, as I am sure I will
> need to drop observations in panel data on another project.
>
> Many thanks,
> Lisa
>
> On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> I still have the same understanding of your problem and am at a loss
> to see why my previous suggestions don't "seem to work".
>
> As I understand it, you have panel data with
>
> identifier i
> time variable t
> "rate" r
>
> and the problem is that -r- is missing (equal to system missing .) in
> some observations and you want to drop _entire panels_ if any
> observation in a panel has a missing value on -r-. A variant on my
> previous suggestions is
>
> bysort i (r) : drop if missing(r[_N])
>
> Nick
>
> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
>
> Hi all,
>
> Both codes either don't drop any observations at all or drop all the
> observations.
>
> @Nick - I also tried yours but likewise, it doesn't seem to work
> either. I need to summarise the data based on i as this represents
> each individual entity - each entity will have multiple r's (of
> differing amounts). t is just a variable created to do a timeline
> kind of thing (eg.
> -694, -693, ..., -1, 0, +1, ..., +2093 for instance) and
> the days in the timeline can vary for each individual entity.
>
> If this is any help:
>
> After I run this code - tabulate i t if window==1 & r==. - I get this
> output from Stata:
>
>            |         Event Timeline
>          i |        -1          0          1 |     Total
> -----------+---------------------------------+----------
>       Amy1 |         0          0          1 |         1
>     Colin1 |         1          1          1 |         3
>     Chris1 |         0          0          1 |         1
>       Cat2 |         0          1          1 |         2
>       Ian1 |         1          1          0 |         2
>   Queenie1 |         1          1          1 |         3
>       Sam1 |         0          1          1 |         2
>     Uncle1 |         1          1          0 |         2
> -----------+---------------------------------+----------
>      Total |         4          6          6 |        16
>
> . levelsof i if window==1 & r==., local(entities)
> 2 4 6 7 9 14 21 25
>
> (eg. Amy1 is the second entity in my dataset. I want to remove ALL
> observations of Amy1 - not only the days (t) on which I have missing
> observations - as I want to omit these people from any further
> analysis).
>
> I also want the maximum of i to be 22 (30 minus the 8 entities I want
> dropped from my dataset) as I will do some loops for regressions later on.
>
> Thank you everyone for your kind help so far.
>
> Kind regards,
> Lisa
>
> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> Djalal's code can be simplified to
>
> drop if t==.
>
> as whether t is missing does not depend on its relation to other
> variables. So, it drops observations which are missing on -t-, which
> is not your problem.
>
> However, Lisa overlooks my earlier posting
>
> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>
> I got a bit lost in Lisa's explanation (for example further variables
> -twindow- and -holidaywindow- appear without any explanation) but my
> solution should still be relevant. Another solution might be
>
> bysort i (window) : drop if window[_N] == 1
>
> Nick
>
> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <lhwang0925@gmail.com> wrote:
>
> Hi Djalal,
>
> Thank you for your help.
>
> I have tried your recommendation but it does not delete any
> observations from my data set at all.
>
> Maybe I didn't specify my query well enough. If there are missing
> observations within a particular period, which is denoted by a
> dummy variable 'window', then drop ALL the observations pertaining
> to that person - not only the rows that have missing observations.
>
> Would you have any other suggestions?
>
> Kind regards,
> Lisa
>
> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <aridjal@gmail.com> wrote:
>
> Hi Lisa,
>
> Have you tried the following syntax?
>
> by i, sort : drop if t==.
>
> This will give you a t variable without any missing observations.
> As you have already identified which people/rows are concerned,
> you can manually drop them in the Data Editor.
>
> Hope this can help.
>
> Djalal Arinloye
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lisa Wang
> Sent: Sunday, July 22, 2012 12:51 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: drop variables in panel data with loop
>
> I am having trouble with Stata and would like some guidance on
> what I am doing incorrectly. I am new to Stata (only 1 month into
> it), so I am still trying to learn and sometimes still think as I would in
> Excel.
>
> I will try to be as detailed as possible, so you can understand my question.
>
> To describe my data set, I have some panel data and a variable i,
> which holds the names (eg. Mary, Tom...) but encoded into a numeric
> variable as such: - encode symbol1, generate (i) -. There are 59732 rows and
> the count of i is 30.
>
> What I would like to achieve is to tell the program to drop the
> observations that have missing values for a variable for a
> specific period (variable window). E.g.
> If there is no data for "Mary" for day 102 then drop all the rows
> pertaining to "Mary" from day 1...T - not only the observation for
> Mary on day 102.
>
> This is my code to try to achieve this:
>
> version 12.1
> clear all
> set more off
>
> cd "C:\Users\Admin\Desktop"
>
> use window_students, clear
>
> xtset i t
> //check panel structure is correct
>
> summ i // this tells me that the max of variable i is 30, which is correct as I have 30 people I need to analyse
>
> tabulate i t if window==1 & r==.
> //r is another variable stored in another column, which represents their rates. There are 8 people that don't have any rates within my window.
> ///I would like to remove all the observations pertaining to these people
>
> levelsof i if window==1 & r==., local(entities) //tried to store the people that were missing into a local macro - these are i = 2 4 6 7 9 14 21 25
>
> Then I tried this:
>
> *Method 1 - but then results window has return code 198 and invalid '4' in red text
>
> foreach i of local entities{
> drop if i==`entities'
> }
>
> *Method 2 - but then results window has return code 111 and variable i not found
>
> foreach i of local entities{
> drop i
> }
>
> *Method 3 - but it deleted all of my observations
>
> foreach i of local entities{
> drop i
> }
>
> *Method 4 - after Stata told me that it was person 2, 4, 6, 7, 9 etc... that were missing observations I wrote out each line
>
> drop if i==2
> drop if i==4 //etc.....
>
> summ i // I still get 30 in the summary but it has told me that it has deleted observations for each drop if line that I used....shouldn't it be 22 now after I removed the 8 people?
>
> I am stuck now...as I need the i to be correct as I will be doing
> some regressions with the i later, that's why I have to drop the
> people that don't have observations in my dataset before I do
> further analysis.
>
> eg.
> summarize i
> local m = r(max)
> //create a local macro storing the max number of distinct entities from an r-scalar
>
> generate ar = .
>
> forvalues x = 1/`m' {
>     //run regression for every entity in data set
>     regress r ind if i==`x' & twindow
>
>     predict res if i==`x', residuals //predict residuals both in-sample and out-of-sample
>     replace ar=res if i==`x' & holidaywindow //replace ar=. with the estimated residuals
>     drop res
> }

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
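
The last question in this thread (drop a panel only when r is missing inside a window, then renumber i so that it has no gaps) has no reply in this message. What follows is a minimal sketch of one possible approach, not code posted by anyone in the thread. It rebuilds Nick Cox's 12-observation example with observation 9 also set to missing. The variable names bad and newi are hypothetical, and because t only runs from 1 to 4 in the toy data, the range 2 to 4 stands in for the real window; with the actual data the condition would presumably be inrange(t, -3, 4) or window == 1.

```stata
* Toy data: Nick Cox's example, with observations 7 and 9 missing on r
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(inlist(_n, 7, 9), ., 42)

* Flag observations that are missing on r AND inside the window.
* ASSUMPTION: t in 2..4 stands in for the real window (t from -3 to 4).
gen byte bad = missing(r) & inrange(t, 2, 4)

* Within each panel, sort the flag last; bad[_N] is then 1 if any
* observation in the panel was flagged, so the whole panel is dropped.
bysort i (bad) : drop if bad[_N]
drop bad

* Renumber the surviving panels 1, 2, ... so a forvalues loop has no gaps
egen newi = group(i)
sort newi t
list, sepby(newi)
```

In this toy data only i==2 is dropped: i==3 is missing at t==1, which lies outside the stand-in window, so it survives and becomes newi==2. If i carries value labels from -encode-, adding the label option to group() should carry the names over to the new identifier.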