

From: Lisa Wang <lhwang0925@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: drop variables in panel data with loop
Date: Tue, 24 Jul 2012 08:26:11 +1000

Thank you for your example. That made it so much clearer! I should have done something similar at the start. Now I know how to word my question better next time. Thank you.

I now understand why the code you kindly suggested to me may have dropped all of my observations. For each i, I will definitely have missing r's somewhere in the panel, so Stata recognises this and drops everything for me.

Using your example below (with observation 9 modified to be missing as well), both i==2 and i==3 would be dropped. Let's say, however, that I want Stata to drop a panel only if there are missing r's between t = -3 and 4 (i.e. i==2 would be dropped entirely but i==3 would remain in my dataset). I don't want i==3 to be dropped, as that won't cause a problem for my further analysis.

>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 2   1   42 |
>   6. | 2   2   42 |
>   7. | 2   3    . |
>   8. | 2   4   42 |
>      |------------|
>   9. | 3   1    . |
>  10. | 3   2   42 |
>  11. | 3   3   42 |
>  12. | 3   4   42 |
>      +------------+

I would also like Stata to shift up, so that once i==2 is dropped, i==3 takes its place as i==2; would this be possible?

Best regards,
Lisa

P.S. I have now realised that I am receiving answers from the Nick Cox mentioned in many of the help files. Sorry, my question might seem so basic to you!

On Mon, Jul 23, 2012 at 11:56 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> What you observe is nothing to do with whether data have been declared as panel data.
>
> Consider this (in which no use is made of -tsset- or -xtset-)
>
> . clear
>
> . set obs 12
> obs was 0, now 12
>
> . egen i = seq(), block(4)
>
> . egen t = seq(), to(4)
>
> . gen r = cond(_n == 7, ., 42)
> (1 missing value generated)
>
> . l, sep(4)
>
>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 2   1   42 |
>   6. | 2   2   42 |
>   7. | 2   3    . |
>   8. | 2   4   42 |
>      |------------|
>   9. | 3   1   42 |
>  10. | 3   2   42 |
>  11. | 3   3   42 |
>  12. | 3   4   42 |
>      +------------+
>
> . bysort i (r) : drop if missing(r[_N])
> (4 observations deleted)
>
> . sort i t
>
> . l, sep(4)
>
>      +------------+
>      | i   t    r |
>      |------------|
>   1. | 1   1   42 |
>   2. | 1   2   42 |
>   3. | 1   3   42 |
>   4. | 1   4   42 |
>      |------------|
>   5. | 3   1   42 |
>   6. | 3   2   42 |
>   7. | 3   3   42 |
>   8. | 3   4   42 |
>      +------------+
>
> This code -drop-s entire panels if and only if there are any missing values in a panel. Isn't that what you want?
>
> It may be that you should also look here:
>
> FAQ . . . . . . . . . . . . . . . . . . Dropping spells of missing values
> . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
> 3/07 How can I drop spells of missing values at the
>      beginning and end of panel data?
>      http://www.stata.com/support/faqs/data/dropmiss.html
>
> FAQ . . . . . . Identifying runs of consecutive observations in panel data
> . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
> 8/05 How do I identify runs of consecutive observations
>      in panel data?
>      http://www.stata.com/support/faqs/data/panel.html
>
> Please don't send me any datafiles, if only because I am travelling over the next few weeks.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Lisa Wang
>
> Hi Nick,
>
> Thank you for looking into my problem. Much appreciated.
>
> I tried your suggestion of using -bysort i (r) : drop if missing(r[_N])- but it dropped all my observations, so I am left with zero observations after that line is run by Stata.
>
> I don't know why; could it be that it's not declared as panel data? But I do have -xtset i t- at the start already.
>
> Would you mind if I sent you part of my data set and do-file to have a look? I would like to know how to solve this, as I am sure I will need to drop observations in panel data on another project.
>
> Many thanks,
> Lisa
>
> On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I still have the same understanding of your problem and am at a loss
>> to see why my previous suggestions don't "seem to work".
>>
>> As I understand it, you have panel data with
>>
>> identifier i
>> time variable t
>> "rate" r
>>
>> and the problem is that -r- is missing (equal to system missing .) in
>> some observations and you want to drop _entire panels_ if any
>> observation in a panel has a missing value on -r-. A variant on my
>> previous suggestions is
>>
>> bysort i (r) : drop if missing(r[_N])
>>
>> Nick
>>
>> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
>>> Hi all,
>>>
>>> Both codes either don't drop any observations at all or drop all the
>>> observations.
>>>
>>> @Nick - I also tried yours but likewise, it doesn't seem to work
>>> either. I need to summarise the data based on i as this represents
>>> each individual entity - each entity will have multiple r's (of
>>> differing amounts). t is just a variable created to do a timeline
>>> kind of thing (eg. -694, -693..,-1,0, +1...+2093 for instance) and
>>> the days in the timeline can vary for each individual entity.
>>>
>>> If this is any help:
>>>
>>> After I run this code - tabulate i t if window==1 & r==. - I get this
>>> output from Stata:
>>>
>>>            |   Event Timeline
>>>          i |    -1     0     1 |  Total
>>> -----------+-------------------+-------
>>>       Amy1 |     0     0     1 |      1
>>>     Colin1 |     1     1     1 |      3
>>>     Chris1 |     0     0     1 |      1
>>>       Cat2 |     0     1     1 |      2
>>>       Ian1 |     1     1     0 |      2
>>>   Queenie1 |     1     1     1 |      3
>>>       Sam1 |     0     1     1 |      2
>>>     Uncle1 |     1     1     0 |      2
>>> -----------+-------------------+-------
>>>      Total |     4     6     6 |     16
>>>
>>> . levelsof i if window==1 & r==., local(entities)
>>> 2 4 6 7 9 14 21 25
>>>
>>> (eg. Amy1 is the second entity in my dataset, I want to remove ALL
>>> observations of Amy1 - not only the days (t) that I have missing
>>> observations as I want to omit these people from any further
>>> analysis).
>>>
>>> I also want i to be 22 (since 30 - 8 entities I want dropped from my
>>> dataset) as I will do some loops for regressions later on.
>>>
>>> Thank you everyone for your kind help so far.
>>>
>>> Kind regards,
>>> Lisa
>>>
>>> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Djalal's code can be simplified to
>>>>
>>>> drop if t==.
>>>>
>>>> as whether t is missing does not depend on its relation to other
>>>> variables. So, it drops observations which are missing on -t-, which
>>>> is not your problem.
>>>>
>>>> However, Lisa overlooks my earlier posting
>>>>
>>>> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>>>>
>>>> I got a bit lost in Lisa's explanation (for example further variables
>>>> -twindow- and -holidaywindow- appear without any explanation) but my
>>>> solution should still be relevant. Another solution might be
>>>>
>>>> bysort i (window) : drop if window[_N] == 1
>>>>
>>>> Nick
>>>>
>>>> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <lhwang0925@gmail.com> wrote:
>>>>> Hi Djalal,
>>>>>
>>>>> Thank you for your help.
>>>>>
>>>>> I have tried your recommendation but it does not delete any
>>>>> observations from my data set at all.
>>>>>
>>>>> Maybe I didn't specify my query well enough. If there are missing
>>>>> observations within a particular period, which is denoted by a
>>>>> dummy variable 'window', then drop ALL the observations pertaining
>>>>> to that person - not only the rows that have missing observations.
>>>>>
>>>>> Would you have any other suggestions?
>>>>>
>>>>> Kind regards,
>>>>> Lisa
>>>>>
>>>>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <aridjal@gmail.com> wrote:
>>>>>> Hi Lisa,
>>>>>> Have you tried the following syntax?
>>>>>>
>>>>>> by i, sort : drop if t==.
>>>>>>
>>>>>> This will leave you with a t variable without any missing observations.
>>>>>> As you have already identified which people/rows are concerned,
>>>>>> you can manually drop them from the data editor.
>>>>>>
>>>>>> Hope this can help.
>>>>>>
>>>>>> Djalal Arinloye
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lisa Wang
>>>>>> Sent: Sunday, July 22, 2012 12:51 PM
>>>>>> To: statalist@hsphsun2.harvard.edu
>>>>>> Subject: st: drop variables in panel data with loop
>>>>>>
>>>>>> I am having trouble with Stata and would like some guidance on
>>>>>> what I am doing incorrectly. I am new to Stata (only 1 month into
>>>>>> it), so I am still trying to learn and sometimes still thinking like in Excel.
>>>>>>
>>>>>> I will try to be as detailed as possible, so you can understand my question.
>>>>>>
>>>>>> To describe my data set, I have some panel data and a variable i,
>>>>>> which is the names (eg. Mary, Tom...) but encoded into a numeric
>>>>>> as such: - encode symbol1, generate (i) -. There are 59732 rows and
>>>>>> the count of i is 30.
>>>>>>
>>>>>> What I would like to achieve is to tell the program to drop the
>>>>>> observations that have missing values for a variable for a
>>>>>> specific period (variable window). E.g. If there is no data for
>>>>>> "Mary" for day 102 then drop all the rows pertaining to "Mary"
>>>>>> from day 1...T - not only drop the observation for Mary on day 102.
>>>>>>
>>>>>> This is my code to try to achieve this:
>>>>>>
>>>>>> version 12.1
>>>>>> clear all
>>>>>> set more off
>>>>>>
>>>>>> cd "C:\Users\Admin\Desktop"
>>>>>>
>>>>>> use window_students, clear
>>>>>>
>>>>>> xtset i t
>>>>>> //check panel structure is correct
>>>>>>
>>>>>> summ i // this tells me that the max of variable i is 30, which is correct as I have 30 people I need to analyse
>>>>>>
>>>>>> tabulate i t if window==1 & r==.
>>>>>> //r is another variable stored in another column, which represents their rates. There are 8 people that don't have any rates within my window.
>>>>>> ///I would like to remove all the observations pertaining to these people
>>>>>>
>>>>>> levelsof i if window==1 & r==., local(entities) //tried to store the people that were missing into a local macro - these are i = 2 4 6 7 9 14 21 25
>>>>>>
>>>>>> Then I tried this:
>>>>>>
>>>>>> *Method 1 - but then results window has return code 198 and invalid '4' in red text
>>>>>>
>>>>>> foreach i of local entities{
>>>>>>     drop if i==`entities'
>>>>>> }
>>>>>>
>>>>>> *Method 2 - but then results window has return code 111 and variable i not found
>>>>>>
>>>>>> foreach i of local entities{
>>>>>>     drop i
>>>>>> }
>>>>>>
>>>>>> *Method 3 - but it deleted all of my observations
>>>>>>
>>>>>> foreach i of local entities{
>>>>>>     drop i
>>>>>> }
>>>>>>
>>>>>> *Method 4 - after Stata told me that it was person 2, 4, 6, 7, 9 etc... that were missing observations I wrote out each line
>>>>>>
>>>>>> drop if i==2
>>>>>> drop if i==4 //etc.....
>>>>>>
>>>>>> summ i // I still get 30 in the summary but it has told me that it has deleted observations for each drop if line that I used....shouldn't it be 22 now after I removed the 8 people?
>>>>>>
>>>>>> I am stuck now...as I need the i to be correct as I will be doing
>>>>>> some regressions with the i later, that's why I have to drop the
>>>>>> people that don't have observations in my dataset before I do
>>>>>> further analysis.
>>>>>>
>>>>>> eg.
>>>>>> summarize i
>>>>>> local m = r(max)
>>>>>> //create a local macro storing the max number of distinct entities from an r-scalar
>>>>>>
>>>>>> generate ar = .
>>>>>>
>>>>>> forvalues x = 1/`m' {
>>>>>>     //run regression for every entity in data set
>>>>>>     regress r ind if i==`x' & twindow
>>>>>>
>>>>>>     predict res if i==`x', residuals //predict residuals both in-sample and out-of-sample
>>>>>>     replace ar=res if i==`x' & holidaywindow //replace ar=. with the estimated residuals
>>>>>>     drop res
>>>>>> }
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
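
The two follow-up requests at the top of this message (drop a whole panel only when r is missing inside a window of t, then renumber the surviving panels consecutively) could be approached along the following lines. This is only a minimal sketch on made-up data, extending the -bysort- idiom quoted above; the toy dataset, the window bounds -3 and 4, and the variable names `bad` and `newid` are assumptions for illustration and do not come from the thread.

```stata
* Toy data (hypothetical): 3 panels, t running from -5 to 5 in each
clear
set obs 33
egen i = seq(), block(11)
bysort i : gen t = _n - 6

gen r = 42
replace r = . if i == 2 & t == 0     // missing INSIDE the window
replace r = . if i == 3 & t == -5    // missing OUTSIDE the window

* Flag observations with missing r inside the window t in [-3, 4]
gen byte bad = missing(r) & inrange(t, -3, 4)

* Sort so that any flagged observation is last within its panel,
* then drop the whole panel if that last observation is flagged
bysort i (bad) : drop if bad[_N]
drop bad

* Renumber the surviving panels 1, 2, ... so loops over 1/max work
egen newid = group(i)
sort newid t
```

Here panel 2 is dropped and panel 3 is kept, and `newid` runs from 1 to 2 with no gaps. If the window is already marked by a dummy such as `window`, the flag could presumably be built as `missing(r) & window == 1` instead of using `inrange()`. Note also that `summarize i` reports the largest code of i, not the number of distinct panels, so it can still show 30 after panels have been dropped; that is why a fresh identifier is generated rather than relying on the old one.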

