Lisa Wang

statalist@hsphsun2.harvard.edu

Re: st: RE: drop variables in panel data with loop

Mon, 23 Jul 2012 11:46:31 +1000

Hi all,

Both codes either don't seem to drop any observations at all or drop
all the observations.

@Nick - I also tried yours but likewise, it doesn't seem to work
either.

I need to summarise the data based on i as this represents each
individual entity - each entity will have multiple r's (of differing
amounts). t is just a variable created to do a timeline kind of thing
(e.g. -694, -693, ..., -1, 0, +1, ..., +2093) and the days in the
timeline can vary for each individual entity.

If this is any help: after I run this code

tabulate i t if window==1 & r==.

I get this output from Stata:

           |         Event Timeline
         i |        -1          0          1 |     Total
-----------+---------------------------------+----------
      Amy1 |         0          0          1 |         1
    Colin1 |         1          1          1 |         3
    Chris1 |         0          0          1 |         1
      Cat2 |         0          1          1 |         2
      Ian1 |         1          1          0 |         2
  Queenie1 |         1          1          1 |         3
      Sam1 |         0          1          1 |         2
    Uncle1 |         1          1          0 |         2
-----------+---------------------------------+----------
     Total |         4          6          6 |        16

. levelsof i if window==1 & r==., local(entities)
2 4 6 7 9 14 21 25

(e.g. Amy1 is the second entity in my dataset. I want to remove ALL
observations of Amy1 - not only the days (t) that have missing
observations - as I want to omit these people from any further
analysis.)

I also want i to be 22 (that is, 30 minus the 8 entities I want
dropped from my dataset) as I will do some loops for regressions later
on.

Thank you everyone for your kind help so far.

Kind regards,
Lisa

On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Djalal's code can be simplified to
>
> drop if t==.
>
> as whether t is missing does not depend on its relation to other
> variables. So, it drops observations which are missing on -t-, which
> is not your problem.
>
> However, Lisa overlooks my earlier posting
>
> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>
> I got a bit lost in Lisa's explanation (for example further variables
> -twindow- and -holidaywindow- appear without any explanation) but my
> solution should still be relevant.
> Another solution might be
>
> bysort i (window) : drop if window[_N] == 1
>
> Nick
>
> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <lhwang0925@gmail.com> wrote:
>> Hi Djalal,
>>
>> Thank you for your help.
>>
>> I have tried your recommendation but it does not delete any
>> observations from my data set at all.
>>
>> Maybe I didn't specify my query well enough. If there are missing
>> observations within a particular period, which is denoted by a dummy
>> variable 'window', then drop ALL the observations pertaining to that
>> person - not only the rows that have missing observations.
>>
>> Would you have any other suggestions?
>>
>> Kind regards,
>> Lisa
>>
>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <aridjal@gmail.com> wrote:
>>> Hi Lisa,
>>> Have you tried the following syntax?
>>>
>>> by i, sort : drop if t==.
>>>
>>> This will allow you to have the t variable without any missing
>>> observations. As you have already distinguished which people/rows
>>> are concerned, you can manually drop them from the data editor.
>>>
>>> Hope this can help.
>>>
>>> Djalal Arinloye
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Lisa Wang
>>> Sent: Sunday, July 22, 2012 12:51 PM
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: drop variables in panel data with loop
>>>
>>> I am having trouble with Stata and would like some guidance on what I
>>> am doing incorrectly. I am new to Stata (only 1 month into it), so I
>>> am still trying to learn and sometimes still thinking like in Excel.
>>>
>>> I will try to be as detailed as possible, so you can understand my
>>> question.
>>>
>>> To describe my data set, I have some panel data and a variable i,
>>> which is the names (e.g. Mary, Tom...) but encoded into a numeric as
>>> such: - encode symbol1, generate (i) -. There are 59732 rows and the
>>> count of i is 30.
>>>
>>> What I would like to achieve is to tell the program to drop the
>>> observations that have missing values for a variable for a specific
>>> period (variable window). E.g. if there is no data for "Mary" for day
>>> 102 then drop all the rows pertaining to "Mary" from day 1...T - not
>>> only drop the observation for Mary on day 102.
>>>
>>> This is my code to try to achieve this:
>>>
>>> version 12.1
>>> clear all
>>> set more off
>>>
>>> cd "C:\Users\Admin\Desktop"
>>>
>>> use window_students, clear
>>>
>>> xtset i t
>>> //check panel structure is correct
>>>
>>> summ i
>>> //this tells me that the max of variable i is 30, which is correct
>>> //as I have 30 people I need to analyse
>>>
>>> tabulate i t if window==1 & r==.
>>> //r is another variable stored in another column, which represents
>>> //their rates. There are 8 people that don't have any rates within
>>> //my window.
>>> //I would like to remove all the observations pertaining to these
>>> //people
>>>
>>> levelsof i if window==1 & r==., local(entities)
>>> //tried to store the people that were missing into a local macro -
>>> //these are i = 2 4 6 7 9 14 21 25
>>>
>>> Then I tried this:
>>>
>>> *Method 1 - but then results window has return code 198 and invalid
>>> '4' in red text
>>>
>>> foreach i of local entities{
>>> drop if i==`entities'
>>> }
>>>
>>> *Method 2 - but then results window has return code 111 and variable
>>> i not found
>>>
>>> foreach i of local entities{
>>> drop i
>>> }
>>>
>>> *Method 3 - but it deleted all of my observations
>>>
>>> foreach i of local entities{
>>> drop i
>>> }
>>>
>>> *Method 4 - after Stata told me that it was person 2, 4, 6, 7, 9
>>> etc... that were missing observations I wrote out each line
>>>
>>> drop if i==2
>>> drop if i==4 //etc.....
>>>
>>> summ i
>>> //I still get 30 in the summary but it has told me that it has
>>> //deleted observations for each drop if line that I used....
>>> //shouldn't it be 22 now after I removed the 8 people?
>>>
>>> I am stuck now...as I need the i to be correct as I will be doing some
>>> regressions with the i later, that's why I have to drop the people
>>> that don't have observations in my dataset before I do further
>>> analysis.
>>>
>>> e.g.
>>> summarize i
>>> local m = r(max)
>>> //create a local macro storing the max number of distinct entities
>>> //from an r-scalar
>>>
>>> generate ar = .
>>>
>>> forvalues x = 1/`m' {
>>>     //run regression for every entity in data set
>>>     regress r ind if i==`x' & twindow
>>>
>>>     predict res if i==`x', residuals
>>>     //predict residuals both in-sample and out-of-sample
>>>     replace ar=res if i==`x' & holidaywindow
>>>     //replace ar=. with these estimated residuals
>>>     drop res
>>> }
>>>
>>> Sorry for the long email.
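
One possible approach to the question above, offered as a sketch rather
than an answer tested on the real data: flag each entity that has at
least one missing r inside the window, drop all of its rows, then build
a consecutive identifier for later loops. The toy data and the names
bad and id are assumptions made for illustration; i, t, window and r
are the variables described in the thread.

```stata
* Toy data (hypothetical): 3 entities; entity 2 has a missing r inside the window
clear
input i t window r
1 -1 1 0.10
1  0 1 0.20
1  5 0 .
2 -1 1 0.30
2  0 1 .
2  5 0 0.40
3 -1 1 0.50
3  0 1 0.60
3  5 0 0.70
end

* Flag every row of an entity with at least one missing r inside the window
bysort i: egen bad = max(window == 1 & missing(r))

* Drop the whole entity, not just the rows where r is missing
drop if bad == 1
drop bad

* Renumber the remaining entities 1, 2, ... (assumed name: id)
egen id = group(i)
xtset id t

* The maximum of id now equals the number of remaining entities (2 here)
summarize id
```

After the drop, -summarize i- can still report a maximum of 30 because
the codes assigned by -encode- are not renumbered when observations are
dropped; a consecutive identifier such as id is what a later -forvalues-
loop over 1/`m' would need. Method 1 in the quoted message stops with
"invalid '4'" because `entities' expands to the whole list (drop if
i==2 4 6 ...); referring to the loop macro instead, for example
-foreach j of local entities- with -drop if i==`j'-, would avoid that
error.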