Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.


**From:** Nick Cox <n.j.cox@durham.ac.uk>

**To:** "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

**Subject:** RE: st: RE: drop variables in panel data with loop

**Date:** Mon, 23 Jul 2012 14:56:52 +0100

What you observe is nothing to do with whether data have been declared
as panel data. Consider this (in which no use is made of -tsset- or -xtset-):

. clear

. set obs 12
obs was 0, now 12

. egen i = seq(), block(4)

. egen t = seq(), to(4)

. gen r = cond(_n == 7, ., 42)
(1 missing value generated)

. l, sep(4)

     +------------+
     | i   t    r |
     |------------|
  1. | 1   1   42 |
  2. | 1   2   42 |
  3. | 1   3   42 |
  4. | 1   4   42 |
     |------------|
  5. | 2   1   42 |
  6. | 2   2   42 |
  7. | 2   3    . |
  8. | 2   4   42 |
     |------------|
  9. | 3   1   42 |
 10. | 3   2   42 |
 11. | 3   3   42 |
 12. | 3   4   42 |
     +------------+

. bysort i (r) : drop if missing(r[_N])
(4 observations deleted)

. sort i t

. l, sep(4)

     +------------+
     | i   t    r |
     |------------|
  1. | 1   1   42 |
  2. | 1   2   42 |
  3. | 1   3   42 |
  4. | 1   4   42 |
     |------------|
  5. | 3   1   42 |
  6. | 3   2   42 |
  7. | 3   3   42 |
  8. | 3   4   42 |
     +------------+

This code -drop-s entire panels if and only if there are any missing
values in a panel. Isn't that what you want?

It may be that you should also look here:

FAQ     . . . . . . . . . . . . . . . . . .  Dropping spells of missing values
        . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and G. Longton
        3/07    How can I drop spells of missing values at the beginning and
                end of panel data?
                http://www.stata.com/support/faqs/data/dropmiss.html

FAQ     . . . . . .  Identifying runs of consecutive observations in panel data
        . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox and V. Wiggins
        8/05    How do I identify runs of consecutive observations in panel
                data?
                http://www.stata.com/support/faqs/data/panel.html

Please don't send me any datafiles, if only because I am travelling over
the next few weeks.

Nick
n.j.cox@durham.ac.uk

Lisa Wang

Hi Nick,

Thank you for looking into my problem. Much appreciated.

I tried your suggestion of using -bysort i (r) : drop if missing(r[_N])-
but it dropped all my observations, so I am left with zero observations
after that line is run by Stata.
I don't know why. Could it be that the data are not declared as panel
data? But I do have -xtset i t- at the start already. Would you mind if I
sent you part of my dataset and do-file to have a look? I would like to
know how to solve this, as I am sure I will need to drop observations in
panel data in another project.

Many thanks,
Lisa

On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I still have the same understanding of your problem and am at a loss
> to see why my previous suggestions don't "seem to work".
>
> As I understand it, you have panel data with
>
> identifier i
> time variable t
> "rate" r
>
> and the problem is that -r- is missing (equal to system missing .) in
> some observations and you want to drop _entire panels_ if any
> observation in a panel has a missing value on -r-. A variant on my
> previous suggestions is
>
> bysort i (r) : drop if missing(r[_N])
>
> Nick
>
> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
>> Hi all,
>>
>> Both codes don't seem to drop any observations at all or drop all the
>> observations.
>>
>> @Nick - I also tried yours but likewise, it doesn't seem to work
>> either. I need to summarise the data based on i as this represents
>> each individual entity - each entity will have multiple r's (of
>> differing amounts). t is just a variable created to do a timeline
>> kind of thing (eg. -694, -693..,-1,0, +1...+2093 for instance) and
>> the days in the timeline can vary for each individual entity.
>>
>> If this is any help:
>>
>> After I run this code - tabulate i t if window==1 & r==. - I get this
>> output from Stata:
>>
>>            |         Event Timeline
>>          i |        -1          0          1 |     Total
>> -----------+---------------------------------+----------
>>       Amy1 |         0          0          1 |         1
>>     Colin1 |         1          1          1 |         3
>>     Chris1 |         0          0          1 |         1
>>       Cat2 |         0          1          1 |         2
>>       Ian1 |         1          1          0 |         2
>>   Queenie1 |         1          1          1 |         3
>>       Sam1 |         0          1          1 |         2
>>     Uncle1 |         1          1          0 |         2
>> -----------+---------------------------------+----------
>>      Total |         4          6          6 |        16
>>
>> . levelsof i if window==1 & r==., local(entities)
>> 2 4 6 7 9 14 21 25
>>
>> (eg. Amy1 is the second entity in my dataset, I want to remove ALL
>> observations of Amy1 - not only the days (t) that I have missing
>> observations as I want to omit these people from any further
>> analysis).
>>
>> I also want i to be 22 (since 30 - 8 entities I want dropped from my
>> dataset) as I will do some loops for regressions later on.
>>
>> Thank you everyone for your kind help so far.
>>
>> Kind regards,
>> Lisa
>>
>> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Djalal's code can be simplified to
>>>
>>> drop if t==.
>>>
>>> as whether t is missing does not depend on its relation to other
>>> variables. So, it drops observations which are missing on -t-, which
>>> is not your problem.
>>>
>>> However, Lisa overlooks my earlier posting
>>>
>>> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>>>
>>> I got a bit lost in Lisa's explanation (for example further variables
>>> -twindow- and -holidaywindow- appear without any explanation) but my
>>> solution should still be relevant. Another solution might be
>>>
>>> bysort i (window) : drop if window[_N] == 1
>>>
>>> Nick
>>>
>>> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <lhwang0925@gmail.com> wrote:
>>>> Hi Djalal,
>>>>
>>>> Thank you for your help.
>>>>
>>>> I have tried your recommendation but it does not delete any
>>>> observations from my data set at all.
>>>>
>>>> Maybe I didn't specify my query well enough.
>>>> If there are missing observations within a particular period, which
>>>> is denoted by a dummy variable 'window', then drop ALL the
>>>> observations pertaining to that person - not only the rows that have
>>>> missing observations.
>>>>
>>>> Would you have any other suggestions?
>>>>
>>>> Kind regards,
>>>> Lisa
>>>>
>>>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <aridjal@gmail.com> wrote:
>>>>> Hi Lisa,
>>>>> Have you tried the following syntax?
>>>>>
>>>>> by i, sort : drop if t==.
>>>>>
>>>>> This will allow you to have a t variable without any missing
>>>>> observations. As you have already identified which people/rows are
>>>>> concerned, you can drop them manually in the Data Editor.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>> Djalal Arinloye
>>>>>
>>>>> -----Original Message-----
>>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lisa Wang
>>>>> Sent: Sunday, July 22, 2012 12:51 PM
>>>>> To: statalist@hsphsun2.harvard.edu
>>>>> Subject: st: drop variables in panel data with loop
>>>>>
>>>>> I am having trouble with Stata and would like some guidance on
>>>>> what I am doing incorrectly. I am new to Stata (only 1 month into
>>>>> it), so I am still trying to learn and sometimes still thinking like in Excel.
>>>>>
>>>>> I will try to be as detailed as possible, so you can understand my question.
>>>>>
>>>>> To describe my data set, I have some panel data and a variable i,
>>>>> which holds the names (eg. Mary, Tom...) encoded into a numeric
>>>>> variable as such: -encode symbol1, generate(i)-. There are 59732
>>>>> rows and the count of i is 30.
>>>>>
>>>>> What I would like to achieve is to tell the program to drop the
>>>>> observations that have missing values for a variable for a
>>>>> specific period (variable window). E.g. if there is no data for
>>>>> "Mary" for day 102, then drop all the rows pertaining to "Mary"
>>>>> from day 1...T, not only the observation for Mary on day 102.
>>>>>
>>>>> This is my code to try to achieve this:
>>>>>
>>>>> version 12.1
>>>>> clear all
>>>>> set more off
>>>>>
>>>>> cd "C:\Users\Admin\Desktop"
>>>>>
>>>>> use window_students, clear
>>>>>
>>>>> xtset i t
>>>>> // check panel structure is correct
>>>>>
>>>>> summ i // this tells me that the max of variable i is 30, which is
>>>>>        // correct as I have 30 people I need to analyse
>>>>>
>>>>> tabulate i t if window==1 & r==.
>>>>> // r is another variable stored in another column, which
>>>>> // represents their rates. There are 8 people that don't have any
>>>>> // rates within my window.
>>>>> // I would like to remove all the observations pertaining to these
>>>>> // people
>>>>>
>>>>> levelsof i if window==1 & r==., local(entities)
>>>>> // tried to store the people that were missing into a local macro -
>>>>> // these are i = 2 4 6 7 9 14 21 25
>>>>>
>>>>> Then I tried this:
>>>>>
>>>>> * Method 1 - but then the Results window shows return code 198 and
>>>>> * invalid '4' in red text
>>>>>
>>>>> foreach i of local entities{
>>>>>     drop if i==`entities'
>>>>> }
>>>>>
>>>>> * Method 2 - but then the Results window shows return code 111 and
>>>>> * variable i not found
>>>>>
>>>>> foreach i of local entities{
>>>>>     drop i
>>>>> }
>>>>>
>>>>> * Method 3 - but it deleted all of my observations
>>>>>
>>>>> foreach i of local entities{
>>>>>     drop i
>>>>> }
>>>>>
>>>>> * Method 4 - after Stata told me that it was person 2, 4, 6, 7, 9 etc...
>>>>> * that were missing observations I wrote out each line
>>>>>
>>>>> drop if i==2
>>>>> drop if i==4 // etc.....
>>>>>
>>>>> summ i // I still get 30 in the summary but it has told me
>>>>>        // that it has deleted observations for each drop if line that I
>>>>>        // used....shouldn't it be 22 now after I removed the 8 people?
>>>>>
>>>>> I am stuck now...as I need the i to be correct as I will be doing
>>>>> some regressions with the i later, that's why I have to drop the
>>>>> people that don't have observations in my dataset before I do
>>>>> further analysis.
>>>>>
>>>>> eg.
>>>>> summarize i
>>>>> local m = r(max)
>>>>> // create a local macro storing the max number of distinct entities
>>>>> // from an r-scalar
>>>>>
>>>>> generate ar = .
>>>>>
>>>>> forvalues x = 1/`m' {
>>>>>     // run regression for every entity in data set
>>>>>     regress r ind if i==`x' & twindow
>>>>>
>>>>>     // predict residuals both in-sample and out-of-sample
>>>>>     predict res if i==`x', residuals
>>>>>     // replace ar=. with these estimated residuals
>>>>>     replace ar=res if i==`x' & holidaywindow
>>>>>     drop res
>>>>> }

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
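This is a minimal sketch of what Method 1 in the original post was presumably aiming for. It runs on hypothetical toy data (30 entities coded 1 to 30, two observations each), so it stands on its own.

In the original loop, the body refers to the whole list `entities' rather than to the loop macro. Stata therefore sees -drop if i==2 4 6 7 9 14 21 25-, which is invalid syntax; hence r(198) and "invalid '4'". Naming the loop macro -i- is legal, but a different name such as -e- avoids confusion with the variable -i-. Methods 2 and 3 use -drop i-, which drops the variable -i- itself rather than any observations.

The last lines also show why -summarize i- can still report a maximum of 30 after Method 4. Dropping entities does not renumber the remaining -encode- codes.

```stata
* Hypothetical toy data: 30 entities (i = 1, ..., 30), 2 observations each
clear
set obs 60
egen i = seq(), block(2)

* Entity codes to remove (the list Lisa obtained from -levelsof-)
local entities 2 4 6 7 9 14 21 25

* Inside the loop, refer to the loop macro e, not to the whole list
foreach e of local entities {
    drop if i == `e'
}

* 22 entities remain, but their codes are unchanged, so max(i) is still 30
quietly tabulate i
display "distinct entities: " r(r)
quietly summarize i
display "maximum of i: " r(max)
```

Without a loop, the same deletion can be written as -drop if inlist(i, 2, 4, 6, 7, 9, 14, 21, 25)-.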
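This is a sketch of the rule Lisa states in her first message: drop every observation of an entity if -r- is missing in any observation where -window- equals 1. It assumes -window- is coded 0/1. It runs on hypothetical toy data, and the variable names -bad- and -id- are illustrative only.

Unlike -bysort i (r) : drop if missing(r[_N])-, which drops a panel if -r- is missing anywhere in it, this check looks only inside the window. If every entity has some missing -r- outside the window, that difference would explain why all observations were dropped, though the thread does not confirm this.

The last step uses -egen, group()- to renumber the surviving entities 1, 2, and so on. A -forvalues x = 1/`m'- loop over the new identifier then visits each remaining entity once, which matches Lisa's wish for i to run to 22.

```stata
* Hypothetical toy panel: 3 entities (i) observed on 4 days (t)
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
* Hypothetical event window: days 2 and 3
gen window = inrange(t, 2, 3)
* r is missing inside the window for entity 2 (day 3)
* and outside the window for entity 3 (day 1)
gen r = 42
replace r = . if (i == 2 & t == 3) | (i == 3 & t == 1)

* bad = 1 on every observation of an entity with any missing r inside the window
bysort i : egen bad = max(window == 1 & missing(r))
drop if bad
drop bad

* Renumber the surviving entities consecutively (here 1 and 3 become 1 and 2)
egen id = group(i)
quietly summarize id
local m = r(max)
display "entities remaining: `m'"
```

Entity 3 is kept even though -r- is missing on day 1, because that day lies outside the window. In the regression loop from the original post, -if i==`x'- would then become -if id==`x'-.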

**Follow-Ups**:
- **Re: st: RE: drop variables in panel data with loop**, *From:* Lisa Wang <lhwang0925@gmail.com>

**References**:
- **st: drop variables in panel data with loop**, *From:* Lisa Wang <lhwang0925@gmail.com>
- **st: RE: drop variables in panel data with loop**, *From:* "Arinloye Djalal" <aridjal@gmail.com>
- **Re: st: RE: drop variables in panel data with loop**, *From:* Lisa Wang <lhwang0925@gmail.com>
- **Re: st: RE: drop variables in panel data with loop**, *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: RE: drop variables in panel data with loop**, *From:* Lisa Wang <lhwang0925@gmail.com>
- **Re: st: RE: drop variables in panel data with loop**, *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: RE: drop variables in panel data with loop**, *From:* Lisa Wang <lhwang0925@gmail.com>
