Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: RE: drop variables in panel data with loop
From: Lisa Wang <[email protected]>
To: [email protected]
Subject: Re: st: RE: drop variables in panel data with loop
Date: Tue, 24 Jul 2012 23:02:46 +1000

No problem, I understand. Thank you for your suggestions so far;
I appreciate them.
Can anyone else suggest code for how I should approach this?
Thanks,
Lisa
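
One possible approach to the request quoted below, offered as a sketch under stated assumptions rather than a tested answer for the real data. It reuses the toy data from the quoted example (observations 7 and 9 missing), flags missing values of r that fall inside the window, drops every panel with at least one such flag, and then renumbers the surviving panels. The window used here (t from 2 to 4) and the names -bad- and -newi- are made up for the illustration; in the real data the existing -window- dummy would take the place of the -inrange()- line.

```stata
* Toy data from the quoted example, with observations 7 and 9 missing
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(inlist(_n, 7, 9), ., 42)

* ASSUMPTION: a hypothetical window covering t = 2 to 4
gen byte window = inrange(t, 2, 4)

* Flag observations whose r is missing inside the window
gen byte bad = window == 1 & missing(r)

* bad is 0 or 1, so after sorting on bad within i the last value
* is 1 if and only if the panel has a missing r inside the window
bysort i (bad) : drop if bad[_N] == 1
drop bad

* Renumber the surviving panels 1, 2, ... with no gaps
egen newi = group(i)

sort newi t
list, sepby(newi)
```

With these data, panel i==2 (missing at t = 3, inside the window) is dropped and panel i==3 (missing at t = 1, outside the window) is kept and becomes newi==2. Looping over -newi- afterwards then covers every remaining panel.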
On Tue, Jul 24, 2012 at 5:45 PM, Nick Cox <[email protected]> wrote:
>
> I'm travelling and can't write long replies. If you don't get further help,
> contact Stata tech support.
>
> Begin forwarded message:
>
> From: Lisa Wang <[email protected]>
> Date: 23 July 2012 23:26:11 GMT+01:00
> To: [email protected]
> Subject: Re: st: RE: drop variables in panel data with loop
> Reply-To: [email protected]
>
> Thank you for your example. That made it so much clearer! I should
> have done something similar at the start. Now I know how to word my
> question better next time. Thank you.
>
> I now understand why the code you kindly suggested to me may have
> dropped all of my observations.
>
> Every i in my data has a missing r somewhere in its panel, so Stata
> recognises this and drops every panel, leaving nothing.
>
> Using your example below (with observation 9 modified to be missing
> as well), both i==2 and i==3 would be dropped. Let's say, however, that I
> want Stata to drop a panel only if there are missing r's
> between t = -3 and 4 (i.e. i==2 would be dropped entirely but i==3 would
> remain in my dataset). I don't want i==3 to be dropped, as its missing
> value won't cause a problem for my further analysis.
>
>     +------------+
>     | i   t    r |
>     |------------|
>  1. | 1   1   42 |
>  2. | 1   2   42 |
>  3. | 1   3   42 |
>  4. | 1   4   42 |
>     |------------|
>  5. | 2   1   42 |
>  6. | 2   2   42 |
>  7. | 2   3    . |
>  8. | 2   4   42 |
>     |------------|
>  9. | 3   1    . |
> 10. | 3   2   42 |
> 11. | 3   3   42 |
> 12. | 3   4   42 |
>     +------------+
>
>
> I would also like Stata to shift up so that once i==2 is dropped then
> i==3 would now take the place as i==2; would this be possible?
>
>
> Best regards,
> Lisa
>
> P.S. I have just realised that I am receiving answers from the Nick Cox
> mentioned in many of the help files. Sorry, my question must seem so
> basic to you!
>
>
>
>
> On Mon, Jul 23, 2012 at 11:56 PM, Nick Cox <[email protected]> wrote:
>
> What you observe is nothing to do with whether data have been declared as
> panel data.
>
>
> Consider this (in which no use is made of -tsset- or -xtset-):
>
>
> . clear
>
> . set obs 12
> obs was 0, now 12
>
> . egen i = seq(), block(4)
>
> . egen t = seq(), to(4)
>
> . gen r = cond(_n == 7, ., 42)
> (1 missing value generated)
>
> . l, sep(4)
>
>     +------------+
>     | i   t    r |
>     |------------|
>  1. | 1   1   42 |
>  2. | 1   2   42 |
>  3. | 1   3   42 |
>  4. | 1   4   42 |
>     |------------|
>  5. | 2   1   42 |
>  6. | 2   2   42 |
>  7. | 2   3    . |
>  8. | 2   4   42 |
>     |------------|
>  9. | 3   1   42 |
> 10. | 3   2   42 |
> 11. | 3   3   42 |
> 12. | 3   4   42 |
>     +------------+
>
> . bysort i (r) : drop if missing(r[_N])
> (4 observations deleted)
>
> . sort i t
>
> . l, sep(4)
>
>     +------------+
>     | i   t    r |
>     |------------|
>  1. | 1   1   42 |
>  2. | 1   2   42 |
>  3. | 1   3   42 |
>  4. | 1   4   42 |
>     |------------|
>  5. | 3   1   42 |
>  6. | 3   2   42 |
>  7. | 3   3   42 |
>  8. | 3   4   42 |
>     +------------+
>
>
> This code -drop-s entire panels if and only if there are any missing values
> in a panel. Isn't that what you want?
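
A note on why the -bysort- line behaves this way, with an equivalent formulation for comparison. Stata sorts numeric missing values after all nonmissing values, so once each panel is sorted on r, the last value r[_N] is missing exactly when the panel contains at least one missing r. That also explains the all-observations-dropped result reported elsewhere in the thread: if every panel has a missing r somewhere, every panel qualifies. The sketch below reaches the same result by counting missing values per panel with -egen-; the variable name -nmiss- is made up for this illustration.

```stata
* Rebuild the toy data from the example above
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(_n == 7, ., 42)

* Count missing values of r within each panel (nmiss is a hypothetical name)
egen nmiss = total(missing(r)), by(i)

* Drop every panel that has at least one missing r
drop if nmiss > 0
drop nmiss

sort i t
list, sep(4)
```

Keeping the count in a variable makes it easy to inspect which panels would go before anything is dropped, for example with -tabulate i if nmiss > 0-.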
>
>
> It may be that you should also look here:
>
>
> FAQ     . . . . . . . . . . . . . . . . . .  Dropping spells of missing values
>         . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
>         3/07    How can I drop spells of missing values at the
>                 beginning and end of panel data?
>                 http://www.stata.com/support/faqs/data/dropmiss.html
>
> FAQ     . . . . . . Identifying runs of consecutive observations in panel data
>         . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
>         8/05    How do I identify runs of consecutive observations
>                 in panel data?
>                 http://www.stata.com/support/faqs/data/panel.html
>
>
> Please don't send me any datafiles, if only because I am travelling over the
> next few weeks.
>
>
> Nick
>
> [email protected]
>
>
> Lisa Wang
>
>
> Hi Nick,
>
>
> Thank you for looking into my problem. Much appreciated.
>
>
> I tried your suggestion of using -bysort i (r) : drop if
> missing(r[_N])- but it dropped all my observations, so I am left with zero
> observations after Stata runs that line.
>
> I don't know why; could it be that the data are not declared as panel data?
> I do have -xtset i t- at the start already.
>
> Would you mind if I sent you part of my data set and do-file to
> have a look? I would like to know how to solve this, as I am sure I will
> need to drop observations from panel data again on another project.
>
>
> Many thanks,
>
> Lisa
>
>
>
>
> On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <[email protected]> wrote:
>
> I still have the same understanding of your problem and am at a loss
> to see why my previous suggestions don't "seem to work".
>
> As I understand it, you have panel data with
>
> identifier i
> time variable t
> "rate"  r
>
> and the problem is that -r- is missing (equal to system missing .) in
> some observations and you want to drop _entire panels_ if any
> observation in a panel has a missing value on -r-. A variant on my
> previous suggestions is
>
> bysort i (r) : drop if missing(r[_N])
>
>
> Nick
>
>
> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <[email protected]> wrote:
>
> Hi all,
>
>
> Both suggestions either drop no observations at all or drop all the
> observations.
>
> @Nick - I also tried yours but likewise, it doesn't seem to work.
> I need to summarise the data based on i as this represents
> each individual entity - each entity will have multiple r's (of
> differing amounts). t is just a variable created as a timeline
> (e.g. -694, -693, ..., -1, 0, +1, ..., +2093) and
> the days in the timeline can vary for each individual entity.
>
>
> If this is any help:
>
>
> After I run this code - tabulate i t if window==1 & r==. - I get this
>
> output from Stata:
>
>
>
>            |          Event Timeline
>          i |        -1          0          1 |     Total
> -----------+---------------------------------+----------
>       Amy1 |         0          0          1 |         1
>     Colin1 |         1          1          1 |         3
>     Chris1 |         0          0          1 |         1
>       Cat2 |         0          1          1 |         2
>       Ian1 |         1          1          0 |         2
>   Queenie1 |         1          1          1 |         3
>       Sam1 |         0          1          1 |         2
>     Uncle1 |         1          1          0 |         2
> -----------+---------------------------------+----------
>      Total |         4          6          6 |        16
>
>
>
> . levelsof i if window==1 & r==., local(entities)
>
> 2 4 6 7 9 14 21 25
>
>
> (e.g. Amy1 is the second entity in my dataset. I want to remove ALL
> observations of Amy1, not only the days (t) on which r is missing,
> as I want to omit these people from any further analysis.)
>
> I also want the maximum of i to be 22 (30 minus the 8 entities I want
> dropped from my dataset), as I will loop over i for regressions later on.
>
>
> Thank you everyone for your kind help so far.
>
>
> Kind regards,
>
> Lisa
>
>
> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <[email protected]> wrote:
>
> Djalal's code can be simplified to
>
>
> drop if t==.
>
>
> as whether t is missing does not depend on its relation to other
>
> variables. So, it drops observations which are missing on -t-, which
>
> is not your problem.
>
>
> However, Lisa overlooks my earlier posting
>
>
> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>
>
> I got a bit lost in Lisa's explanation (for example, further variables
> -twindow- and -holidaywindow- appear without any explanation) but my
> solution should still be relevant. Another solution might be
>
>
> bysort i (window) : drop if window[_N] == 1
>
>
> Nick
>
>
> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <[email protected]> wrote:
>
> Hi Djalal,
>
>
> Thank you for your help.
>
>
> I have tried your recommendation but it does not delete any
>
> observations from my data set at all.
>
>
> Maybe I didn't specify my query well enough. If there are missing
>
> observations within a particular period, which is denoted by a
>
> dummy variable 'window', then drop ALL the observations pertaining
>
> to that person - not only the rows that have missing observations.
>
>
> Would you have any other suggestions?
>
>
> Kind regards,
>
> Lisa
>
>
>
>
> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <[email protected]> wrote:
>
> Hi Lisa,
>
> Have you tried the following syntax?
>
>
> by i, sort : drop if t==.
>
>
> This will leave the t variable without any missing observations.
> As you have already identified which people/rows are concerned,
> you can also drop them manually in the Data Editor.
>
> Hope this helps.
>
>
>
> Djalal Arinloye
>
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On behalf of Lisa Wang
> Sent: Sunday, July 22, 2012 12:51 PM
> To: [email protected]
> Subject: st: drop variables in panel data with loop
>
>
> I am having trouble with Stata and would like some guidance on
> what I am doing incorrectly. I am new to Stata (only 1 month into
> it), so I am still learning and sometimes still think as I would in
> Excel.
>
>
> I will try to be as detailed as possible, so you can understand my question.
>
>
> To describe my data set, I have some panel data and a variable i,
> which holds the names (e.g. Mary, Tom...) encoded as a numeric variable
> with - encode symbol1, generate(i) -. There are 59732 rows and
> 30 distinct values of i.
>
>
> What I would like to achieve is to tell the program to drop every
> person who has missing values for a variable within a
> specific period (variable window). E.g. if there is no data for
> "Mary" for day 102, then drop all the rows pertaining to "Mary" from
> day 1...T, not only the observation for Mary on day 102.
>
>
> This is my code to try to achieve this:
>
>
> version 12.1
>
> clear all
>
> set more off
>
>
> cd "C:\Users\Admin\Desktop"
>
>
> use window_students, clear
>
>
> xtset i t
>
> //check panel structure is correct
>
>
>
> summ i   // the max of variable i is 30, which is correct as I have 30 people to analyse
>
> tabulate i t if window==1 & r==.
> // r is another variable stored in another column, which represents their rates.
> // There are 8 people that don't have any rates within my window.
> // I would like to remove all the observations pertaining to these people.
>
> levelsof i if window==1 & r==., local(entities)
> // tried to store the people that were missing into a local macro;
> // these are i = 2 4 6 7 9 14 21 25
>
>
>
> Then I tried this:
>
>
> *Method 1 - but then results window has return code 198 and
>
> invalid '4' in red text
>
>
> foreach i of local entities{
>
> drop if i==`entities'
>
> }
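
A note on the error in Method 1, with a corrected sketch. Inside the loop body, `entities' expands to the whole list, so on the first pass Stata sees -drop if i==2 4 6 7 9 14 21 25- and stops at the stray 4 with return code 198. The loop macro is the thing to reference instead. The example below uses the toy data from later in the thread (observations 7 and 9 missing) rather than the real dataset, and names the loop macro -e- (an arbitrary choice) so that it is not confused with the variable i.

```stata
* Toy data: three panels of four observations, r missing in panels 2 and 3
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(inlist(_n, 7, 9), ., 42)

* Collect the identifiers of panels with any missing r
levelsof i if missing(r), local(entities)

* Refer to the loop macro `e', which holds one identifier per pass,
* not to `entities', which holds the whole list
foreach e of local entities {
    drop if i == `e'
}

list, sep(4)
```

After the loop only panel i==1 remains. The surviving identifiers keep their original values, so -summarize i- on the real data would still report a maximum of 30 unless the panels are renumbered, for instance with -egen, group()-.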
>
>
>
> *Method 2 - but then results window has return code 111 and
>
> variable i not found
>
>
> foreach i of local entities{
>
> drop i
>
> }
>
>
> *Method 3 - but it deleted all of my observations
>
>
> foreach i of local entities{
>
> drop i
>
> }
>
>
> *Method 4 - after Stata told me that it was person 2,4, 6, 7, 9 etc...
>
> that were missing observations I wrote out each line
>
>
> drop if i==2
>
> drop if i==4   //etc.....
>
>
> summ i            // I still get 30 in the summary but it has told me
>
> that it has deleted observations for each drop if line that I
>
> used....shouldn't it be 22 now after I removed the 8 people?
>
>
>
>
> I am stuck now...as I need the i to be correct as I will be doing
>
> some regressions with the i later, that's why I have to drop the
>
> people that don't have observations in my dataset before I do
>
> further analysis.
>
>
> eg.
>
> summarize i
> local m = r(max)
> // create a local macro storing the max number of distinct entities
> // from an r-scalar
>
> generate ar = .
>
> forvalues x = 1/`m' {
>     // run regression for every entity in data set
>     regress r ind if i==`x' & twindow
>
>     // predict residuals both in-sample and out-of-sample
>     predict res if i==`x', residuals
>     // replace ar=. with the estimated residuals
>     replace ar=res if i==`x' & holidaywindow
>     drop res
>
> }
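
A sketch of one way to make this loop independent of how the identifiers are numbered. Looping over 1/`m' assumes i takes every value from 1 to its maximum; once panels have been dropped that no longer holds, and -regress- would fail with no observations for a missing identifier. Looping over the values that actually occur, via -levelsof-, avoids the need to renumber. The data below are simulated purely for illustration: the seed, the coefficients, the identifier gaps and the window definitions are all assumptions, and only the variable names come from the code above.

```stata
* Simulated data (ASSUMPTION): four panels of ten days each
clear
set seed 12345
set obs 40
egen i = seq(), block(10)
egen t = seq(), to(10)
replace i = i + 10 if i > 2      // identifiers become 1 2 13 14, with gaps
gen ind = rnormal()
gen r = 0.5 * ind + rnormal()
gen byte twindow = t <= 7        // hypothetical estimation window
gen byte holidaywindow = t > 7   // hypothetical event window

gen ar = .

* Loop over the identifiers that exist, not over 1 to max
levelsof i, local(ids)
foreach x of local ids {
    quietly regress r ind if i == `x' & twindow
    predict double res if i == `x', residuals
    quietly replace ar = res if i == `x' & holidaywindow
    drop res
}

summarize ar
```

Each panel contributes its three event-window days to -ar-, so twelve values are filled in even though the identifiers are not consecutive.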
>
>
> *
>
> *   For searches and help try:
>
> *   http://www.stata.com/help.cgi?search
>
> *   http://www.stata.com/support/statalist/faq
>
> *   http://www.ats.ucla.edu/stat/stata/
>
>