
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: RE: drop variables in panel data with loop


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: RE: drop variables in panel data with loop
Date   Mon, 23 Jul 2012 14:56:52 +0100

What you observe is nothing to do with whether data have been declared as panel data. 

Consider this (in which no use is made of -tsset- or -xtset-):

. clear

. set obs 12
obs was 0, now 12

. egen i = seq(), block(4)

. egen t = seq(), to(4)

. gen r = cond(_n == 7, ., 42)
(1 missing value generated)

. l, sep(4) 

     +------------+
     | i   t    r |
     |------------|
  1. | 1   1   42 |
  2. | 1   2   42 |
  3. | 1   3   42 |
  4. | 1   4   42 |
     |------------|
  5. | 2   1   42 |
  6. | 2   2   42 |
  7. | 2   3    . |
  8. | 2   4   42 |
     |------------|
  9. | 3   1   42 |
 10. | 3   2   42 |
 11. | 3   3   42 |
 12. | 3   4   42 |
     +------------+

. bysort i (r) : drop if missing(r[_N])
(4 observations deleted)

. sort i t

. l, sep(4)

     +------------+
     | i   t    r |
     |------------|
  1. | 1   1   42 |
  2. | 1   2   42 |
  3. | 1   3   42 |
  4. | 1   4   42 |
     |------------|
  5. | 3   1   42 |
  6. | 3   2   42 |
  7. | 3   3   42 |
  8. | 3   4   42 |
     +------------+

This code -drop-s entire panels if and only if there are any missing values in a panel. Isn't that what you want? 
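
A sketch of why this works, and of a variant restricted to a window, may help. Numeric missing values sort after all nonmissing values in Stata, so once -r- is sorted within each -i-, any missing value in a panel sits in that panel's last observation, r[_N]. The code below reuses the toy data above; the indicator -window- and the names -anymiss-, -anymiss_w- and -id- are assumptions made up for illustration, not anything from Lisa's dataset.

```stata
* Toy data as above, plus a hypothetical 0/1 indicator -window-
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(_n == 7, ., 42)
gen byte window = inrange(t, 2, 3)   // assumed window: t = 2 or 3

* Explicit equivalent of the sort trick: flag panels with ANY missing r
bysort i : egen byte anymiss = max(missing(r))

* Variant: only missing values inside the window count
bysort i : egen byte anymiss_w = max(window == 1 & missing(r))

* Drop whole panels flagged by the window-restricted indicator
drop if anymiss_w
sort i t

* -drop- does not renumber identifiers: i still takes values 1 and 3.
* -egen, group()- yields consecutive integers 1, 2, ... for later loops.
egen id = group(i)
```

Here the only missing value falls inside the assumed window, so both flags agree and panel 2 goes. Because -drop- leaves identifier values unchanged, a loop over 1 to the maximum of -i- would hit gaps; looping over -id- from -egen, group()- is one way around that.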

It may be that you should also look here: 

FAQ     . . . . . . . . . . . . . . . . . .  Dropping spells of missing values
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and G. Longton
        3/07    How can I drop spells of missing values at the
                beginning and end of panel data?
                http://www.stata.com/support/faqs/data/dropmiss.html

FAQ     . . . . . . Identifying runs of consecutive observations in panel data
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
        8/05    How do I identify runs of consecutive observations
                in panel data?
                http://www.stata.com/support/faqs/data/panel.html

Please don't send me any datafiles, if only because I am travelling over the next few weeks. 

Nick 
[email protected] 

Lisa Wang

Hi Nick,

Thank you for looking into my problem. Much appreciated.

I tried your suggestion of using -bysort i (r) : drop if
missing(r[_N])- but it dropped all my observations, so I am left with zero observations now after that line is run by Stata.

I don't know why. Could it be that the data are not declared as panel data? But I do have -xtset i t- at the start already.
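
If the command drops everything, one possibility (a guess, not a diagnosis of this dataset) is that every panel contains at least one missing value of -r-, perhaps on days outside the window. A minimal check follows, on made-up toy data in which that is true by construction; -nmiss- and -tag- are names invented for the example.

```stata
* Toy data in which every panel has one missing r (assumed for
* illustration only; this is not Lisa's data)
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(t == 1, ., 42)

* Count missing values of r within each panel
bysort i : egen nmiss = total(missing(r))

* Tag one observation per panel and tabulate the per-panel counts
egen byte tag = tag(i)
tabulate nmiss if tag

* Number of panels that -drop if missing(r[_N])- would remove
count if tag & nmiss > 0
```

If that count equals the number of panels, the -drop- removes all observations, and restricting the condition to the period of interest would be the next thing to try.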

Would you mind if I send you part of my data set and do-file to have a look? I would like to know how to solve this, as I am sure I will need to drop observations in panel data again on another project.

Many thanks,
Lisa



On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <[email protected]> wrote:
> I still have the same understanding of your problem and am at a loss 
> to see why my previous suggestions don't "seem to work".
>
> As I understand it, you have panel data with
>
> identifier i
> time variable t
> "rate"  r
>
> and the problem is that -r- is missing (equal to system missing .) in 
> some observations and  you want to drop _entire panels_ if any 
> observation in a panel has a missing value on -r-. A variant on my 
> previous suggestions is
>
> bysort i (r) : drop if missing(r[_N])
>
> Nick
>
> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <[email protected]> wrote:
>> Hi all,
>>
>> Both pieces of code either drop no observations at all or drop all 
>> of them.
>>
>> @Nick - I also tried yours but likewise, it doesn't seem to work 
>> either. I need to summarise the data based on i as this represents 
>> each individual entity - each entity will have multiple r's (of 
>> differing amounts). t is just a variable created to do a timeline 
>> kind of thing (eg. -694, -693..,-1,0, +1...+2093 for instance) and 
>> the days in the timeline can vary for each individual entity.
>>
>> If this is any help:
>>
>> After I run this code - tabulate i t if window==1 & r==. - I get this 
>> output from Stata:
>>
>>
>>            |          Event Timeline
>>          i |        -1          0          1 |     Total
>> -----------+---------------------------------+----------
>>       Amy1 |         0          0          1 |         1
>>     Colin1 |         1          1          1 |         3
>>     Chris1 |         0          0          1 |         1
>>       Cat2 |         0          1          1 |         2
>>       Ian1 |         1          1          0 |         2
>>   Queenie1 |         1          1          1 |         3
>>       Sam1 |         0          1          1 |         2
>>     Uncle1 |         1          1          0 |         2
>> -----------+---------------------------------+----------
>>      Total |         4          6          6 |        16
>>
>>
>> . levelsof i if window==1 & r==., local(entities)
>> 2 4 6 7 9 14 21 25
>>
>> (eg. Amy1 is the second entity in my dataset, I want to remove ALL 
>> observations of Amy1 - not only the days (t) that I have missing 
>> observations as I want to omit these people from any further 
>> analysis).
>>
>>
>> I also want i to be 22 (since 30 - 8 entities I want dropped from my
>> dataset) as I will do some loops for regressions later on.
>>
>> Thank you everyone for your kind help so far.
>>
>> Kind regards,
>> Lisa
>>
>> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <[email protected]> wrote:
>>> Djalal's code can be simplified to
>>>
>>> drop if t==.
>>>
>>> as whether t is missing does not depend on its relation to other 
>>> variables. So, it drops observations which are missing on -t-, which 
>>> is not your problem.
>>>
>>> However, Lisa overlooks my earlier posting
>>>
>>> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>>>
>>> I got a bit lost in Lisa's explanation (for example, further 
>>> variables -twindow- and -holidaywindow- appear without any 
>>> explanation) but my solution should still be relevant. Another 
>>> solution might be
>>>
>>> bysort i (window) : drop if window[_N] == 1
>>>
>>> Nick
>>>
>>> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <[email protected]> wrote:
>>>> Hi Djalal,
>>>>
>>>> Thank you for your help.
>>>>
>>>> I have tried your recommendation but it does not delete any 
>>>> observations from my data set at all.
>>>>
>>>> Maybe I didn't specify my query well enough. If there are missing 
>>>> observations within a particular period, which is denoted by a 
>>>> dummy variable 'window', then drop ALL the observations pertaining 
>>>> to that person - not only the rows that have missing observations.
>>>>
>>>> Would you have any other suggestions?
>>>>
>>>> Kind regards,
>>>> Lisa
>>>>
>>>>
>>>>
>>>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <[email protected]> wrote:
>>>>> Hi Lisa,
>>>>> Have you tried the following syntax?
>>>>>
>>>>> by i, sort : drop if t==.
>>>>>
>>>>> This will leave you with a t variable without any missing observations.
>>>>> As you have already identified which people/rows are concerned, 
>>>>> you can manually drop them from the data editor.
>>>>>
>>>>> Hope this helps.
>>>>>
>>>>>
>>>>> Djalal Arinloye
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: [email protected]
>>>>> [mailto:[email protected]] On Behalf Of Lisa Wang
>>>>> Sent: Sunday, July 22, 2012 12:51 PM
>>>>> To: [email protected]
>>>>> Subject: st: drop variables in panel data with loop
>>>>>
>>>>> I am having trouble with Stata and would like some guidance on 
>>>>> what I am doing incorrectly. I am new to Stata (only 1 month into 
>>>>> it), so I am still trying to learn and sometimes still thinking like in Excel.
>>>>>
>>>>> I will try to be as detailed as possible, so you can understand my question.
>>>>>
>>>>> To describe my data set, I have some panel data and a variable i, 
>>>>> which is the names (eg. Mary, Tom...) but encoded into a numeric 
>>>>> variable as such: -encode symbol1, generate(i)-. There are 59732 
>>>>> rows and the count of i is 30.
>>>>>
>>>>> What I would like to achieve is to tell the program to drop the 
>>>>> observations that have missing values for a variable for a 
>>>>> specific period (variable window). E.g. if there is no data for 
>>>>> "Mary" for day 102, then drop all the rows pertaining to "Mary" 
>>>>> from day 1...T, not only the observation for Mary on day 102.
>>>>>
>>>>> This is my code to try to achieve this:
>>>>>
>>>>> version 12.1
>>>>> clear all
>>>>> set more off
>>>>>
>>>>> cd "C:\Users\Admin\Desktop"
>>>>>
>>>>> use window_students, clear
>>>>>
>>>>> xtset i t
>>>>> //check panel structure is correct
>>>>>
>>>>>
>>>>> summ i   // max of i is 30, which is correct as I have 30 people to analyse
>>>>>
>>>>> tabulate i t if window==1 & r==.
>>>>> // r is another variable stored in another column, which
>>>>> // represents their rates. There are 8 people that don't have any
>>>>> // rates within my window.
>>>>> // I would like to remove all the observations pertaining to these
>>>>> // people
>>>>>
>>>>> levelsof i if window==1 & r==., local(entities)
>>>>> // tried to store the people that were missing into a local macro;
>>>>> // these are i = 2 4 6 7 9 14 21 25
>>>>>
>>>>>
>>>>>
>>>>> Then I tried this:
>>>>>
>>>>> *Method 1 - but then results window has return code 198 and 
>>>>> invalid '4' in red text
>>>>>
>>>>> foreach i of local entities{
>>>>> drop if i==`entities'
>>>>> }
>>>>>
>>>>>
>>>>> *Method 2 - but then results window has return code 111 and 
>>>>> variable i not found
>>>>>
>>>>> foreach i of local entities{
>>>>> drop i
>>>>> }
>>>>>
>>>>> *Method 3 - but it deleted all of my observations
>>>>>
>>>>> foreach i of local entities{
>>>>> drop i
>>>>> }
>>>>>
>>>>> *Method 4 - after Stata told me that it was person 2,4, 6, 7, 9 etc...
>>>>> that were missing observations I wrote out each line
>>>>>
>>>>> drop if i==2
>>>>> drop if i==4   //etc.....
>>>>>
>>>>> summ i            // I still get 30 in the summary but it has told me
>>>>> that it has deleted observations for each drop if line that I 
>>>>> used....shouldn't it be 22 now after I removed the 8 people?
>>>>>
>>>>>
>>>>>
>>>>> I am stuck now...as I need the i to be correct as I will be doing 
>>>>> some regressions with the i later, that's why I have to drop the 
>>>>> people that don't have observations in my dataset before I do 
>>>>> further analysis.
>>>>>
>>>>> eg.
>>>>> summarize i
>>>>> // create a local macro storing the max number of distinct entities
>>>>> // from an r-scalar
>>>>> local m = r(max)
>>>>>
>>>>> generate ar = .
>>>>>
>>>>> forvalues x = 1/`m' {
>>>>>         // run regression for every entity in data set
>>>>>         regress r ind if i==`x' & twindow
>>>>>
>>>>>         // predict residuals both in-sample and out-of-sample
>>>>>         predict res if i==`x', residuals
>>>>>         // replace ar=. with the estimated residuals
>>>>>         replace ar=res if i==`x' & holidaywindow
>>>>>         drop res
>>>>> }
