



Re: st: RE: drop variables in panel data with loop


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: drop variables in panel data with loop
Date   Mon, 23 Jul 2012 09:13:03 +0100

I still have the same understanding of your problem and am at a loss
to see why my previous suggestions don't "seem to work".

As I understand it, you have panel data with

identifier i
time variable t
"rate"  r

and the problem is that -r- is missing (equal to system missing .) in
some observations and  you want to drop _entire panels_ if any
observation in a panel has a missing value on -r-. A variant on my
previous suggestions is

bysort i (r) : drop if missing(r[_N])
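
Why this works: within each panel, sorting on -r- puts missing values last, because Stata sorts numeric missings after all nonmissing numbers, so -r[_N]- is missing exactly when the panel has at least one missing -r-. A minimal sketch on made-up data (the values and the name -newid- are illustrative assumptions, not taken from your dataset):

```stata
* Toy panel with made-up values (not the original data)
clear
input i t r
1 -1 0.5
1  0 .
1  1 0.7
2 -1 0.2
2  0 0.3
2  1 0.1
3 -1 .
3  0 0.9
3  1 0.4
4 -1 0.6
4  0 0.8
4  1 0.2
end

* Within each panel, sort on r so that any missing value comes last;
* r[_N] is then missing exactly when the panel has a missing r
bysort i (r) : drop if missing(r[_N])

* Panels 1 and 3 are dropped entirely; restore time order within panel
sort i t

* Consecutive identifiers 1, 2, ... for the remaining panels
egen newid = group(i)
list i newid t r, sepby(i)
```

-egen, group()- assigns consecutive codes 1, 2, ..., which matters if a later -forvalues- loop runs from 1 to the maximum identifier: dropping panels leaves gaps in -i- and does not change its maximum unless the highest-numbered panel is dropped. If only missings inside the window should count, one possible variant is -egen anymiss = max(window == 1 & missing(r)), by(i)- followed by -drop if anymiss-.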

Nick

On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <lhwang0925@gmail.com> wrote:
> Hi all,
>
> Both pieces of code either drop no observations at all or drop all of
> the observations.
>
> @Nick - I also tried yours but likewise, it doesn't seem to work
> either. I need to summarise the data based on i as this represents
> each individual entity - each entity will have multiple r's (of
> differing amounts). t is just a variable created to do a timeline kind
> of thing (eg. -694, -693..,-1,0, +1...+2093 for instance) and the days
> in the timeline can vary for each individual entity.
>
> If this is any help:
>
> After I run this code - tabulate i t if window==1 & r==. - I get this
> output from Stata:
>
>
>            |          Event Timeline
>          i |        -1          0          1 |     Total
> -----------+---------------------------------+----------
>       Amy1 |         0          0          1 |         1
>     Colin1 |         1          1          1 |         3
>     Chris1 |         0          0          1 |         1
>       Cat2 |         0          1          1 |         2
>       Ian1 |         1          1          0 |         2
>   Queenie1 |         1          1          1 |         3
>       Sam1 |         0          1          1 |         2
>     Uncle1 |         1          1          0 |         2
> -----------+---------------------------------+----------
>      Total |         4          6          6 |        16
>
>
> . levelsof i if window==1 & r==., local(entities)
> 2 4 6 7 9 14 21 25
>
> (eg. Amy1 is the second entity in my dataset, I want to remove ALL
> observations of Amy1 - not only the days (t) that I have missing
> observations as I want to omit these people from any further
> analysis).
>
>
> I also want the maximum of i to be 22 (30 minus the 8 entities I want
> dropped from my dataset) as I will do some loops for regressions later on.
>
> Thank you everyone for your kind help so far.
>
> Kind regards,
> Lisa
>
> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Djalal's code can be simplified to
>>
>> drop if t==.
>>
>> as whether t is missing does not depend on its relation to other
>> variables. So, it drops observations which are missing on -t-, which
>> is not your problem.
>>
>> However, Lisa overlooks my earlier posting
>>
>> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>>
>> I got a bit lost in Lisa's explanation (for example further variables
>> -twindow- and -holidaywindow- appear without any explanation) but my
>> solution should still be relevant. Another solution might be
>>
>> bysort i (window) : drop if window[_N] == 1
>>
>> Nick
>>
>> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <lhwang0925@gmail.com> wrote:
>>> Hi Djalal,
>>>
>>> Thank you for your help.
>>>
>>> I have tried your recommendation but it does not delete any
>>> observations from my data set at all.
>>>
>>> Maybe I didn't specify my query well enough. If there are missing
>>> observations within a particular period, which is denoted by a dummy
>>> variable 'window', then drop ALL the observations pertaining to that
>>> person - not only the rows that have missing observations.
>>>
>>> Would you have any other suggestions?
>>>
>>> Kind regards,
>>> Lisa
>>>
>>>
>>>
>>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <aridjal@gmail.com> wrote:
>>>> Hi Lisa,
>>>> Have you tried the following syntax?
>>>>
>>>> by i, sort : drop if t==.
>>>>
>>>> This will give you a t variable without any missing observations.
>>>> As you have already identified which people/rows are concerned, you can
>>>> manually drop them in the Data Editor.
>>>>
>>>> Hope this can help.
>>>>
>>>>
>>>> Djalal Arinloye
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Lisa Wang
>>>> Sent: Sunday, July 22, 2012 12:51 PM
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: st: drop variables in panel data with loop
>>>>
>>>> I am having trouble with Stata and would like some guidance on what I
>>>> am doing incorrectly. I am new to Stata (only 1 month into it), so I
>>>> am still trying to learn and sometimes still thinking like in Excel.
>>>>
>>>> I will try to be as detailed as possible, so you can understand my question.
>>>>
>>>> To describe my data set, I have some panel data and a variable i,
>>>> which is the names (eg. Mary, Tom...) but encoded into a numeric as
>>>> such: - encode symbol1, generate (i) -. There are 59732 rows and the
>>>> count of i is 30.
>>>>
>>>> What I would like to achieve is to tell the program to drop the
>>>> observations that have missing values for a variable for a specific
>>>> period (variable window). E.g. If there is no data for "Mary" for day
>>>> 102 then drop all the rows pertaining to "Mary"  from day 1...T - not
>>>> only drop the observation for Mary on day 102.
>>>>
>>>> This is my code to try to achieve this:
>>>>
>>>> version 12.1
>>>> clear all
>>>> set more off
>>>>
>>>> cd "C:\Users\Admin\Desktop"
>>>>
>>>> use window_students, clear
>>>>
>>>> xtset i t
>>>> //check panel structure is correct
>>>>
>>>>
>>>> summ i   // this tells me that the max of variable i is 30, which is
>>>>          // correct as I have 30 people I need to analyse
>>>>
>>>> tabulate i t if window==1 & r==.
>>>> // r is another variable stored in another column, which represents
>>>> // their rates. There are 8 people that don't have any rates within my
>>>> // window.
>>>> // I would like to remove all the observations pertaining to these people
>>>>
>>>> levelsof i if window==1 & r==., local(entities)
>>>> // tried to store the people that were missing into a local macro - these
>>>> // are i = 2 4 6 7 9 14 21 25
>>>>
>>>>
>>>>
>>>> Then I tried this:
>>>>
>>>> *Method 1 - but then results window has return code 198 and invalid
>>>> * '4' in red text
>>>>
>>>> foreach i of local entities{
>>>> drop if i==`entities'
>>>> }
>>>>
>>>>
>>>> *Method 2 - but then results window has return code 111 and variable i
>>>> * not found
>>>>
>>>> foreach i of local entities{
>>>> drop i
>>>> }
>>>>
>>>> *Method 3 - but it deleted all of my observations
>>>>
>>>> foreach i of local entities{
>>>> drop i
>>>> }
>>>>
>>>> *Method 4 - after Stata told me that it was person 2, 4, 6, 7, 9 etc...
>>>> * that were missing observations I wrote out each line
>>>>
>>>> drop if i==2
>>>> drop if i==4   //etc.....
>>>>
>>>> summ i            // I still get 30 in the summary but it has told me
>>>>                   // that it has deleted observations for each drop if line
>>>>                   // that I used....shouldn't it be 22 now after I removed
>>>>                   // the 8 people?
>>>>
>>>>
>>>>
>>>> I am stuck now...as I need the i to be correct as I will be doing some
>>>> regressions with the i later, that's why I have to drop the people
>>>> that don't have observations in my dataset before I do further
>>>> analysis.
>>>>
>>>> eg.
>>>> summarize i
>>>> // create a local macro storing the max number of distinct entities
>>>> // from an r-scalar
>>>> local m = r(max)
>>>>
>>>> generate ar = .
>>>>
>>>> // run regression for every entity in data set
>>>> forvalues x = 1/`m' {
>>>>         regress r ind if i==`x' & twindow
>>>>
>>>>         // predict residuals both in-sample and out-of-sample
>>>>         predict res if i==`x', residuals
>>>>
>>>>         // replace ar=. with these estimated residuals
>>>>         replace ar=res if i==`x' & holidaywindow
>>>>         drop res
>>>> }
>>>>
>>>>
>>>>
