Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Lisa Wang <[email protected]>

To: [email protected]

Subject: Re: st: RE: drop variables in panel data with loop

Date: Tue, 24 Jul 2012 08:26:11 +1000

```
Thank you for your example. That made it so much clearer! I should
have done something similar at the start. Now I know how to word my
question better next time. Thank you.
I now understand why the code you kindly suggested may have
dropped all of my observations.
Every i will definitely have missing r's somewhere in its panel,
so Stata recognises this and drops everything.
Using your example below (with observation 9 modified to be missing
as well), both i==2 and i==3 would be dropped. Let's say, however, I
only want Stata to drop a panel if there is/are missing r's
between t = -3 and 4 (i.e. i==2 would be dropped entirely but i==3 would
remain in my dataset). I don't want i==3 to be dropped, as that
won't cause a problem for my further analysis.
> +------------+
> | i t r |
> |------------|
> 1. | 1 1 42 |
> 2. | 1 2 42 |
> 3. | 1 3 42 |
> 4. | 1 4 42 |
> |------------|
> 5. | 2 1 42 |
> 6. | 2 2 42 |
> 7. | 2 3 . |
> 8. | 2 4 42 |
> |------------|
> 9. | 3 1 . |
> 10. | 3 2 42 |
> 11. | 3 3 42 |
> 12. | 3 4 42 |
> +------------+
I would also like Stata to shift the identifiers up, so that once i==2 is
dropped, i==3 takes the place of i==2. Would this be possible?
Best regards,
Lisa
P.S. I have just realised that I am receiving answers from the Nick Cox
mentioned in many of the help files. Sorry if my question seems so
basic to you!
On Mon, Jul 23, 2012 at 11:56 PM, Nick Cox <[email protected]> wrote:
> What you observe is nothing to do with whether data have been declared as panel data.
>
> Consider this (in which no use is made of -tsset- or -xtset-)
>
> . clear
>
> . set obs 12
> obs was 0, now 12
>
> . egen i = seq(), block(4)
>
> . egen t = seq(), to(4)
>
> . gen r = cond(_n == 7, ., 42)
> (1 missing value generated)
>
> . l, sep(4)
>
> +------------+
> | i t r |
> |------------|
> 1. | 1 1 42 |
> 2. | 1 2 42 |
> 3. | 1 3 42 |
> 4. | 1 4 42 |
> |------------|
> 5. | 2 1 42 |
> 6. | 2 2 42 |
> 7. | 2 3 . |
> 8. | 2 4 42 |
> |------------|
> 9. | 3 1 42 |
> 10. | 3 2 42 |
> 11. | 3 3 42 |
> 12. | 3 4 42 |
> +------------+
>
> . bysort i (r) : drop if missing(r[_N])
> (4 observations deleted)
>
> . sort i t
>
> . l, sep(4)
>
> +------------+
> | i t r |
> |------------|
> 1. | 1 1 42 |
> 2. | 1 2 42 |
> 3. | 1 3 42 |
> 4. | 1 4 42 |
> |------------|
> 5. | 3 1 42 |
> 6. | 3 2 42 |
> 7. | 3 3 42 |
> 8. | 3 4 42 |
> +------------+
>
> This code -drop-s entire panels if and only if there are any missing values in a panel. Isn't that what you want?
>
> It may be that you should also look here:
>
> FAQ: Dropping spells of missing values (N. J. Cox and G. Longton, 3/07)
>      How can I drop spells of missing values at the
>      beginning and end of panel data?
>      http://www.stata.com/support/faqs/data/dropmiss.html
>
> FAQ: Identifying runs of consecutive observations in panel data
>      (N. J. Cox and V. Wiggins, 8/05)
>      How do I identify runs of consecutive observations
>      in panel data?
>      http://www.stata.com/support/faqs/data/panel.html
>
> Please don't send me any datafiles, if only because I am travelling over the next few weeks.
>
> Nick
> [email protected]
>
> Lisa Wang
>
> Hi Nick,
>
> Thank you for looking into my problem. Much appreciated.
>
> I tried your suggestion of using -bysort i (r) : drop if
> missing(r[_N])-, but it dropped all my observations, so I am left with zero observations after Stata runs that line.
>
> I don't know why. Could it be that the data are not declared as panel data? But I do have -xtset i t- at the start already.
>
> Would you mind if I sent you part of my data set and do-file to look at? I would like to know how to solve this, as I am sure I will need to drop observations in panel data on another project.
>
> Many thanks,
> Lisa
>
>
>
> On Mon, Jul 23, 2012 at 6:13 PM, Nick Cox <[email protected]> wrote:
>> I still have the same understanding of your problem and am at a loss
>> to see why my previous suggestions don't "seem to work".
>>
>> As I understand it, you have panel data with
>>
>> identifier i
>> time variable t
>> "rate" r
>>
>> and the problem is that -r- is missing (equal to system missing .) in
>> some observations and you want to drop _entire panels_ if any
>> observation in a panel has a missing value on -r-. A variant on my
>> previous suggestions is
>>
>> bysort i (r) : drop if missing(r[_N])
>>
>> Nick
>>
>> On Mon, Jul 23, 2012 at 2:46 AM, Lisa Wang <[email protected]> wrote:
>>> Hi all,
>>>
>>> Both pieces of code either drop no observations at all or drop all the
>>> observations.
>>>
>>> @Nick - I also tried yours, but it doesn't seem to work
>>> either. I need to summarise the data based on i, as this represents
>>> each individual entity - each entity will have multiple r's (of
>>> differing amounts). t is just a variable I created as an event timeline
>>> (e.g. -694, -693, ..., -1, 0, +1, ..., +2093), and
>>> the number of days in the timeline can vary for each individual entity.
>>>
>>> If this is any help:
>>>
>>> After I run this code - tabulate i t if window==1 & r==. - I get this
>>> output from Stata:
>>>
>>>
>>> | Event Timeline
>>> i | -1 0 1 | Total
>>> -----------+---------------------------------+----------
>>> Amy1 | 0 0 1 | 1
>>> Colin1 | 1 1 1 | 3
>>> Chris1 | 0 0 1 | 1
>>> Cat2 | 0 1 1 | 2
>>> Ian1 | 1 1 0 | 2
>>> Queenie1 | 1 1 1 | 3
>>> Sam1 | 0 1 1 | 2
>>> Uncle1 | 1 1 0 | 2
>>> -----------+---------------------------------+----------
>>> Total | 4 6 6 | 16
>>>
>>>
>>> . levelsof i if window==1 & r==., local(entities)
>>> 2 4 6 7 9 14 21 25
>>>
>>> (e.g. Amy1 is the second entity in my dataset. I want to remove ALL
>>> observations of Amy1, not only the days (t) on which I have missing
>>> observations, as I want to omit these people from any further
>>> analysis.)
>>>
>>>
>>> I also want the maximum of i to be 22 (the 30 entities minus the 8 I want
>>> dropped from my dataset), as I will run some regressions in loops later on.
>>>
>>> Thank you everyone for your kind help so far.
>>>
>>> Kind regards,
>>> Lisa
>>>
>>> On Mon, Jul 23, 2012 at 9:40 AM, Nick Cox <[email protected]> wrote:
>>>> Djalal's code can be simplified to
>>>>
>>>> drop if t==.
>>>>
>>>> as whether t is missing does not depend on its relation to other
>>>> variables. So, it drops observations which are missing on -t-, which
>>>> is not your problem.
>>>>
>>>> However, Lisa overlooks my earlier posting
>>>>
>>>> http://www.stata.com/statalist/archive/2012-07/msg00776.html
>>>>
>>>> I got a bit lost in Lisa's explanation (for example, further variables
>>>> -twindow- and -holidaywindow- appear without any explanation), but my
>>>> solution should still be relevant. Another solution might be
>>>>
>>>> bysort i (window) : drop if window[_N] == 1
>>>>
>>>> Nick
>>>>
>>>> On Sun, Jul 22, 2012 at 10:46 PM, Lisa Wang <[email protected]> wrote:
>>>>> Hi Djalal,
>>>>>
>>>>> Thank you for your help.
>>>>>
>>>>> I have tried your recommendation but it does not delete any
>>>>> observations from my data set at all.
>>>>>
>>>>> Maybe I didn't specify my query well enough. If there are missing
>>>>> observations within a particular period, which is denoted by a
>>>>> dummy variable 'window', then drop ALL the observations pertaining
>>>>> to that person - not only the rows that have missing observations.
>>>>>
>>>>> Would you have any other suggestions?
>>>>>
>>>>> Kind regards,
>>>>> Lisa
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 23, 2012 at 1:11 AM, Arinloye Djalal <[email protected]> wrote:
>>>>>> Hi Lisa,
>>>>>> Have you tried the following syntax?
>>>>>>
>>>>>> by i, sort : drop if t==.
>>>>>>
>>>>>> This will leave you with a t variable without any missing observations.
>>>>>> As you have already identified which people/rows are concerned,
>>>>>> you can manually drop them from the Data Editor.
>>>>>>
>>>>>> Hope this helps.
>>>>>>
>>>>>>
>>>>>> Djalal Arinloye
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: [email protected]
>>>>>> [mailto:[email protected]] On Behalf Of Lisa Wang
>>>>>> Sent: Sunday, July 22, 2012 12:51 PM
>>>>>> To: [email protected]
>>>>>> Subject: st: drop variables in panel data with loop
>>>>>>
>>>>>> I am having trouble with Stata and would like some guidance on
>>>>>> what I am doing incorrectly. I am new to Stata (only 1 month into
>>>>>> it), so I am still learning and sometimes still thinking as I would in Excel.
>>>>>>
>>>>>> I will try to be as detailed as possible, so you can understand my question.
>>>>>>
>>>>>> To describe my data set, I have some panel data and a variable i,
>>>>>> which holds the names (e.g. Mary, Tom, ...) encoded as numeric
>>>>>> with -encode symbol1, generate(i)-. There are 59732 rows and
>>>>>> i takes 30 distinct values.
>>>>>>
>>>>>> What I would like to achieve is to tell the program to drop the
>>>>>> observations that have missing values for a variable within a
>>>>>> specific period (variable window). E.g. if there is no data for
>>>>>> "Mary" on day 102, then drop all the rows pertaining to "Mary" from
>>>>>> day 1...T, not only the observation for Mary on day 102.
>>>>>>
>>>>>> This is my code to try to achieve this:
>>>>>>
>>>>>> version 12.1
>>>>>> clear all
>>>>>> set more off
>>>>>>
>>>>>> cd "C:\Users\Admin\Desktop"
>>>>>>
>>>>>> use window_students, clear
>>>>>>
>>>>>> xtset i t
>>>>>> //check panel structure is correct
>>>>>>
>>>>>>
>>>>>> summ i // this tells me that the max of variable i is 30, which is
>>>>>> // correct as I have 30 people I need to analyse
>>>>>>
>>>>>> tabulate i t if window==1 & r==.
>>>>>> // r is another variable stored in another column, which
>>>>>> // represents their rates. There are 8 people that don't have any
>>>>>> // rates within my window.
>>>>>> // I would like to remove all the observations pertaining to these
>>>>>> // people
>>>>>>
>>>>>> levelsof i if window==1 & r==., local(entities)
>>>>>> // tried to store the people that were missing in a local macro;
>>>>>> // these are i = 2 4 6 7 9 14 21 25
>>>>>>
>>>>>>
>>>>>>
>>>>>> Then I tried this:
>>>>>>
>>>>>> * Method 1 - but then the Results window shows return code 198 and
>>>>>> * "invalid '4'" in red text
>>>>>>
>>>>>> foreach i of local entities{
>>>>>> drop if i==`entities'
>>>>>> }
>>>>>>
>>>>>>
>>>>>> * Method 2 - but then the Results window shows return code 111 and
>>>>>> * "variable i not found"
>>>>>>
>>>>>> foreach i of local entities{
>>>>>> drop i
>>>>>> }
>>>>>>
>>>>>> *Method 3 - but it deleted all of my observations
>>>>>>
>>>>>> foreach i of local entities{
>>>>>> drop i
>>>>>> }
>>>>>>
>>>>>> * Method 4 - after Stata told me that it was persons 2, 4, 6, 7, 9, etc.
>>>>>> * that were missing observations, I wrote out each line
>>>>>>
>>>>>> drop if i==2
>>>>>> drop if i==4 //etc.....
>>>>>>
>>>>>> summ i // I still get 30 in the summary, but it has told me
>>>>>> // that it has deleted observations for each -drop if- line that I
>>>>>> // used... shouldn't it be 22 now after I removed the 8 people?
>>>>>>
>>>>>>
>>>>>>
>>>>>> I am stuck now, as I need i to be correct: I will be running
>>>>>> some regressions by i later, which is why I have to drop the
>>>>>> people that don't have observations in my dataset before I do
>>>>>> further analysis.
>>>>>>
>>>>>> e.g.
>>>>>> summarize i
>>>>>> local m = r(max)
>>>>>> // create a local macro storing the max number of distinct entities
>>>>>> // from an r-scalar
>>>>>>
>>>>>> generate ar = .
>>>>>>
>>>>>> forvalues x = 1/`m' {
>>>>>>     // run regression for every entity in data set
>>>>>>     regress r ind if i==`x' & twindow
>>>>>>
>>>>>>     // predict residuals both in-sample and out-of-sample
>>>>>>     predict res if i==`x', residuals
>>>>>>
>>>>>>     // replace ar=. with these estimated residuals
>>>>>>     replace ar=res if i==`x' & holidaywindow
>>>>>>     drop res
>>>>>> }
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
```
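A minimal sketch of Lisa's last request, under stated assumptions. It reuses Nick Cox's toy data (with observation 9 also set to missing) and invents a `window` indicator equal to 1 for t = 2, 3, 4, chosen only so that i==2 (missing r at t==3) falls inside the window and i==3 (missing r at t==1) does not. The idea is to combine the two conditions discussed in the thread, a missing r and `window == 1`, into one per-observation flag, and then drop whole panels in which that flag is ever true. In real data, the existing `window` dummy would replace the hypothetical one.

```stata
* Toy data: Nick Cox's example, with observation 9 also set to missing
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)
gen r = cond(inlist(_n, 7, 9), ., 42)

* Hypothetical window indicator (assumption): the window is t = 2, 3, 4
gen byte window = inrange(t, 2, 4)

* Flag each panel with at least one missing r inside the window:
* the max over the panel of a 0/1 expression is 1 if it is ever true
bysort i: egen byte badpanel = max(missing(r) & window == 1)

* Drop every observation of flagged panels, then tidy up
drop if badpanel
drop badpanel
sort i t
list, sepby(i)
```

Panel 2 is dropped and panel 3 is kept, because its only missing r lies outside the window. The same flag also works with Nick's sorting idiom: `gen byte bad = missing(r) & window == 1` followed by `bysort i (bad): drop if bad[_N]`, since sorting on `bad` within each panel puts any 1 last.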

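On the renumbering question: `summarize i` still reports a maximum of 30 after the drops because panel 30 itself was never dropped; the maximum of an identifier is not the number of panels. A minimal sketch, on hypothetical toy data with three panels, of counting distinct panels with `egen`'s `tag()` function and renumbering them consecutively with `egen`'s `group()` function:

```stata
* Hypothetical toy data: three panels of four observations each
clear
set obs 12
egen i = seq(), block(4)
egen t = seq(), to(4)

* Drop panel 2, leaving panels 1 and 3
drop if i == 2

* summarize still reports max(i) == 3, because panel 3 is still present
summarize i

* Count distinct panels: tag() marks exactly one observation per value of i
egen byte first = tag(i)
count if first

* Renumber panels consecutively 1, 2, ... in the sort order of i
egen newid = group(i)
list i t newid, sepby(i)
```

With an -encode-d identifier like Lisa's, adding the `label` option (`egen newid = group(i), label`) should carry the names across as value labels of the new variable; the data can then be re-declared with `xtset newid t`.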

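Why Method 1 fails, and a corrected loop. Inside the loop, the reference to the local `entities` expands to the whole list, so Stata sees `drop if i==2 4 6 7 9 14 21 25`, which is invalid syntax (hence r(198) and "invalid '4'"). The loop should refer to its own loop macro instead. Methods 2 and 3 use `drop i`, which drops the variable i itself, so the second pass finds no variable i (r(111)). The sketch below uses hypothetical toy data with 30 panels of 3 observations each, plus the identifiers from Lisa's -levelsof- output:

```stata
* Hypothetical toy data: 30 panels (i = 1, ..., 30) of 3 observations each
clear
set obs 90
egen i = seq(), block(3)

* The identifiers to drop, as -levelsof- reported them in the thread
local entities 2 4 6 7 9 14 21 25

* Use a loop macro name that differs from the variable name, and refer
* to that loop macro inside the loop, not to the whole list
foreach e of local entities {
    drop if i == `e'
}

* Note: -drop i- (Methods 2 and 3) would drop the variable i, not observations
```

A loop is not strictly needed here: `drop if inlist(i, 2, 4, 6, 7, 9, 14, 21, 25)` does the same in one line.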
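Once panels have been dropped, the identifiers are no longer 1, 2, ..., 22, so a -forvalues- loop from 1 to the maximum of i would ask -regress- to fit panels that no longer exist, and the loop stops with an error at the first missing panel. Looping over the identifiers that -levelsof- actually finds avoids this without any renumbering. A minimal sketch on hypothetical simulated data; `ind`, `twindow` and `holidaywindow` are named as in Lisa's code, but their values here are invented purely for illustration:

```stata
* Hypothetical simulated data: 4 panels of 15 days, then panel 2 dropped
clear
set seed 12345
set obs 60
egen i = seq(), block(15)
egen t = seq(), to(15)
drop if i == 2
gen ind = rnormal()
gen r = 1 + 0.5 * ind + rnormal()
gen byte twindow = t <= 10          // hypothetical estimation window
gen byte holidaywindow = t > 10     // hypothetical event window

generate ar = .

* Loop over the identifiers actually present (1 3 4), not over 1 to max(i)
levelsof i, local(ids)
foreach x of local ids {
    // fit the model on the estimation window of panel `x'
    quietly regress r ind if i == `x' & twindow
    // residuals for every observation of panel `x'
    predict res if i == `x', residuals
    // keep residuals only inside the event window
    replace ar = res if i == `x' & holidaywindow
    drop res
}
```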