The Stata listserver

st: Re: Data manipulation: how to keep only consecutive obs in an unbalanced panel


From   Roberto <terrymondo@onetel.net.uk>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: Data manipulation: how to keep only consecutive obs in an unbalanced panel
Date   Mon, 11 Oct 2004 15:24:02 +0100

Dear Nick,
Thank you very much for your kind reply.
I had actually looked at the suggested FAQ, but I am not sure it answers what I am after.
The thing is that I need Stata to "clean" the dataset of non-consecutive observations for each firm, keeping only each firm's longest run of consecutive years.
What I can achieve, using either
. tsset id time
. gen run = .
. replace run = cond(L.run == ., 1, L.run + 1)
. by id: egen maxrun = max(run)

or

. tsspell, f(L.time == .)
. by id: egen maxrun = max(_seq)

is to identify the maximum number of consecutive observations per firm.
Say, for example, that for one firm in my unbalanced panel I have observations for the years 1991, 1992, 1997, 1998, 1999, 2000 and 2001.
So this firm has N = 7 and maxrun = 5.
What I would like is a command that drops 1991 and 1992 and keeps only the 5 consecutive observations.
Is there a way to do this?
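One way to do this, sketched below under a few assumptions: the firm identifier is the variable code and the year variable is year (the names used in the listing further down this thread), there is one observation per firm-year, and run/maxrun are built as in the first approach above. The longest run ends in the year where run equals maxrun, so it covers that year and the maxrun - 1 years before it. If a firm has two runs of equal maximal length, this sketch keeps the later one, which is an arbitrary choice.

```stata
* Keep only each firm's longest run of consecutive years (sketch).
* Example data: the code/year pairs from the listing in this thread.
clear
input long code int year
135540 1995
135540 1997
135540 1998
135540 1999
135540 2000
135540 2001
900327 1991
900327 1992
900327 1993
900327 1995
900327 1996
900327 1997
900327 1998
900327 1999
900327 2000
900327 2001
134982 1997
134982 1999
134982 2001
end

* declare the panel; tsset also sorts the data by code and year
tsset code year

* run = position of each year within its run of consecutive years
* (L.run is missing at a firm's first year and right after any gap)
gen run = .
replace run = cond(L.run == ., 1, L.run + 1)
by code: egen maxrun = max(run)

* last year of the longest run (the later run if two runs tie)
by code: egen endyear = max(cond(run == maxrun, year, .))

* the longest run covers the maxrun years ending in endyear
keep if year > endyear - maxrun & year <= endyear

* threshold from the original message: drop firms with no 5-year run
keep if maxrun >= 5
```

On the three firms listed further down, this leaves 1997-2001 for firm A and 1995-2001 for firm B and drops firm C, which matches the target listing.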
Sorry if I was not clear before, and thank you very much for your time.
Regards
Roberto Mura
PhD candidate
University of York



At 15:01 11/10/2004, you wrote:

I am not clear what the issue is here. The complaint
seems to be that Stata does what you tell it to do,
and not something else that you want.

I'd recommend a look at

How do I identify runs of consecutive observations in panel data?
http://www.stata.com/support/faqs/data/panel.html

by Vince Wiggins and friend, which may help.

Nick
n.j.cox@durham.ac.uk

Carter Ivan Rees

> There are commands for handling duplicates already written for Stata;
> type help duplicates.
>
> The following link has also come in handy for me in the past if you are
> looking to count the recurrence of a particular individual across time.
>
> Your issue of consecutive years would require the code offered there to
> be tweaked a bit.
>
> http://www.stata.co.uk/support/faqs/data/dups.html
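The duplicates commands are also a useful first check here: tsset refuses a panel in which a firm has the same year twice, so it is worth confirming that each firm-year appears only once. A minimal sketch, using firm C's years from the listing later in the thread and assuming variables named code and year:

```stata
* Check that each firm-year occurs once before tsset (sketch).
* Example data: firm C's years from the listing in this thread.
clear
input long code int year
134982 1997
134982 1999
134982 2001
end

* table of how many firm-years occur once, twice, and so on
duplicates report code year

* dup = number of other observations with the same firm-year (0 = unique)
duplicates tag code year, generate(dup)
list code year dup if dup > 0

* tsset accepts the panel only when firm-years are unique
tsset code year
```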

Roberto

> I am currently trying to run a GMM estimation using the xtabond2
> command.
>
> I have an unbalanced panel, so I want to manipulate my dataset in a
> way that Stata keeps only the consecutive observations for each firm.
> For example, consider these 3 firms:
>
> code    name  year  run  maxrun  _spell  _seq  _end   n   N
> 135540  A     1995    1       5       1     1     1   1   6
> 135540  A     1997    1       5       2     1     0   2   6
> 135540  A     1998    2       5       2     2     0   3   6
> 135540  A     1999    3       5       2     3     0   4   6
> 135540  A     2000    4       5       2     4     0   5   6
> 135540  A     2001    5       5       2     5     1   6   6
> 900327  B     1991    1       7       1     1     0   1  10
> 900327  B     1992    2       7       1     2     0   2  10
> 900327  B     1993    3       7       1     3     1   3  10
> 900327  B     1995    1       7       2     1     0   4  10
> 900327  B     1996    2       7       2     2     0   5  10
> 900327  B     1997    3       7       2     3     0   6  10
> 900327  B     1998    4       7       2     4     0   7  10
> 900327  B     1999    5       7       2     5     0   8  10
> 900327  B     2000    6       7       2     6     0   9  10
> 900327  B     2001    7       7       2     7     1  10  10
> 134982  C     1997    1       1       1     1     1   1   3
> 134982  C     1999    1       1       2     1     1   2   3
> 134982  C     2001    1       1       3     1     1   3   3
>
>
> With the tsspell command I have been able to identify the maximum number
> of consecutive observations per firm.
> The problem is that if I write
>
>     keep if maxrun>=5
>
> Stata only drops firm "C", whereas I want it to drop, in every panel, the
> observations that lie outside the firm's longest run of consecutive years,
> so that the dataset would look like:
>
> code    name  year
> 135540  A     1997
> 135540  A     1998
> 135540  A     1999
> 135540  A     2000
> 135540  A     2001
> 900327  B     1995
> 900327  B     1996
> 900327  B     1997
> 900327  B     1998
> 900327  B     1999
> 900327  B     2000
> 900327  B     2001
>
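Starting instead from the tsspell output shown in the listing, the same selection can be sketched as below. It assumes tsspell (Nick Cox's user-written command, installable with ssc install tsspell) numbers spells within each firm as in the listing, and that the firm and year variables are code and year. Each spell's length is its largest _seq; the sketch keeps the spell whose length equals the firm's maximum and then applies the keep if maxrun>=5 rule from the message. Ties are broken toward the later spell, again an arbitrary choice.

```stata
* Same selection starting from tsspell output (sketch).
* Requires the user-written tsspell: ssc install tsspell
clear
input long code int year
135540 1995
135540 1997
135540 1998
135540 1999
135540 2000
135540 2001
900327 1991
900327 1992
900327 1993
900327 1995
900327 1996
900327 1997
900327 1998
900327 1999
900327 2000
900327 2001
134982 1997
134982 1999
134982 2001
end

tsset code year

* a new spell starts whenever the previous year is absent
* (the same call as in the message, with year as the time variable)
tsspell, f(L.year == .)

* length of each spell = its largest _seq
bysort code _spell: egen spell_len = max(_seq)

* longest spell per firm (the maxrun column in the listing)
bysort code: egen maxrun = max(spell_len)

* number of the latest spell of maximal length, so ties keep one spell
bysort code: egen best = max(cond(spell_len == maxrun, _spell, .))

* keep that spell, for firms whose longest run is at least 5 years
keep if _spell == best & maxrun >= 5
```

On the listed data this should reproduce the 12-observation target listing above.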

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*



© Copyright 1996–2014 StataCorp LP