Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


# Re: st: xtdescribe and panel data

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: xtdescribe and panel data
Date: Thu, 8 Sep 2011 13:20:31 +0100

```
There is no need to modify any code. You can just apply standard Stata
logic and existing code.

Your problem falls into two parts.

1. Identifying panels with (at least) five consecutive observations

Once you have a pattern variable, the condition that an observation
belongs to a panel with at least five consecutive observations is

... if strpos(patternvar, "11111")

as "11111" will be included somewhere as a substring within the value
of the pattern variable. The condition that an observation belongs to
a panel with precisely five consecutive observations is

... if strpos(patternvar, "11111")  & !strpos(patternvar, "111111")

and so forth. You implied that you wanted the second (five), but your
examples make clear that you really want the first (at least five).
That's fine. You could -drop- any observations, and thus any panels,
that don't satisfy your criterion, but that would not reduce each
panel to its longest spell of consecutive observations.
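
* A minimal sketch of the two conditions (not part of the original post).
* The pattern values are the four examples from the question below;
* atleast5 and exactly5 are hypothetical variable names.
clear
input str14 patternvar
"......11111111"
"111..........."
"11....11111111"
"11..11111.1111"
end
* 1 if the pattern contains a run of five or more 1s
gen byte atleast5 = strpos(patternvar, "11111") > 0
* 1 if it contains a run of five 1s but no run of six or more
gen byte exactly5 = strpos(patternvar, "11111") & !strpos(patternvar, "111111")
list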

2. Keeping just the longest spell of consecutive observations of at least
some length

A little searching turns up relevant material. See the help for
-tsspell- (SSC) and the FAQ

FAQ     . . . . . . Identifying runs of consecutive observations in panel data
        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
        8/02    How do I identify runs of consecutive observations
                in panel data?
                http://www.stata.com/support/faqs/data/panel.html

In fact the FAQ contains all that you really need to answer your
question. I will use -tsspell- (SSC) on an example.

webuse abdata

You don't need to use -xtpatternvar- (previously posted in this
thread) to see what it is like, but I will illustrate nevertheless.

. xtpatternvar, gen(pattern)

. tab pattern

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
  ..1111111 |         14        1.36        1.36
  .1111111. |        273       26.48       27.84
  .11111111 |        152       14.74       42.58
  1111111.. |        434       42.10       84.68
  11111111. |         32        3.10       87.78
  111111111 |        126       12.22      100.00
------------+-----------------------------------
      Total |      1,031      100.00

. xtset
       panel variable:  id (unbalanced)
        time variable:  year, 1976 to 1984
                delta:  1 unit

The help for -tsspell- gives an example of identifying spells of
consecutive observations. The FAQ explains the logic.

. tsspell, f(L.year==.)

-tsspell- creates three new variables, by default _spell, _seq, _end.

. ds
c1        emp       indoutpt  k         yearm1    nL2       kL2       yr1976    yr1979    yr1982    pattern   _end
ind       wage      n         ys        id        wL1       ysL1      yr1977    yr1980    yr1983    _spell
year      cap       w         rec       nL1       kL1       ysL2      yr1978    yr1981    yr1984    _seq
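
* A rough sketch, not from the original post, of what the three variables
* hold, rebuilt with built-in commands on one hypothetical panel that has a
* gap in 1979. The names spell, seq and last are made-up analogues of
* _spell, _seq and _end; -tsspell- itself is not used here.
clear
input byte id int year
1 1976
1 1977
1 1978
1 1980
1 1981
end
* a new spell starts at the first observation or after any gap in year
bysort id (year): gen int spell = sum(_n == 1 | year != year[_n-1] + 1)
* position of each observation within its spell
bysort id spell (year): gen int seq = _n
* 1 for the last observation of each spell
by id spell: gen byte last = _n == _N
list, sepby(spell)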

The length of a spell is the highest value of _seq within that spell.

. egen length = max(_seq), by(id _spell)

The length of the _longest_ spell for any panel will be

. egen maxlength = max(_seq), by(id)

Now we can use any relevant condition(s) we like to select spells.

. keep if length == 8
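
* A minimal sketch, not from the original post, of one way to keep only the
* longest spell when it has at least five observations. It uses built-in
* commands on hypothetical toy data that reproduce the four patterns in the
* question (years 1996 to 2009); pat, spell and the year range are assumptions.
clear
input byte id str14 pat
1 "......11111111"
2 "111..........."
3 "11....11111111"
4 "11..11111.1111"
end
* one row per firm-year, then keep only the years marked "1" in the pattern
expand 14
bysort id: gen int year = 1995 + _n
keep if substr(pat, year - 1995, 1) == "1"
* number the spells of consecutive years within each firm
bysort id (year): gen int spell = sum(_n == 1 | year != year[_n-1] + 1)
bysort id spell: gen int length = _N
egen maxlength = max(length), by(id)
* keep the longest spell if long enough; tied spells would all be kept
keep if length == maxlength & length >= 5
tab id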

In other words, for your problem as now stated, you don't need my
-xtpatternvar- at all. But -tsspell- might come in handy. See also

SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/07   SJ 7(2):249--265                                 (no commands)
        shows how to handle spells with complete control over
        spell specification

That's a discussion of principles; there is no reference to -tsspell-.

Nick

On Thu, Sep 8, 2011 at 12:40 PM, A. Berâ <abdullahbera@gmail.com> wrote:
> Dear Dr. Cox,
>
>  Thank you very  much for your detailed and helpful response.
>
> May I ask one more question if you don't mind? Is it possible to
> modify your code as follows:
>
> Assume I would like to include in my analysis those firms that have n,
> say five, consecutive observations. So for the firms below, the first
> should be included; the second will not be included; for the third
> one, the first two years should be deleted and the last 8 years should
> be included; and for the last one, middle 5 observations will be
> included
>
> ......11111111
> 111...........
> 11....11111111
> 11..11111.1111
>
> Regards,
>
> a.b.
>
> On Tue, Sep 6, 2011 at 7:25 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>
>> This is a fiddly calculation, so I packaged it in a more respectable
>> program. The main algorithm is simplified a bit too. Example first,
>> code later.
>>
>> . webuse abdata
>>
>> . xtset
>>       panel variable:  id (unbalanced)
>>        time variable:  year, 1976 to 1984
>>
>> . xtpatternvar  , gen(pattern)
>>
>> . tab pattern
>>
>>    pattern |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>  ..1111111 |         14        1.36        1.36
>>  .1111111. |        273       26.48       27.84
>>  .11111111 |        152       14.74       42.58
>>  1111111.. |        434       42.10       84.68
>>  11111111. |         32        3.10       87.78
>>  111111111 |        126       12.22      100.00
>> ------------+-----------------------------------
>>      Total |      1,031      100.00
>>
>>
>> *! NJC 1.0.0 6 Sept 2011
>> program xtpatternvar, sort
>>        version 9.2
>>        syntax [if] [in] , GENerate(name)
>>
>>        confirm new var `generate'
>>        local g `generate'
>>
>>        quietly {
>>                xtset
>>                local t `r(timevar)'
>>                local id `r(panelvar)'
>>
>>                marksample touse
>>                count if `touse'
>>                if r(N) == 0 error 2000
>>
>>                su `t' if `touse', meanonly
>>                local max = r(max)
>>                local min = r(min)
>>                local range = r(max) - r(min) + 1
>>
>>                if `range' > 244 {
>>                        di as err "no go; patterns too long for str244"
>>                        exit 498
>>                }
>>
>>                local miss : di _dup(`range') "."
>>
>>                bysort `touse' `id' (`t') : ///
>>                gen `g' = substr("`miss'", 1, `t'[1]-`min') + "1" if _n == 1
>>
>>                by `touse' `id' : replace `g' = ///
>>                substr("`miss'", 1, `t'- `t'[_n-1] - 1) + "1" if _n > 1
>>
>>                by `touse' `id': replace `g' = ///
>>                `g' + substr("`miss'", 1, `max'-`t'[_N]) if _n == _N
>>
>>                by `touse' `id' : replace `g' = `g'[_n-1] + `g' if _n > 1
>>
>>                by `touse' `id' : replace `g' = cond(`touse', `g'[_N], "")
>>
>>                compress `g'
>>        }
>> end
>>
>>
>>
>> On Tue, Sep 6, 2011 at 10:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> > On Tue, Sep 6, 2011 at 9:12 AM, A. Berâ <abdullahbera@gmail.com> wrote:
>> >
>> >>    I have some panel data as described below. Few questions:
>> >>
>> >> 1. Can these data be analyzed by panel data methods? I would
>> >> appreciate any suggestions about a suitable approach for these data.
>> >
>> > You have panel data. You let slip that the panels are firms. Do
>> > something that makes economic sense.
>> > That seems all that can be advised.
>> >
>> >> 2. How can I delete firms that have a specific pattern? For example
>> >> how can I delete these type of firms: 1..........111 ?
>> >
>> > You can create a pattern variable like this.
>> >
>> > use  http://www.stata-press.com/data/r10/xtdatasmpl.dta, clear
>> > xtset idcode year
>> > keep if idcode <= 5
>> > su year, meanonly
>> > local max = r(max)
>> > local min = r(min)
>> > local range = r(max) - r(min) + 1
>> > local miss : di _dup(`range') "."
>> > bysort idcode (year) : gen this = substr("`miss'", 1, year[1]-`min') + ///
>> >     "1" if _n == 1
>> > by idcode : replace this = substr("`miss'", 1, year- year[_n-1] - 1) + ///
>> >     "1" if _n > 1
>> > by idcode : replace this = this + substr("`miss'", 1, `max'-year[_N]) ///
>> >     if _n == _N
>> > by idcode : gen pattern = this[1]
>> > by idcode : replace pattern = pattern[_n-1] + this if _n > 1
>> > by idcode : replace pattern = pattern[_N]
>> > tab pattern
>> > xtdes
>> >
>> > After that you can do things conditionally on values of -pattern-.
>> >
>> >> 3. Is imputation appropriate if "holes" between years is more than one?
>> >
>> > You could interpolate. People usually don't with this kind of data.
>> >
>> >> Many thanks for any help.
>> >> --
>> >> abdullah berâ
>> >>
>> >>
>> >> . xtdescribe, patterns(1000)
>> >>
>> >>    id:  2, 3, ..., 37376                                  n =      22997
>> >>     date:  1996, 1997, ..., 2009                             T =         14
>> >>           Delta(date) = 1 unit
>> >>           Span(date)  = 14 periods
>> >>           (id*date uniquely identifies each observation)
>> >>
>> >> Distribution of T_i:   min      5%     25%       50%       75%     95%     max
>> >>                         1       1       2         4         9      14      14
>> >>
>> >>     Freq.  Percent    Cum. |  Pattern
>> >>  ---------------------------+----------------
>> >>     3171     13.79   13.79 |  1.............
>> >>     2447     10.64   24.43 |  11111111111111
>> >>     1932      8.40   32.83 |  11............
>> >>     1471      6.40   39.23 |  ...........111
>> >>     1066      4.64   43.86 |  ..........1111
>> >
>> > <big snip>
>> >

```
