# Re: st: xtdescribe and panel data

From: A. Berâ
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: xtdescribe and panel data
Date: Thu, 8 Sep 2011 14:40:35 +0300

```
Dear Dr. Cox,

May I ask one more question if you don't mind? Is it possible to

Assume I would like to include in my analysis only those firms that have
n (say, five) consecutive observations. So for the firms below, the first
should be included; the second should not be included; for the third
one, the first two years should be deleted and the last 8 years should
be included; and for the last one, the middle 5 observations should be
included.

......11111111
111...........
11....11111111
11..11111.1111

Regards,

a.b.

On Tue, Sep 6, 2011 at 7:25 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> This is a fiddly calculation, so I packaged it in a more respectable
> program. The main algorithm is simplified a bit too. Example first,
> code later.
>
> . webuse abdata
>
> . xtset
>       panel variable:  id (unbalanced)
>        time variable:  year, 1976 to 1984
>
> . xtpatternvar  , gen(pattern)
>
> . tab pattern
>
>    pattern |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>  ..1111111 |         14        1.36        1.36
>  .1111111. |        273       26.48       27.84
>  .11111111 |        152       14.74       42.58
>  1111111.. |        434       42.10       84.68
>  11111111. |         32        3.10       87.78
>  111111111 |        126       12.22      100.00
> ------------+-----------------------------------
>      Total |      1,031      100.00
>
>
> *! NJC 1.0.0 6 Sept 2011
> program xtpatternvar, sort
>        version 9.2
>        syntax [if] [in] , GENerate(name)
>
>        confirm new var `generate'
>        local g `generate'
>
>        quietly {
>                xtset
>                local t `r(timevar)'
>                local id `r(panelvar)'
>
>                marksample touse
>                count if `touse'
>                if r(N) == 0 error 2000
>
>                su `t' if `touse', meanonly
>                local max = r(max)
>                local min = r(min)
>                local range = r(max) - r(min) + 1
>
>                if `range' > 244 {
>                        di as err "no go; patterns too long for str244"
>                        exit 498
>                }
>
>                local miss : di _dup(`range') "."
>
>                bysort `touse' `id' (`t') : ///
>                gen `g' = substr("`miss'", 1, `t'[1]-`min') + "1" if _n == 1
>
>                by `touse' `id' : replace `g' = ///
>                substr("`miss'", 1, `t'- `t'[_n-1] - 1) + "1" if _n > 1
>
>                by `touse' `id': replace `g' = ///
>                `g' + substr("`miss'", 1, `max'-`t'[_N]) if _n == _N
>
>                by `touse' `id' : replace `g' = `g'[_n-1] + `g' if _n > 1
>
>                by `touse' `id' : replace `g' = cond(`touse', `g'[_N], "")
>
>                compress `g'
>        }
> end
>
>
>
> On Tue, Sep 6, 2011 at 10:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> > On Tue, Sep 6, 2011 at 9:12 AM, A. Berâ <abdullahbera@gmail.com> wrote:
> >
> >>    I have some panel data as described below. Few questions:
> >>
> >> 1. Can these data be analyzed by panel data methods? I would
> >> appreciate any suggestions about a suitable approach for these data.
> >
> > You have panel data. You let slip that the panels are firms. Do
> > something that makes economic sense.
> > That seems all that can be advised.
> >
> >> 2. How can I delete firms that have a specific pattern? For example
> >> how can I delete these type of firms: 1..........111 ?
> >
> > You can create a pattern variable like this.
> >
> > use  http://www.stata-press.com/data/r10/xtdatasmpl.dta, clear
> > xtset idcode year
> > keep if idcode <= 5
> > su year, meanonly
> > local max = r(max)
> > local min = r(min)
> > local range = r(max) - r(min) + 1
> > local miss : di _dup(`range') "."
> > bysort idcode (year) : gen this = substr("`miss'", 1, year[1]-`min') + "1" if _n == 1
> > by idcode : replace this = substr("`miss'", 1, year- year[_n-1] - 1) + "1" if _n > 1
> > by idcode : replace this = this + substr("`miss'", 1, `max'-year[_N]) if _n == _N
> > by idcode : gen pattern = this[1]
> > by idcode : replace pattern = pattern[_n-1] + this if _n > 1
> > by idcode : replace pattern = pattern[_N]
> > tab pattern
> > xtdes
> >
> > After that you can do things conditionally on values of -pattern-.
> >
> >> 3. Is imputation appropriate if "holes" between years is more than one?
> >
> > You could interpolate. People usually don't with this kind of data.
> >
> >> Many thanks for any help.
> >> --
> >> abdullah berâ
> >>
> >>
> >> . xtdescribe, patterns(1000)
> >>
> >>    id:  2, 3, ..., 37376                                  n =      22997
> >>     date:  1996, 1997, ..., 2009                             T =         14
> >>           Delta(date) = 1 unit
> >>           Span(date)  = 14 periods
> >>           (id*date uniquely identifies each observation)
> >>
> >> Distribution of T_i:   min      5%     25%       50%       75%     95%     max
> >>                         1       1       2         4         9      14      14
> >>
> >>     Freq.  Percent    Cum. |  Pattern
> >>  ---------------------------+----------------
> >>     3171     13.79   13.79 |  1.............
> >>     2447     10.64   24.43 |  11111111111111
> >>     1932      8.40   32.83 |  11............
> >>     1471      6.40   39.23 |  ...........111
> >>     1066      4.64   43.86 |  ..........1111
> >
> > <big snip>
> >
>
```
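The question at the top of this message (keep only runs of at least five consecutive observations within each firm) can be approached by numbering spells of consecutive years and keeping the long ones. What follows is a minimal sketch under stated assumptions, not code from the thread: the variable names `newspell`, `spell` and `spell_len`, the toy dataset, and the mapping of pattern positions to the years 1996 to 2009 are all made up for illustration. The four toy firms reproduce the four patterns given in the question.

```stata
* Toy data (hypothetical): four firms with the patterns from the question,
* mapped onto years 1996-2009 for illustration only.
clear
input byte id str14 pattern
1 "......11111111"
2 "111..........."
3 "11....11111111"
4 "11..11111.1111"
end
expand 14
bysort id : gen int year = 1995 + _n
keep if substr(pattern, year - 1995, 1) == "1"
drop pattern
xtset id year

* A new spell starts at a firm's first observation or after any gap in years.
bysort id (year) : gen byte newspell = (_n == 1) | (year - year[_n-1] > 1)

* Number the spells within each firm with a running sum.
by id : gen spell = sum(newspell)

* Length of each spell = number of observations sharing id and spell.
bysort id spell : gen spell_len = _N

* Keep only observations in spells of at least 5 consecutive years.
keep if spell_len >= 5
```

On the toy data this keeps all 8 observations of the first firm, drops the second firm entirely, keeps the last 8 years of the third, and keeps the middle 5 observations of the fourth. The threshold 5 can be replaced by any n. The sketch assumes yearly data with a time step of 1, as in the -xtdescribe- output quoted above.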