Re: st: xtdescribe and panel data


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: xtdescribe and panel data
Date   Thu, 8 Sep 2011 13:51:17 +0100

Specifically, in your example it seems that panels are up to 14
long. That opens up the possibility of two spells in the same panel,
each consecutive and at least 5 long. You can check for this using
the same kind of logic.
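
As a rough sketch of such a check on the pattern strings themselves: the first pattern below is hypothetical, made up to show a panel with two qualifying spells, and the variable names (p1, rest, gap, two) are illustrative only.

```stata
clear
input str14 pattern
"11111.11111111"
"11..11111.1111"
"......11111111"
end

* position of the first run of at least five 1s (0 if there is none)
gen p1 = strpos(pattern, "11111")

* the pattern from that run onwards, and where a gap first breaks it
gen str14 rest = substr(pattern, p1, .) if p1 > 0
gen gap = strpos(rest, ".")

* a second qualifying spell exists if "11111" occurs again after that gap
gen byte two = p1 > 0 & gap > 0 & strpos(substr(rest, gap, .), "11111") > 0

list pattern two, clean
```

Only the first pattern should be flagged. A single long run such as 10 consecutive observations is not counted twice, because the second search starts only after a gap.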

On Thu, Sep 8, 2011 at 1:20 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Abdullah:
>
> There is no need to modify any code. You can just apply standard Stata
> logic and existing code.
>
> Your problem falls into two parts.
>
> 1. Identifying panels with (at least) five consecutive observations
>
> Once you have a pattern variable, the condition that an observation
> belongs to a panel with at least five consecutive observations is
>
> ... if strpos(patternvar, "11111")
>
> as "11111" will be included somewhere as a substring within the value
> of the pattern variable. The condition that an observation belongs to
> a panel with precisely five consecutive observations is
>
> ... if strpos(patternvar, "11111")  & !strpos(patternvar, "111111")
>
> and so forth. You implied that you wanted the second (five), but your
> examples make clear that you really want the first (at least five).
> That's fine. You could -drop- any observations, and thus any panels,
> that don't satisfy your criterion, but that would not reduce each
> panel to its longest spell of consecutive observations.
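
A minimal sketch of those two conditions, applied to the four example patterns from the question. The variable names patternvar, atleast5 and exactly5 are illustrative assumptions, not part of any posted code.

```stata
clear
input str14 patternvar
"......11111111"
"111..........."
"11....11111111"
"11..11111.1111"
end

* panels containing a run of at least five consecutive observations
gen byte atleast5 = strpos(patternvar, "11111") > 0

* panels containing "11111" but nowhere "111111"
gen byte exactly5 = strpos(patternvar, "11111") & !strpos(patternvar, "111111")

list, clean
```

Note that the second condition is a statement about the whole panel: a panel with one run of exactly five and another, longer run elsewhere would not be flagged, because "111111" occurs somewhere in its pattern.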
>
> 2. Keeping just the longest spell of consecutive observations of at least
> some length
>
> A little searching turns up relevant material. See the help for
> -tsspell- (SSC) and the FAQ
>
> FAQ     . . . . . . Identifying runs of consecutive observations in panel data
>        . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox and V. Wiggins
>        8/02    How do I identify runs of consecutive observations
>                in panel data?
>                http://www.stata.com/support/faqs/data/panel.html
>
> In fact the FAQ contains all that you really need to answer your
> question. I will use -tsspell- (SSC) on an example.
>
> . webuse abdata
>
> You don't need to use -xtpatternvar- (previously posted in this
> thread) to see what it is like, but I will illustrate nevertheless.
>
> . xtpatternvar, gen(pattern)
>
> . tab pattern
>
>    pattern |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>  ..1111111 |         14        1.36        1.36
>  .1111111. |        273       26.48       27.84
>  .11111111 |        152       14.74       42.58
>  1111111.. |        434       42.10       84.68
>  11111111. |         32        3.10       87.78
>  111111111 |        126       12.22      100.00
> ------------+-----------------------------------
>      Total |      1,031      100.00
>
> . xtset
>       panel variable:  id (unbalanced)
>        time variable:  year, 1976 to 1984
>                delta:  1 unit
>
> The help for -tsspell- gives an example of identifying spells of
> consecutive observations. The FAQ explains the logic.
>
> . tsspell, f(L.year==.)
>
> -tsspell- creates three new variables, by default _spell, _seq, _end.
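
For readers without -tsspell- installed, variables in the same spirit can be built by hand along the lines of the FAQ logic. This is a sketch on made-up toy data, not -tsspell-'s own code; the names first, spell, seq and last are illustrative.

```stata
clear
input id year
1 1996
1 1997
1 1999
1 2000
1 2001
2 1998
2 1999
end

* an observation starts a new spell if it is first in its panel
* or does not directly follow the previous year
bysort id (year): gen byte first = (_n == 1) | (year - year[_n-1] > 1)

* a running count of spell starts numbers the spells within each panel
by id: gen spell = sum(first)

* position within the spell, and a marker for its last observation
bysort id spell (year): gen seq = _n
by id spell: gen byte last = _n == _N

list, sepby(id)
```

Here panel 1 has two spells (1996 to 1997 and 1999 to 2001) and panel 2 has one.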
>
> . ds
> c1        ind       year      emp       wage      cap
> indoutpt  n         w         k         ys        rec
> yearm1    id        nL1       nL2       wL1       kL1
> kL2       ysL1      ysL2      yr1976    yr1977    yr1978
> yr1979    yr1980    yr1981    yr1982    yr1983    yr1984
> pattern   _spell    _seq      _end
>
> The length of a spell is the highest value of _seq within that spell.
>
> . egen length = max(_seq), by(id _spell)
>
> The length of the _longest_ spell for any panel will be
>
> . egen maxlength = max(_seq), by(id)
>
> Now we can use any relevant condition(s) we like to select spells.
>
> . keep if length == 8
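
For the problem as posed in the question (keep only the longest spell, and only if it has at least five observations), the selection might look like the sketch below. It rebuilds the four example firms from their patterns, with years 1996 to 2009 as in the -xtdescribe- output, and identifies spells by hand rather than with -tsspell-; the variable names are illustrative.

```stata
clear
input id str14 pattern
1 "......11111111"
2 "111..........."
3 "11....11111111"
4 "11..11111.1111"
end

* one row per firm-year actually observed
expand 14
bysort id: gen year = 1995 + _n
keep if substr(pattern, year - 1995, 1) == "1"

* spells of consecutive years within each firm
bysort id (year): gen spell = sum((_n == 1) | (year - year[_n-1] > 1))

* length of each spell and of the longest spell in each firm
bysort id spell: gen length = _N
egen maxlength = max(length), by(id)

* keep the longest spell only, provided it has at least five observations
keep if length == maxlength & length >= 5

tab id
```

Firm 2 drops out entirely, firms 1 and 3 keep their last 8 years, and firm 4 keeps its middle 5 years. If a firm had two spells tied for longest, this condition would keep both.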
>
>
>
> In other words, for your problem as now stated, you don't need my
> -xtpatternvar- at all. But -tsspell- might come in handy. See also
>
> SJ-7-2  dm0029  . . . . . . . . . . . . . . Speaking Stata: Identifying spells
>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>        Q2/07   SJ 7(2):249--265                                 (no commands)
>        shows how to handle spells with complete control over
>        spell specification
>
> That's a discussion of principles; there is no reference to -tsspell-.
>
> Nick
>
> On Thu, Sep 8, 2011 at 12:40 PM, A. Berâ <abdullahbera@gmail.com> wrote:
>> Dear Dr. Cox,
>>
>> Thank you very much for your detailed and helpful response.
>>
>> May I ask one more question if you don't mind? Is it possible to
>> modify your code as follows:
>>
>> Assume I would like to include in my analysis those firms that have n,
>> say five, consecutive observations. So for the firms below, the first
>> should be included; the second will not be included; for the third
>> one, the first two years should be deleted and the last 8 years should
>> be included; and for the last one, the middle 5 observations should be
>> included.
>>
>> ......11111111
>> 111...........
>> 11....11111111
>> 11..11111.1111
>>
>> Regards,
>>
>> a.b.
>>
>> On Tue, Sep 6, 2011 at 7:25 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>
>>> This is a fiddly calculation, so I packaged it in a more respectable
>>> program. The main algorithm is simplified a bit too. Example first,
>>> code later.
>>>
>>> . webuse abdata
>>>
>>> . xtset
>>>       panel variable:  id (unbalanced)
>>>        time variable:  year, 1976 to 1984
>>>
>>> . xtpatternvar  , gen(pattern)
>>>
>>> . tab pattern
>>>
>>>    pattern |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>  ..1111111 |         14        1.36        1.36
>>>  .1111111. |        273       26.48       27.84
>>>  .11111111 |        152       14.74       42.58
>>>  1111111.. |        434       42.10       84.68
>>>  11111111. |         32        3.10       87.78
>>>  111111111 |        126       12.22      100.00
>>> ------------+-----------------------------------
>>>      Total |      1,031      100.00
>>>
>>>
>>> *! NJC 1.0.0 6 Sept 2011
>>> program xtpatternvar, sort
>>>        version 9.2
>>>        syntax [if] [in] , GENerate(name)
>>>
>>>        confirm new var `generate'
>>>        local g `generate'
>>>
>>>        quietly {
>>>                xtset
>>>                local t `r(timevar)'
>>>                local id `r(panelvar)'
>>>
>>>                marksample touse
>>>                count if `touse'
>>>                if r(N) == 0 error 2000
>>>
>>>                su `t' if `touse', meanonly
>>>                local max = r(max)
>>>                local min = r(min)
>>>                local range = r(max) - r(min) + 1
>>>
>>>                if `range' > 244 {
>>>                        di as err "no go; patterns too long for str244"
>>>                        exit 498
>>>                }
>>>
>>>                local miss : di _dup(`range') "."
>>>
>>>                bysort `touse' `id' (`t') : ///
>>>                gen `g' = substr("`miss'", 1, `t'[1]-`min') + "1" if _n == 1
>>>
>>>                by `touse' `id' : replace `g' = ///
>>>                substr("`miss'", 1, `t'- `t'[_n-1] - 1) + "1" if _n > 1
>>>
>>>                by `touse' `id': replace `g' = ///
>>>                `g' + substr("`miss'", 1, `max'-`t'[_N]) if _n == _N
>>>
>>>                by `touse' `id' : replace `g' = `g'[_n-1] + `g' if _n > 1
>>>
>>>                by `touse' `id' : replace `g' = cond(`touse', `g'[_N], "")
>>>
>>>                compress `g'
>>>        }
>>> end
>>>
>>>
>>>
>>> On Tue, Sep 6, 2011 at 10:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> > On Tue, Sep 6, 2011 at 9:12 AM, A. Berâ <abdullahbera@gmail.com> wrote:
>>> >
>>> >>    I have some panel data as described below. A few questions:
>>> >>
>>> >> 1. Can these data be analyzed by panel data methods? I would
>>> >> appreciate any suggestions about a suitable approach for these data.
>>> >
>>> > You have panel data. You let slip that the panels are firms. Do
>>> > something that makes economic sense.
>>> > That seems all that can be advised.
>>> >
>>> >> 2. How can I delete firms that have a specific pattern? For example,
>>> >> how can I delete firms of this type: 1..........111 ?
>>> >
>>> > You can create a pattern variable like this.
>>> >
>>> > use  http://www.stata-press.com/data/r10/xtdatasmpl.dta, clear
>>> > xtset idcode year
>>> > keep if idcode <= 5
>>> > su year, meanonly
>>> > local max = r(max)
>>> > local min = r(min)
>>> > local range = r(max) - r(min) + 1
>>> > local miss : di _dup(`range') "."
>>> > bysort idcode (year) : gen this = ///
>>> >     substr("`miss'", 1, year[1]-`min') + "1" if _n == 1
>>> > by idcode : replace this = ///
>>> >     substr("`miss'", 1, year- year[_n-1] - 1) + "1" if _n > 1
>>> > by idcode : replace this = ///
>>> >     this + substr("`miss'", 1, `max'-year[_N]) if _n == _N
>>> > by idcode : gen pattern = this[1]
>>> > by idcode : replace pattern = pattern[_n-1] + this if _n > 1
>>> > by idcode : replace pattern = pattern[_N]
>>> > tab pattern
>>> > xtdes
>>> >
>>> > After that you can do things conditionally on values of -pattern-.
>>> >
>>> >> 3. Is imputation appropriate if the "holes" between years are more than one?
>>> >
>>> > You could interpolate. People usually don't with this kind of data.
>>> >
>>> >> Many thanks for any help.
>>> >> --
>>> >> abdullah berâ
>>> >>
>>> >>
>>> >> . xtdescribe, patterns(1000)
>>> >>
>>> >>    id:  2, 3, ..., 37376                                  n =      22997
>>> >>     date:  1996, 1997, ..., 2009                             T =         14
>>> >>           Delta(date) = 1 unit
>>> >>           Span(date)  = 14 periods
>>> >>           (id*date uniquely identifies each observation)
>>> >>
>>> >> Distribution of T_i:   min      5%     25%       50%       75%     95%     max
>>> >>                         1       1       2         4         9      14      14
>>> >>
>>> >>     Freq.  Percent    Cum. |  Pattern
>>> >>  ---------------------------+----------------
>>> >>     3171     13.79   13.79 |  1.............
>>> >>     2447     10.64   24.43 |  11111111111111
>>> >>     1932      8.40   32.83 |  11............
>>> >>     1471      6.40   39.23 |  ...........111
>>> >>     1066      4.64   43.86 |  ..........1111
>>> >
>>> > <big snip>
>>> >
>


