Re: st: xtdescribe and panel data


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: xtdescribe and panel data
Date   Tue, 6 Sep 2011 17:25:04 +0100

This is a fiddly calculation, so I packaged it in a more respectable
program. The main algorithm is simplified a bit too. Example first,
code later.

. webuse abdata

. xtset
       panel variable:  id (unbalanced)
        time variable:  year, 1976 to 1984

. xtpatternvar  , gen(pattern)

. tab pattern

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
  ..1111111 |         14        1.36        1.36
  .1111111. |        273       26.48       27.84
  .11111111 |        152       14.74       42.58
  1111111.. |        434       42.10       84.68
  11111111. |         32        3.10       87.78
  111111111 |        126       12.22      100.00
------------+-----------------------------------
      Total |      1,031      100.00


*! NJC 1.0.0 6 Sept 2011
program xtpatternvar, sort
	version 9.2
	syntax [if] [in] , GENerate(name)

	confirm new var `generate'
	local g `generate'

	quietly {
		xtset
		local t `r(timevar)'
		local id `r(panelvar)'
	
		marksample touse
		count if `touse'
		if r(N) == 0 error 2000

		su `t' if `touse', meanonly
		local max = r(max)
		local min = r(min)
		local range = r(max) - r(min) + 1

		if `range' > 244 {
			di as err "no go; patterns too long for str244"
			exit 498
		}

		local miss : di _dup(`range') "."

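		* first observation in each panel: "." for each period before it, then "1"
		bysort `touse' `id' (`t') : ///
		gen `g' = substr("`miss'", 1, `t'[1]-`min') + "1" if _n == 1

		* later observations: "." for each period skipped since the previous one, then "1"
		by `touse' `id' : replace `g' = ///
		substr("`miss'", 1, `t'- `t'[_n-1] - 1) + "1" if _n > 1

		* last observation: append "." for each period after it
		by `touse' `id': replace `g' = ///
		`g' + substr("`miss'", 1, `max'-`t'[_N]) if _n == _N

		* concatenate the pieces cumulatively within each panel
		by `touse' `id' : replace `g' = `g'[_n-1] + `g' if _n > 1

		* copy the complete pattern to every observation used; blank the rest
		by `touse' `id' : replace `g' = cond(`touse', `g'[_N], "")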

		compress `g'
	}
end
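
Once the pattern variable exists, panels can be selected or dropped by comparing or matching the string. What follows is a sketch only, not part of the program above. It assumes xtpatternvar has been saved where Stata can find it, and the regular expression is just one possible definition of a "hole" (a 1, then one or more dots, then another 1). The literal pattern in the last comment is the one from the original question, whose data span 14 periods.

```stata
* sketch only: assumes xtpatternvar (above) is saved on the adopath
webuse abdata, clear
xtpatternvar, gen(pattern)

* drop panels with one exact pattern
drop if pattern == "..1111111"

* drop panels with any interior gap: a 1, one or more dots, then a 1
drop if regexm(pattern, "1\.+1")

* for the data in the original question (14 periods) the analogue would be
* drop if pattern == "1..........111"
```

Matching on the string keeps the selection readable next to -xtdescribe- output, because the patterns use the same 1 and . notation.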



On Tue, Sep 6, 2011 at 10:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> On Tue, Sep 6, 2011 at 9:12 AM, A. Berâ <abdullahbera@gmail.com> wrote:
>
>>    I have some panel data as described below. Few questions:
>>
>> 1. Can these data be analyzed by panel data methods? I would
>> appreciate any suggestions about a suitable approach for these data.
>
> You have panel data. You let slip that the panels are firms. Do
> something that makes economic sense.
> That seems all that can be advised.
>
>> 2. How can I delete firms that have a specific pattern? For example,
>> how can I delete firms of this type: 1..........111 ?
>
> You can create a pattern variable like this.
>
> use  http://www.stata-press.com/data/r10/xtdatasmpl.dta, clear
> xtset idcode year
> keep if idcode <= 5
> su year, meanonly
> local max = r(max)
> local min = r(min)
> local range = r(max) - r(min) + 1
> local miss : di _dup(`range') "."
> bysort idcode (year) : gen this = substr("`miss'", 1, year[1]-`min') + "1" if _n == 1
> by idcode : replace this = substr("`miss'", 1, year- year[_n-1] - 1) + "1" if _n > 1
> by idcode : replace this = this + substr("`miss'", 1, `max'-year[_N]) if _n == _N
> by idcode : gen pattern = this[1]
> by idcode : replace pattern = pattern[_n-1] + this if _n > 1
> by idcode : replace pattern = pattern[_N]
> tab pattern
> xtdes
>
> After that you can do things conditionally on values of -pattern-.
>
>> 3. Is imputation appropriate if "holes" between years is more than one?
>
> You could interpolate. People usually don't with this kind of data.
>
>> Many thanks for any help.
>> --
>> abdullah berâ
>>
>>
>> . xtdescribe, patterns(1000)
>>
>>    id:  2, 3, ..., 37376                                  n =      22997
>>     date:  1996, 1997, ..., 2009                             T =         14
>>           Delta(date) = 1 unit
>>           Span(date)  = 14 periods
>>           (id*date uniquely identifies each observation)
>>
>> Distribution of T_i:   min      5%     25%       50%       75%     95%     max
>>                         1       1       2         4         9      14      14
>>
>>     Freq.  Percent    Cum. |  Pattern
>>  ---------------------------+----------------
>>     3171     13.79   13.79 |  1.............
>>     2447     10.64   24.43 |  11111111111111
>>     1932      8.40   32.83 |  11............
>>     1471      6.40   39.23 |  ...........111
>>     1066      4.64   43.86 |  ..........1111
>
> <big snip>
>

