st: Re: cleaning panel data

From	Christopher F Baum <[email protected]>
To	[email protected]
Subject	st: Re: cleaning panel data
Date	Thu, 26 Sep 2002 08:27:48 -0400

--On Thursday, September 26, 2002 2:33 -0400 Alejandro wrote:

I have a huge industrial survey which is a panel dataset. I have the id of
each firm and the region in which the firm is based. I'd like to
check how many of the firms in this dataset have errors, in the sense
that the same id is associated with more than one region and/or that
a given firm has more than one year of foundation (to get an idea of
the % of errors in the database).
I also want to know which ones are the "problematic" firms.

You haven't said how the year of foundation plays into this; let's assume (probably incorrectly) that your first year of data for each firm defines that object. Then

preserve
* keep takes either a varlist or an if condition, not both, so split the step
keep firmid year region
bysort firmid year: keep if _n==1

You should then have a dataset with one record per firm. Various utilities like 'dups' may then be applied to see where your assumptions are violated. For instance, firmid should now be unique (a given firmid, in the firm's year of foundation, should appear only once). If the same firmid appears with multiple years (of foundation, or more properly of entry into the sample) or with multiple regions (indicating that the same firm's foundation in 19xx was recorded in two different regions), it should be easy enough to locate the problematic observations.
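
One possible way to carry out that check is sketched below. It uses the official duplicates command in place of 'dups', and it assumes the variables are named firmid and region as above. The name foundyear is purely hypothetical and stands in for whatever variable records the year of foundation. Treat it as an illustration under those assumptions, not a tested recipe for your data.

```stata
* Sketch only. Assumes variables firmid and region as in the post;
* foundyear is a hypothetical name for the year-of-foundation variable.
preserve
keep firmid region foundyear
* keep one row per distinct firmid-region-foundyear combination
duplicates drop
* extra > 0 marks firms that still have more than one combination
duplicates tag firmid, generate(extra)
* flag one row per firm so that each firm is counted once
egen byte onefirm = tag(firmid)
quietly count if onefirm
local nfirms = r(N)
quietly count if onefirm & extra > 0
display "Problematic firms: " r(N) " of `nfirms' (" %4.1f 100*r(N)/`nfirms' "%)"
sort firmid
list firmid region foundyear if extra > 0, sepby(firmid)
restore
```

A firm whose region and year of foundation never change collapses to a single row after duplicates drop, so any firmid that still occurs more than once is inconsistent on at least one of the two variables. The display line gives the share of such firms, and the list shows the conflicting values side by side.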

Kit
