st: Re: cleaning panel data
--On Thursday, September 26, 2002 2:33 -0400 Alejandro wrote:
I have a huge industrial survey which is a panel dataset. I have the id of
each firm and the region in which this firm is based. I'd like to
check how many of the firms in this dataset have errors, in the sense
that the same id would be associated with a different region and/or that
a given firm would have a different year of foundation (to have an idea of
the % of errors in the database).
I also want to know which ones are the "problematic" firms.

You haven't said how the year of foundation plays into this; let's assume
(probably incorrectly) that your first year of data for each firm defines
that object. Then

bys firmid year: keep if _n==1
keep firmid year region

You should then have a dataset with one record per firm. Various utilities
like 'dups' may then be applied to see where your assumptions are violated.
For instance, firmid should now be unique (a given firmid, in the firm's
year of foundation, should appear only once). If the same firmid appears
with multiple years (of foundation, or more properly of entry into the
sample) or with multiple regions (indicating that the same firm's foundation
in 19xx was recorded in two different regions) it should be easy enough to
locate the problematic observations.
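
An alternative that leaves the panel intact is to flag the inconsistent firms directly. The following is only a sketch under assumptions: the variable names firmid and region are taken from the command above, and foundyear is a hypothetical name for a year-of-foundation variable, should the survey record one. Note that a missing value is treated here as a distinct value, so a firm with region missing in some years would also be flagged.

```stata
* Assumed variable names: firmid, region; foundyear is hypothetical.

* Within each firm, sort by region; if the first and last values differ,
* the firm is recorded under more than one region.
bysort firmid (region): generate byte badregion = region[1] != region[_N]

* Same idea for the year of foundation.
bysort firmid (foundyear): generate byte badyear = foundyear[1] != foundyear[_N]

* Tag one record per firm so that each firm is counted once.
egen byte onefirm = tag(firmid)

* Percentage of firms with conflicting regions / foundation years.
tabulate badregion if onefirm
tabulate badyear if onefirm

* List the problematic firms, one line per distinct firm-region pair.
egen byte onepair = tag(firmid region)
list firmid region if badregion & onepair, sepby(firmid)
```

The two tabulate commands give the % of errors asked about, and the list shows which firms are the "problematic" ones together with every region each appears under.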