st: RE: error checking
> I have a huge industrial survey which is a panel dataset. I have the
> id of each firm and the region in which the firm is based. I'd like
> to check how many of the firms in this dataset have errors, in the
> sense that the same id is associated with different regions and/or
> that a given firm has different years of foundation (to get an idea
> of the % of errors in the database).
> I also want to know which ones are the "problematic" firms.
. bysort firm (year) : gen prob1 = year != year[_N]
. bysort id (region) : gen prob2 = region != region[_N]
. list firm id year region if prob1 | prob2
Logic: for example, sort by -firm- and within each -firm- by -year-.
If the last value of -year- for each -firm- differs from the first,
you have a problem. Each observation whose value differs from the
last one in its group is flagged with 1.
There is a FAQ explaining another example and giving further comment:
How do I list observations in a group that differ on a variable?
Also the same stuff, ad nauseam, in
How to move step by: step. The Stata Journal 2, 86-102.
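
On the % of errors asked about in the question, here is a minimal
sketch. It assumes that -id- identifies firms and that -region- should
be constant within firm. The names -anyprob- and -tag- are made up for
illustration.

. * 1 for every observation of a firm whose -region- is not constant
. bysort id (region) : gen byte anyprob = region[1] != region[_N]
. * mark exactly one observation per firm, so each firm counts once
. egen byte tag = tag(id)
. tabulate anyprob if tag

The percent column of that table refers to firms, not observations.
The same steps with the year of foundation in place of -region- would
cover the other check.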
This is also one problem where -egen, mode()- may be useful. You
let the data decide by majority vote what they really are.
As I recall, there is some generality built into -mode()-
so that it can be used with string variables as well as numeric.
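
A sketch of the majority-vote idea, again assuming variables -id- and
-region-; -moderegion- and -differs- are made-up names. As far as I
know, when there are ties -mode()- returns missing unless told
otherwise, so -minmode- is used here as one arbitrary way of breaking
ties.

. * most common region within each firm; lowest value wins ties
. bysort id : egen moderegion = mode(region), minmode
. * observations that disagree with their firm's majority value
. gen byte differs = region != moderegion
. list id region moderegion if differs

Whether the majority is right is a separate question; a firm observed
only twice with two different regions has no majority worth trusting.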