
st: RE: error checking


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: error checking
Date   Wed, 25 Sep 2002 22:23:56 +0100

Riano, Alejandro
>
> I have a huge industrial survey which is a panel dataset. I have the
> id of each firm and the region in which the firm is based. I'd like
> to check how many of the firms in this dataset have errors, in the
> sense that the same id is associated with more than one region
> and/or that a given firm has more than one year of foundation (to
> get an idea of the % of errors in the database).
> I also want to know which ones are the "problematic" firms.
>

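. bysort id (year) : gen prob1 = year[1] != year[_N]
. bysort id (region) : gen prob2 = region[1] != region[_N]
. list id year region if prob1 | prob2

Here -id- is the firm identifier and -year- stands for the year of
foundation; substitute your own variable names.

Logic: for example, sort by -id- and within each -id- by -year-.
If the last value of -year- for a firm differs from the first, you
have a problem with that firm. The same goes for -region-. Note that
missing values sort last, so a firm with some missing values will be
flagged too.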
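The question also asked for the percentage of problematic firms and for a list of them. What follows is only a sketch of one way to get both, assuming a single firm identifier called -id- and numeric -year- (year of foundation) and -region-; the toy data and the names -anyprob- and -onefirm- are invented for illustration.

```stata
* Toy data, invented purely for illustration
clear
input id year region
1 1990 3
1 1990 3
2 1985 1
2 1985 2
3 1970 5
3 1972 5
4 2000 4
end

* Flag firms whose first and last sorted values differ
bysort id (year)   : generate byte prob1 = year[1] != year[_N]
bysort id (region) : generate byte prob2 = region[1] != region[_N]
generate byte anyprob = prob1 | prob2

* Mark one observation per firm so that each firm counts once
egen byte onefirm = tag(id)

* Counts and percentages of clean (0) versus problematic (1) firms
tabulate anyprob if onefirm

* The problematic firms, listed once each
list id prob1 prob2 if anyprob & onefirm
```

In the toy data, firm 2 has two regions and firm 3 has two years, so two of the four firms are flagged. Restricting to one tagged observation per firm matters in a panel, because otherwise firms with more observations would weigh more in the percentage.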

FAQ explaining another example and giving further comment at
How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

Also the same stuff, ad nauseam, in
Cox, N. J. 2002. Speaking Stata: How to move step by: step.
The Stata Journal 2: 86-102.

This is also one problem where -egen, mode()- may be useful. You
let the data decide by majority vote what they really are.
As I recall, there is some generality built into -mode()-
so that it can be used with string variables as well as numeric.
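As a rough sketch of the majority-vote idea, assuming a Stata installation in which -mode()- is available as an -egen- function (in older releases it may need to be installed separately) and a numeric -region-; the toy data and the name -region_mode- are invented for illustration.

```stata
* Toy data, invented purely for illustration
clear
input id region
1 3
1 3
1 4
2 1
2 1
3 5
3 6
end

* Most common region within each firm. With tied modes and no
* tie-breaking option, mode() returns missing.
bysort id: egen region_mode = mode(region)

* Observations that disagree with their firm's modal region
list id region region_mode if region != region_mode
```

Firm 3 has a tie, so its mode is left missing and both of its observations are listed; such firms need a decision by hand, or a tie-breaking option such as -minmode- or -maxmode-.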

Nick
n.j.cox@durham.ac.uk


