
Re: st: keeping inventory while cleaning data

From   Maarten Buis
Subject   Re: st: keeping inventory while cleaning data
Date   Mon, 7 Jul 2008 08:44:09 +0100 (BST)

--- Michael McCulloch wrote:
> Hello, in order to keep inventory while cleaning data, I'm using a 
> clumsy manual step to keep track of how many observations I drop. For
> example:
> . describe, short	//INVENTORY: dropped 28097 for histology not NSCLC; N = 51,560
> Is there a more elegant way to do this that could give me a summary 
> list of variables dropped & the reason for drop?

One option is not to drop these observations but create two new
variables: touse and problem. The former is 1 when you want to use an
observation and 0 otherwise, while the latter is categorical with
labels identifying the problem. When you are doing your analysis you
just add -if touse- to your command.

*-------------- begin example ---------------------
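sysuse auto, clear
* touse is 1 for observations to keep and 0 otherwise
gen byte touse = !missing(rep78, foreign, mpg)
* problem records the reason an observation is excluded
gen byte problem = missing(rep78, foreign, mpg)
label define problem 0 "no problem"
label define problem 1 "missing values", add

* a second exclusion rule: flag it and record the reason
replace touse = 0 if rep78 == 1
replace problem = 2 if rep78 == 1
label define problem 2 "rep78 == 1", add

label values problem problem
* inventory of observations by reason for exclusion
tab problem

* restrict the analysis to the usable observations
reg mpg foreign rep78 if touse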
*----------------- end example ---------------------
(For more on how to use examples I sent to the Statalist, see )
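
If you also want the one-line running inventory from your original
comment, something like the sketch below might do. It rebuilds the
same two variables on the auto data so that it runs on its own; the
local macro names (total, kept, excluded) and the wording of the
message are just my choices for illustration, not anything built in.

*-------------- begin example ---------------------
sysuse auto, clear

* reason for exclusion, 0 meaning the observation is usable
gen byte problem = 0
replace problem = 1 if missing(rep78, foreign, mpg)
replace problem = 2 if rep78 == 1 & problem == 0
label define problem 0 "no problem" 1 "missing values" 2 "rep78 == 1"
label values problem problem

* touse follows directly from problem
gen byte touse = problem == 0

* summary list: number of observations per reason
tab problem

* one-line inventory (local names are illustrative)
quietly count if touse
local kept = r(N)
local total = _N
local excluded = `total' - `kept'
display "INVENTORY: excluded `excluded' of `total'; N = `kept'"
*----------------- end example ---------------------

Because nothing is actually dropped, the counts can be recomputed at
any point, and adding a new exclusion rule only means one more
-replace- line and one more label.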

Hope this helps,

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715
