
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: detecting a complete data set

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: detecting a complete data set
Date	Tue, 16 Nov 2010 11:41:49 +0000

I just want to add a few notes comparing these solutions and mentioning some others.

1. Mitch's solution contains typos, as the operator for "not equal to" is != and not !==. It should read:

count if A !=. & B !=. & C !=. & D !=.

2. Beyond that, Mitch's solution differs from Phil's in that it does not exclude the extended numeric missing values .a to .z. It could be made to exclude all numeric missings by changing each != . to < . and to exclude string missings by adding conditions of the form E != "" for a string variable E.
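
To make the difference in note 2 concrete, here is a minimal sketch on a made-up dataset. The five observations and the string variable E are assumptions for illustration only; they are not from the original question.

```stata
* Toy data (hypothetical): obs 2 has ., obs 3 has .a, obs 4 has an empty string
clear
input A B C D str3 E
1 2  3 4 "x"
. 2  3 4 "y"
1 .a 3 4 "z"
1 2  3 4 ""
5 6  7 8 "w"
end

* != . rejects only the system missing value, so obs 3 (B == .a) is still counted
count if A != . & B != . & C != . & D != .

* < . rejects . and .a to .z, because every numeric missing sorts above all numbers
count if A < . & B < . & C < . & D < .

* adding E != "" also rejects observations where the string variable is empty
count if A < . & B < . & C < . & D < . & E != ""
```

Each -count- leaves its result in r(N), so the three totals can be compared directly.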

3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone. 

4. Moreover, -missing()- happily takes a mixture of numeric and string arguments. 
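
As a small illustration of notes 3 and 4, the sketch below passes one string and two numeric variables to -missing()- in a single call. The dataset and the variable name -name- are hypothetical.

```stata
* Toy data (hypothetical): obs 2 has ., obs 3 has an empty string, obs 4 has .z
clear
input id A str3 name
1 10 "a"
2 .  "b"
3 30 ""
4 .z "d"
end

* missing() returns 1 if any argument is missing, numeric or string
count if !missing(id, A, name)

* the same expression can tag complete observations
gen byte complete = !missing(id, A, name)
list, noobs
```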

5. For many variables you could use -egen-'s -rowmiss()- function, which uses -missing()- internally to create a new variable counting the missing values in each observation. The advantage of that is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all present.
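
A minimal sketch of the -rowmiss()- approach in note 5, using the variable names from the original question but invented values. The new variable names -nmiss- and -allpresent- are arbitrary choices.

```stata
* Toy data (hypothetical values): obs 2 and obs 3 each have one missing
clear
input id level A B C D
1 1 10 20 30 40
1 2  . 20 30 40
2 1 10 .a 30 40
2 2 10 20 30 40
end

* count missings per observation; A-D is a variable range
egen nmiss = rowmiss(id level A-D)

* complete observations are those with no missings at all
count if nmiss == 0
gen byte allpresent = nmiss == 0
```

A side benefit is that -tabulate nmiss- then shows how many observations are missing one value, two values, and so on.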

6. Some people have learned the trick of throwing a set of variables at -regress-, which naturally will only accept complete observations on the variables specified. After the regression, e(sample) tags the observations that are all present. Note that this works for numeric variables only.

. regress A B C D 
. gen byte allpresent = e(sample) 

7. Programmers might want to use commands specifically provided for this purpose. After something like 

. gen byte allpresent = 1 

either 

. markout allpresent <varlist> 

or 

. markout allpresent <varlist>, strok 

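lets you tune your tagging. -markout- resets allpresent to 0 in any observation with a missing value on a variable in <varlist>; the -strok- option allows string variables in <varlist> and treats empty strings as missing.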
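
Here is a minimal sketch of note 7 put together, again on invented data. The variable names -allpresent-, -allpresent2- and E are hypothetical, and the numeric-only call is kept separate from the call with a string variable.

```stata
* Toy data (hypothetical): obs 2 has a numeric missing, obs 3 an empty string
clear
input A B str3 E
1 2 "x"
. 2 "y"
1 2 ""
end

* numeric variables only: obs 2 is marked out
gen byte allpresent = 1
markout allpresent A B
count if allpresent

* with a string variable and strok: obs 2 and obs 3 are marked out
gen byte allpresent2 = 1
markout allpresent2 A B E, strok
count if allpresent2
```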

8. This is undoubtedly not a complete list. 

Nick 
[email protected] 

Mitch Abdon
===========

If you just need the number of observations with nonmissing A, B, C and D, try:

count if A !==. & B !==. & C !==. & D !==.

You can also generate a variable that will indicate if the observation
has no missing values (1 if no missing values and 0 otherwise),
example:

gen nomissing=(A !==. & B !==. & C !==. & D !==. )
tab nomissing

Phil Clayton
=============

If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest? 
 
count if !missing(id, level, A, B, C, D)

Donald Spady
============

I have a dataset with 6 variables of interest: id level A B C D. There are 100 individual id values, 24 individual level values, and values for A, B, C and D for each level of each id. There are a lot of missing data. How can I determine how many complete data sets I have, i.e. sets of id, level, A, B, C and D values that are all present? I have looked at misstable. It is easy to determine the number of missing A, B, C and D values, but when it comes to seeing how many complete sets of level, A, B, C and D there are, I don't know what to do.

