RE: st: detecting a complete data set


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: detecting a complete data set
Date   Tue, 16 Nov 2010 11:41:49 +0000

I just want to add a few notes comparing these solutions and mentioning some others.

1. Mitch's solution contains a typo: in Stata, "not equal to" is written != and not !==. Corrected:

count if A !=. & B !=. & C !=. & D !=.

2. Beyond that, Mitch's solution differs from Phil's, as it won't exclude extended numeric missing values .a to .z. It could be extended to exclude all numeric missings by changing != to < and to exclude string missings by adding conditions of the form E != "" for a string variable E.
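
As a sketch of that extension (E stands for the hypothetical string variable just mentioned; it is not in the original data):

* every numeric missing value (., .a, ..., .z) sorts above all nonmissing numbers,
* so the condition < . excludes all of them
count if A < . & B < . & C < . & D < .

* assumption: E is a string variable, for which missing means the empty string
count if A < . & B < . & C < . & D < . & E != ""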

3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone. 

4. Moreover, -missing()- happily takes a mixture of numeric and string arguments. 
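
For example, assuming a string variable E alongside the numeric A, B, C, D (E is hypothetical here), a single call covers both types:

count if !missing(A, B, C, D, E)

* or tag the complete observations; the variable name complete is only illustrative
gen byte complete = !missing(A, B, C, D, E)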

5. For many variables you could use -egen-'s -rowmiss()- function, which uses -missing()- internally to create a new variable counting the missing values in each observation. Its advantage is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all values are present.
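
A minimal sketch for the variables in this thread, in which nmiss is an arbitrary new variable name:

* nmiss counts the missing values among the listed variables in each observation
egen nmiss = rowmiss(id level A B C D)
count if nmiss == 0

If the variables happen to be adjacent in the dataset, a range such as A-D would do in place of A B C D.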

6. Some people have learned the trick of throwing a set of variables at -regress-, which naturally will only accept complete observations on the variables specified. After the regression, e(sample) tags the observations in which all those variables are present.

. regress A B C D 
. gen byte allpresent = e(sample) 

7. Programmers might want to use commands specifically provided for this purpose. After something like 

. gen byte allpresent = 1 

either 

. markout allpresent <varlist> 

or 

. markout allpresent <varlist>, strok 

lets you tune your tagging. 
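
Put together for the variables in this thread, and assuming all six are numeric, a sketch might be

. gen byte allpresent = 1
. markout allpresent id level A B C D
. count if allpresent

-markout- resets allpresent to 0 in any observation where one of the listed variables is missing. With the strok option, string variables in the varlist are checked as well, an empty string counting as missing.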

8. This is undoubtedly not a complete list. 

Nick 
[email protected] 

Mitch Abdon
===========

If you just need the number of observations with nonmissing A, B, C and D, try:

count if A !==. & B !==. & C !==. & D !==.

You can also generate a variable that will indicate if the observation
has no missing values (1 if no missing values and 0 otherwise),
example:

gen nomissing=(A !==. & B !==. & C !==. & D !==. )
tab nomissing

Phil Clayton
=============

If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest? 
 
count if !missing(id, level, A, B, C, D)

Donald Spady
============

I have a dataset with 6 variables of interest: id level A B C D.  There are 100 individual id values, 24 individual level values and values for ABCD for each level of each id.  There are a lot of missing data.  How can I determine how many complete data sets I have; i.e. data sets of ID, Level, and A B C D values that are complete.  I have looked at misstable.  It is easy to determine the number of missing  A B C D data but when it comes to seeing how many complete sets of Level A B C D , I don't know what to do.
