
# Re: st: detecting a complete data set

From: Donald Spady <[email protected]>
To: [email protected]
Subject: Re: st: detecting a complete data set
Date: Tue, 16 Nov 2010 06:31:46 -0700

I guess it really helps to define the question explicitly, and I was too vague.
Here is a simple example:

id  level  A  B  C  D
1   1      1  1  1  1
1   2      1  0  .  0
1   3      .  1  0  .
1   4      1  0  1  1
2   1      1  0  1  1
2   2      0  0  0  1
2   3      0  1  1  1
2   4      1  1  1  1

and so on

I want to detect the ID that has no data missing for ALL levels within that ID. In this case ID 2 fits the bill; ID 1 has data missing in levels 2 and 3.
There are NO missing values for id or level. They are basically placeholders.
I know how to find missing values for A B C D; what I don't know is how to detect those IDs where some of the A B C D data are missing in at least 1 of the levels, or, putting it the other way, the IDs where there are values for A B C D at every level.

Thanks

Don
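
One way the per-id check might be sketched, assuming the layout in the example above (numeric A B C D, no missing id or level). The variable names nmiss, idcomplete and idtag are hypothetical, and this combines -egen-'s -rowmiss()- with a sort within id rather than reproducing any reply in the thread:

```stata
* count missing values among A B C D in each observation
egen nmiss = rowmiss(A B C D)

* within each id, sort so the observation with most missings comes last;
* the id is complete only if even that observation has 0 missings
bysort id (nmiss): gen byte idcomplete = nmiss[_N] == 0

* tag one observation per id, then list and count the complete ids
egen byte idtag = tag(id)
list id if idtag & idcomplete
count if idtag & idcomplete
```

With the example data above, only id 2 would be flagged as complete, since id 1 has missings at levels 2 and 3.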

On 2010-11-16, at 4:41 AM, Nick Cox wrote:

> I want just to add a few notes comparing these solutions and mentioning some others.
>
> 1. Mitch's solution contains typos, as != not !== indicates not equal to:
>
> count if A !=. & B !=. & C !=. & D !=.
>
> 2. Beyond that, Mitch's solution differs from Phil's, as it won't exclude extended numeric missing values .a to .z. It could be extended to exclude all numeric missings by changing != to < and to exclude string missings by adding conditions of the form E != "" for a string variable E.
>
> 3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone.
>
> 4. Moreover, -missing()- happily takes a mixture of numeric and string arguments.
>
> 5. For many variables you could use -egen-'s -rowmiss()- function, which uses -missing()- internally to create a new variable counting missings in each observation. The advantage is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all present.
>
> 6. Some people have learned the trick of throwing a set of variables at -regress-, which naturally will only accept complete observations on the variables specified. After the regression, e(sample) tags observations that are all present.
>
> . regress A B C D
> . gen byte allpresent = e(sample)
>
> 7. Programmers might want to use commands specifically provided for this purpose. After something like
>
> . gen byte allpresent = 1
>
> either
>
> . markout allpresent <varlist>
>
> or
>
> . markout allpresent <varlist>, strok
>
> lets you tune your tagging.
>
> 8. This is undoubtedly not a complete list.
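>
> As a rough sketch of notes 2 and 5 (not part of the solutions quoted below), the following assumes numeric variables A, B, C and D stored next to each other in that order, so that the range A-D picks up all four; nmiss is a hypothetical variable name.
>
> ```stata
> * note 2: < . is false for . and for .a to .z, so all numeric missings are excluded
> count if A < . & B < . & C < . & D < .
>
> * note 5: rowmiss() counts missings across a varlist; 0 means all present
> egen nmiss = rowmiss(A-D)
> count if nmiss == 0
> ```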
>
> Nick
> [email protected]
>
> Mitch Abdon
> ===========
>
> If you just need the number of observations with nonmissing A, B, C and D, try:
>
> count if A !==. & B !==. & C !==. & D !==.
>
> You can also generate a variable that will indicate if the observation
> has no missing values (1 if no missing values and 0 otherwise),
> example:
>
> gen nomissing=(A !==. & B !==. & C !==. & D !==. )
> tab nomissing
>
> Phil Clayton
> =============
>
> If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest?
>
> count if !missing(id, level, A, B, C, D)
>
> ============
>
> I have a dataset with 6 variables of interest: id level A B C D.  There are 100 individual id values, 24 individual level values and values for ABCD for each level of each id.  There are a lot of missing data.  How can I determine how many complete data sets I have; i.e. data sets of ID, Level, and A B C D values that are complete.  I have looked at misstable.  It is easy to determine the number of missing  A B C D data but when it comes to seeing how many complete sets of Level A B C D , I don't know what to do.
>