Re: st: detecting a complete data set


From   Donald Spady <[email protected]>
To   [email protected]
Subject   Re: st: detecting a complete data set
Date   Tue, 16 Nov 2010 06:31:46 -0700

I guess it really helps to define the question explicitly; I was too vague.
Here is a simple example:
id  level  A  B  C  D
1   1      1  1  1  1
1   2      1  0  .  0
1   3      .  1  0  .
1   4      1  0  1  1
2   1      1  0  1  1
2   2      0  0  0  1
2   3      0  1  1  1
2   4      1  1  1  1

and so on

I want to detect the IDs that have no data missing at ANY level within that ID.  In this case ID 2 fits the bill; ID 1 has data missing in levels 2 and 3.
There are NO missing values for id or level; they are basically placeholders.
I know how to find missing values of A B C D in individual observations.  What I don't know is how to detect those IDs where at least one of A B C D is missing in at least one level or, putting it the other way, how to identify the IDs that have values for A B C D at every level.

Thanks

Don
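
One way the question above might be approached, as a minimal sketch only: it builds on the -rowmiss()- idea in point 5 of the quoted notes below and assumes A B C D are numeric. The variable names nmiss, idmiss, complete and tag_id are hypothetical, chosen for illustration.

```stata
* Rebuild the example data from this message
clear
input id level A B C D
1 1 1 1 1 1
1 2 1 0 . 0
1 3 . 1 0 .
1 4 1 0 1 1
2 1 1 0 1 1
2 2 0 0 0 1
2 3 0 1 1 1
2 4 1 1 1 1
end

* nmiss = number of missing values among A B C D in each observation
egen nmiss = rowmiss(A B C D)

* idmiss = total number of missing values over all levels of an id
bysort id: egen idmiss = total(nmiss)

* complete = 1 on every observation of an id with nothing missing
gen byte complete = idmiss == 0

* tag one observation per id, then list and count the complete ids
egen byte tag_id = tag(id)
list id if tag_id & complete
count if tag_id & complete
```

With the example data only id 2 is listed, and the count is 1.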

On 2010-11-16, at 4:41 AM, Nick Cox wrote:

> I want just to add a few notes comparing these solutions and mentioning some others. 
> 
> 1. Mitch's solution contains typos, as != not !== indicates not equal to: 
> 
> count if A !=. & B !=. & C !=. & D !=.
> 
> 2. Beyond that, Mitch's solution differs from Phil's, as it won't exclude extended numeric missing values .a to .z. It could be extended to exclude all numeric missings by changing != to < and to exclude string missings by adding conditions of the form E != "" for a string variable E.
> 
> 3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone. 
> 
> 4. Moreover, -missing()- happily takes a mixture of numeric and string arguments. 
> 
> 5. For many variables you could use -egen-'s -rowmiss()- function, which uses -missing()- internally to create a new variable counting missings in each observation. The advantage is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all present. 
> 
> 6. Some people have learned the trick of throwing a set of variables at -regress- which naturally will only accept complete observations on the variables specified. After the regression e(sample) tags observations that are all present. 
> 
> . regress A B C D 
> . gen byte allpresent = e(sample) 
> 
> 7. Programmers might want to use commands specifically provided for this purpose. After something like 
> 
> . gen byte allpresent = 1 
> 
> either 
> 
> . markout allpresent <varlist> 
> 
> or 
> 
> . markout allpresent <varlist>, strok 
> 
> lets you tune your tagging. 
> 
> 8. This is undoubtedly not a complete list. 
> 
> Nick 
> [email protected] 
> 
> Mitch Abdon
> ===========
> 
> If you just need the number of observations with nonmissing A, B, C, and D, try:
> 
> count if A !==. & B !==. & C !==. & D !==.
> 
> You can also generate a variable that will indicate if the observation
> has no missing values (1 if no missing values and 0 otherwise),
> example:
> 
> gen nomissing=(A !==. & B !==. & C !==. & D !==. )
> tab nomissing
> 
> Phil Clayton
> =============
> 
> If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest? 
> 
> count if !missing(id, level, A, B, C, D)
> 
> Donald Spady
> ============
> 
> I have a dataset with 6 variables of interest: id level A B C D.  There are 100 individual id values, 24 individual level values, and values for A B C D for each level of each id.  There are a lot of missing data.  How can I determine how many complete data sets I have, i.e. sets of id, level, and A B C D values that are complete?  I have looked at misstable.  It is easy to determine the number of missing A B C D values, but when it comes to seeing how many complete sets of level A B C D there are, I don't know what to do.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
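
Points 2 and 5 of the quoted notes come without code, so here is a small sketch of what they might look like in practice. The three-row dataset is made up purely for illustration, rmiss and nomiss are hypothetical names, and the range A-D assumes the variables are stored in that order.

```stata
* Made-up data: row 2 has a system missing (.), row 3 an extended missing (.a)
clear
input A B C D
1  1 1 1
1  0 . 0
.a 1 0 1
end

* Point 1 form: != . rejects only the system missing value,
* so row 3 still counts as complete (count is 2)
count if A != . & B != . & C != . & D != .

* Point 2 form: < . rejects every numeric missing value,
* so only row 1 counts as complete (count is 1)
count if A < . & B < . & C < . & D < .

* Point 5 form: rowmiss() counts missings over a varlist
egen rmiss = rowmiss(A-D)
gen byte nomiss = rmiss == 0
count if nomiss
```

The last count is also 1, since -rowmiss()- treats .a as missing.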

Don Spady

Nature bats last.

