Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Donald Spady <dspady@ualberta.ca>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: detecting a complete data set
Date: Tue, 16 Nov 2010 06:31:46 -0700

I guess it really helps to define the question explicitly, and I was too vague. Here is a simple example:

id  level  A  B  C  D
1   1      1  1  1  1
1   2      1  0  .  0
1   3      .  1  0  .
1   4      1  0  1  1
2   1      1  0  1  1
2   2      0  0  0  1
2   3      0  1  1  1
2   4      1  1  1  1

and so on.

I want to detect the ID that has no data missing for ALL levels within that ID. In this case ID 2 fits the bill; ID 1 has data missing in levels 2 and 3. There are NO missing values for id or level. They are basically placeholders. I know how to find missing values for A B C D; what I don't know is how to detect those IDs where there is some missing (A B C D) data in at least 1 of the levels, OR, putting it the other way, I need to know the IDs where there are values for A B C D for every level.

Thanks
Don

On 2010-11-16, at 4:41 AM, Nick Cox wrote:

> I want just to add a few notes comparing these solutions and mentioning some others.
>
> 1. Mitch's solution contains typos, as != not !== indicates not equal to:
>
> count if A !=. & B !=. & C !=. & D !=.
>
> 2. Beyond that, Mitch's solution differs from Phil's, as it won't exclude extended numeric missing values .a to .z. It could be extended to exclude all numeric missings by changing != to < and to exclude string missings by adding conditions of the form E != "" for a string variable E.
>
> 3. Phil's solution of working with -missing(A,B,C,D)- and its negation -!missing(A, B, C, D)- is a good solution for a small or moderate number of variables. Beyond that, writing out a long comma-separated varlist is a little tedious and error-prone.
>
> 4. Moreover, -missing()- happily takes a mixture of numeric and string arguments.
>
> 5. For many variables you could use -egen-'s -rowmiss()- function which uses -missing()- internally to create a new variable counting missings in observations. The advantage is that it takes varlists, including variable ranges and wildcards. A value of 0 for the resulting variable means all present.
>
> 6. Some people have learned the trick of throwing a set of variables at -regress- which naturally will only accept complete observations on the variables specified. After the regression e(sample) tags observations that are all present.
>
> . regress A B C D
> . gen byte allpresent = e(sample)
>
> 7. Programmers might want to use commands specifically provided for this purpose. After something like
>
> . gen byte allpresent = 1
>
> either
>
> . markout allpresent <varlist>
>
> or
>
> . markout allpresent <varlist>, strok
>
> lets you tune your tagging.
>
> 8. This is undoubtedly not a complete list.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Mitch Abdon
> ===========
>
> If you just need the number of observations with nonmissing ABC and D , try:
>
> count if A !==. & B !==. & C !==. & D !==.
>
> You can also generate a variable that will indicate if the observation
> has no missing values (1 if no missing values and 0 otherwise),
> example:
>
> gen nomissing=(A !==. & B !==. & C !==. & D !==. )
> tab nomissing
>
> Phil Clayton
> =============
>
> If I'm understanding your question properly, you simply want to know the total number of observations (rows in your dataset) with complete data for the variables of interest?
>
> count if !missing(id, level, A, B, C, D)
>
> Donald Spady
> ============
>
> I have a dataset with 6 variables of interest: id level A B C D. There are 100 individual id values, 24 individual level values and values for ABCD for each level of each id. There are a lot of missing data. How can I determine how many complete data sets I have; i.e. data sets of ID, Level, and A B C D values that are complete. I have looked at misstable. It is easy to determine the number of missing A B C D data but when it comes to seeing how many complete sets of Level A B C D , I don't know what to do.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

Don Spady
Nature bats last.
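One way the restated question might be approached is to combine the -rowmiss()- idea from Nick's point 5 with a by-id summary. This is a minimal sketch, not a method proposed in the thread: the data are the eight example rows from the message, and the variable names nmiss, anymiss, complete and firstrow are made up for illustration.

```stata
* Sketch only: rebuild the example data from the message.
clear
input id level A B C D
1 1 1 1 1 1
1 2 1 0 . 0
1 3 . 1 0 .
1 4 1 0 1 1
2 1 1 0 1 1
2 2 0 0 0 1
2 3 0 1 1 1
2 4 1 1 1 1
end

* Count missing values among A-D in each observation (row).
egen nmiss = rowmiss(A B C D)

* Flag an id as soon as any of its rows has a missing value:
* max() over the id is 1 if nmiss > 0 holds in at least one row.
egen byte anymiss = max(nmiss > 0), by(id)

* complete is 1 for every row of an id with no missing A-D at any level.
gen byte complete = !anymiss

* Tag one row per id so each id is listed and counted once.
egen byte firstrow = tag(id)
list id if firstrow & complete, noobs
count if firstrow & complete
```

With the example rows, only id 2 should be listed. The sketch assumes every id has a row for every level, as the message states; an id whose level row is absent altogether would not be caught by this check and would need a separate count of rows per id.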

**Follow-Ups**:
- **Re: st: detecting a complete data set** *From:* Maarten buis <maartenbuis@yahoo.co.uk>

**References**:
- **st: detecting a complete data set** *From:* Donald Spady <dspady@ualberta.ca>
- **Re: st: detecting a complete data set** *From:* Mitch Abdon <mitchaabdon@gmail.com>
- **RE: st: detecting a complete data set** *From:* Nick Cox <n.j.cox@durham.ac.uk>
