Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: St: Dropping variables with mostly missing values
Date: Sat, 8 Feb 2014 01:08:01 +0000

However, -nmissing- and -npresent- (also SJ) allow you to go something like

    npresent, min(20)

or even something like

    npresent, min(`=ceil(_N/5)')

after which

    keep `r(varlist)'

would -keep- what you wanted. Alternatively, use -nmissing- to count
missings and -drop- unwanted variables afterwards.

The help indicates that "A question by Eric Uslaner led to the
addition of r(varlist) as a saved result." and a search flags

http://www.stata.com/statalist/archive/2005-02/msg00297.html

so this question comes nicely full circle. Thanks again, Eric, for
provoking that addition.

Nick
njcoxstata@gmail.com

On 8 February 2014 00:00, Nick Cox <njcoxstata@gmail.com> wrote:
> Good solutions to this came from Jeph Herrin, Amirsa and Richard Goldstein.
>
> Meanwhile, anyone interested in -dropmiss-, which doesn't do this,
> should please note that it comes from the Stata Journal, not SSC.

On 7 February 2014 20:40, Jeph Herrin <info@flyingbuttress.net> wrote:
>> To drop all variables missing more than 80% of the time:
>>
>> foreach V of varlist _all {
>>     count if !mi(`V')
>>     if r(N)/_N < 0.2 drop `V'
>> }
>>
>>
>> This works for string and numeric variables. Change 0.2 to whatever level
>> you want.

On 2/7/2014 3:11 PM, Eric M. Uslaner wrote:
>>> I know that this has been discussed before, but a long search doesn't find
>>> a solution for me (my own fault in searching, most likely).
>>>
>>> I have a data set (not my own) with 161 cases over a long time period.
>>> But most of the variables are largely made up of missing values
>>> (information wasn't available a long time ago). I have used Nick Cox's
>>> dropmiss (from SSC) to drop variables with all missing values. But a large
>>> number of variables remain with few observations. I would like to delete
>>> any variable with fewer than 20 cases. But I can't figure out how to do
>>> this (especially since I have a large number of variables, most of which
>>> have very few cases). Any help would be appreciated.
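
For the threshold in the original question (drop any variable with fewer than 20 nonmissing cases), a loop in official Stata, needing no user-written commands, might look like the sketch below. It is an illustration only and not how -npresent- is implemented; the local macro names v and minobs are chosen for this example, and 20 is the cutoff from the question.

```stata
* Sketch: drop every variable with fewer than `minobs' nonmissing values.
* minobs and v are hypothetical names used only for this example.
local minobs 20

foreach v of varlist _all {
    * missing() handles both numeric and string variables
    quietly count if !missing(`v')

    * r(N) holds the number of nonmissing observations just counted
    if r(N) < `minobs' {
        drop `v'
    }
}
```

The variable list is expanded once when the loop starts, so dropping variables inside the loop does not disturb the iteration. Running -describe- before and after shows which variables were removed.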