



Re: st: St: Dropping variables with mostly missing values


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: St: Dropping variables with mostly missing values
Date   Sat, 8 Feb 2014 01:08:01 +0000

However, -nmissing- and -npresent- (also from the Stata Journal) let you do something like

npresent, min(20)

or even something like

npresent, min(`=ceil(_N/5)')

after which

keep `r(varlist)'

would -keep- what you wanted. Alternatively, use -nmissing- to count
missings and -drop- unwanted variables afterwards. The help indicates
that

"A question by Eric Uslaner led to the addition of r(varlist) as a
saved result."

and a search flags http://www.stata.com/statalist/archive/2005-02/msg00297.html

so this question comes nicely full circle. Thanks again, Eric, for
provoking that addition.
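
For the -nmissing- route mentioned above, here is a minimal sketch rather than a tested recipe. It assumes that -nmissing- is installed from the Stata Journal, that its min() option lists variables with at least that many missing values (as its help describes), and that the data set has at least 20 observations. The local macro name sparse is just an illustrative choice.

```stata
* Sketch: drop variables with fewer than 20 nonmissing values.
* Assumes -nmissing- (Stata Journal) is installed and that min(#)
* selects variables with at least # missing values, leaving their
* names in r(varlist).
quietly nmissing, min(`=_N-19')
local sparse `r(varlist)'

* -drop- with an empty varlist is an error, so guard against that case.
if "`sparse'" != "" drop `sparse'
```

With 161 observations this asks for variables with at least 142 missing values, that is, at most 19 nonmissing values.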

Nick
[email protected]

On 8 February 2014 00:00, Nick Cox <[email protected]> wrote:

> Good solutions to this came from Jeph Herrin, Amirsa and Richard Goldstein.
>
> Meanwhile, anyone interested in -dropmiss-, which doesn't do this,
> should please note that it comes from the Stata Journal, not SSC.

On 7 February 2014 20:40, Jeph Herrin <[email protected]> wrote:

>> To drop all variables missing more than 80% of the time:
>>
>> foreach V of varlist _all {
>>         count if !mi(`V')
>>         if r(N)/_N < 0.2 drop `V'
>> }
>>
>>
>> This works for string and numeric variables. Change 0.2 to whatever level
>> you want.

On 2/7/2014 3:11 PM, Eric M. Uslaner wrote:

>>> I know that this has been discussed before, but a long search doesn't find
>>> a solution for me (my own fault in searching, most likely).
>>>
>>> I have a data set (not my own) with 161 cases over a long time period.
>>> But  most of the variables are largely made up of missing values
>>> (information wasn't available a long time ago).  I have used Nick Cox's
>>> dropmiss (from SSC) to drop variables with all missing values.  But a large
>>> number of variables remain with few observations.  I would like to delete
>>> any variable with fewer than 20 cases.  But I can't figure out how to do
>>> this (especially since I have a large number of variables, most of which
>>> have very few cases).  Any help would be appreciated.