Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: Richard Goldstein <richgold@ix.netcom.com>
Subject: Re: st: RE: programatically dropping variables that don't actually vary
Date: Thu, 9 Aug 2012 20:21:01 +0100
The title of the original post says that variables shouldn't vary, and
the content says that in practice this means all zero. Evidently Jennifer,
or anybody else with a similar problem, will need to tweak the recipe
according to the exact real problem.

By the way, -findname- can find variables with no variation at all by

findname, all(@ == @[1])

but that does not ignore missings that are not equal to the first value.

Nick

On Thu, Aug 9, 2012 at 8:16 PM, Richard Goldstein <richgold@ix.netcom.com> wrote:
>
> Actually, I think that what is wanted is "if r(min)==r(max)" if one
> wants a general test for lack of variation (or, of course, "if r(sd)==0").
>
> Rich
>
> On 8/9/12 3:13 PM, Nick Cox wrote:
>> For "and" read "&"
>>
>> On Thu, Aug 9, 2012 at 8:12 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> In principle, many variables could have mean 0. A safer test is that
>>>
>>> if r(min) == 0 and r(max) == 0
>>>
>>> Nick
>>>
>>> On Thu, Aug 9, 2012 at 8:03 PM, Sarah Edgington <sedging@ucla.edu> wrote:
>>>> Jenn,
>>>> There are a variety of ways you might want to do this. What I would do is
>>>> something like the following:
>>>>
>>>> foreach var of varlist dummy1-dummyn {
>>>>     sum `var', meanonly
>>>>     if r(mean)==0 {
>>>>         drop `var'
>>>>     }
>>>> }
>>>>
>>>> This cycles through each of your variables (substitute your actual variable
>>>> list for "dummy1-dummyn"). For each variable it calculates the mean. The
>>>> drop statement inside the if block only gets executed if the value stored in
>>>> r(mean) is 0.
>>>>
>>>> -Sarah
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Earl, Jennifer
>>>> Suzanne - (jenniferearl)
>>>> Sent: Thursday, August 09, 2012 11:46 AM
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: st: programatically dropping variables that don't actually vary
>>>>
>>>> Hi,
>>>>
>>>> I am working with a large number of dummy variables and using collapse to
>>>> create derivative datasets that are the frequencies of 1's for each dummy
>>>> variable (a couple of hundred through foreach loops). I want to drop any of
>>>> the dummy variables that never had a 1 (so mean(dummy1)==0, or
>>>> max(dummy)==0), but it seems that drop only lets you use an if statement to
>>>> drop observations, not an if statement to drop variables.
>>>>
>>>> My best guess is to use a list of means to create a list of the variable names
>>>> that can be stored in a local and then fed into a drop command, but I can't
>>>> seem to make that work either, since I only want the list of variable names
>>>> that have a mean of 0. Or maybe transpose the dataset, drop them since the
>>>> variables are now observations, and transpose back? Another solution would
>>>> be to save through Stat/Transfer, use its drop constants feature, and then
>>>> bring the data back in, but there must be an easier way.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
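The replies in this thread converge on testing the minimum and maximum rather than the mean. A minimal do-file sketch follows, combining Richard's r(min) == r(max) test with Jennifer's idea of collecting the names in a local macro and passing them to a single drop. The demo data and the names dummy1-dummy4 are hypothetical. The sketch assumes the candidate variables are numeric: summarize reports no observations for a string variable, so a string variable would be caught by the r(N) == 0 branch and dropped. Because summarize ignores missing values, a variable holding only 0s and missings counts as constant here, which is the opposite trade-off from the findname test Nick describes.

```stata
* Hypothetical demo data: four 0/1 dummies, three of which never vary
clear
set obs 5
generate byte dummy1 = mod(_n, 2)           // 1 0 1 0 1: varies, so kept
generate byte dummy2 = 0                    // all zeros: dropped
generate byte dummy3 = cond(_n == 3, ., 0)  // zeros plus one missing: dropped (missings ignored)
generate byte dummy4 = 1                    // all ones: dropped by the min == max test

* Collect the names of constant variables, then drop them in one step.
* Assumption: the varlist contains numeric variables only.
local constant
foreach var of varlist dummy1-dummy4 {
    * meanonly still saves r(N), r(min) and r(max); missing values are skipped
    summarize `var', meanonly
    * r(N) == 0: no non-missing values at all
    * r(min) == r(max): exactly one distinct non-missing value
    if r(N) == 0 | r(min) == r(max) {
        local constant `constant' `var'
    }
}

* Guard against calling drop with an empty varlist
if "`constant'" != "" {
    display as text "dropping: `constant'"
    drop `constant'
}
```

To apply this to real data, omit the demo lines and replace dummy1-dummy4 with the actual varlist. Dropping inside the loop, as in Sarah's version, also works because foreach ... of varlist expands the list before the first iteration; collecting the names first simply makes it easy to inspect what will be removed before anything is dropped.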