Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Confusion with Winsorizing |
Date | Wed, 15 Jan 2014 17:56:18 +0000 |
A more mundane problem to watch out for is the occurrence of ties. For
example, if 5 + something% of values equal the smallest value, the
smallest Winsorized value will occur with relative frequency 5 +
something%, not 5%.

Nick
njcoxstata@gmail.com

On 15 January 2014 17:44, Nick Cox <njcoxstata@gmail.com> wrote:
> I would never overwrite existing data with Winsorized data. Doing that
> may already have messed up results irretrievably if there is some
> mistake in what you are doing.
>
> A smaller objection is that the local macros are unneeded here.
> Assuming that the abbreviation -Postt- works,
>
> clonevar PostW = Postt
> forvalues e=1/55 {
>     sum Postt if (Postt != 0 & Event`e' == 1) , de
>     replace PostW = r(p95) if (Event`e' ==1 & Postt > r(p95))
>     replace PostW = r(p5) if (Event`e' ==1 & Postt < r(p5))
> }
>
> That still leaves your major question. An implicit assumption here is
> that the values of 1 for -Event*- are disjoint, i.e. any value of such
> a variable being 1 rules out the same for any other such variable. We
> have no information on that from you.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 15 January 2014 17:30, Nima Darbari <ramharz@gmail.com> wrote:
>> I have written the simple code below to Winsorize a figure in 55
>> different events separately but perhaps due to a funny mistake it
>> doesn't work properly.
>>
>> forvalues e=1(1)55{
>>     sum PostturnoverFirm if (PostturnoverFirm !=0 & Event`e' ==1) , de
>>     local p95=r(p95)
>>     local p5=r(p5)
>>     replace PostturnoverFirm = `p95' if (Event`e' ==1 & PostturnoverFirm > `p95')
>>     replace PostturnoverFirm = `p5' if (Event`e' ==1 & PostturnoverFirm < `p5')
>> }
>>
>>
>> Bigger than the 95 percentile line works correctly but the smaller
>> than 5 percentile line replaces the figure for almost all of the rest
>> of observations. Does anyone know whats wrong with this?
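
The point about ties at the top of this message can be checked on a toy dataset. What follows is only a minimal sketch under stated assumptions: the data are made up (100 observations, 8 of them tied at the minimum), and the names x, xW, p5 and p95 are hypothetical, not taken from the thread.

```stata
* Toy data (hypothetical): 100 observations, the first 8 tied at the minimum
clear
set obs 100
generate x = cond(_n <= 8, 1, _n)

* Percentiles of the raw data, copied to scalars so that later
* r-class commands (such as -count-) cannot overwrite them
summarize x, detail
scalar p5  = r(p5)
scalar p95 = r(p95)

* Winsorize into a new variable, leaving x untouched.
* Missing values compare as larger than any number in Stata,
* so they are excluded explicitly from the upper replacement.
clonevar xW = x
replace xW = p95 if x > p95 & !missing(x)
replace xW = p5  if x < p5

* Share of observations sitting at the smallest Winsorized value
count if xW == p5
display "share at lower bound: " r(N)/_N
```

Here the 5th percentile coincides with the tied minimum, so no value lies below it and nothing is replaced at the lower end, yet 8 of the 100 observations (8%, not 5%) sit at the smallest Winsorized value.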