Statalist The Stata Listserver



st: Re: RE: Re: RE: RE: IQR


From   "Rodrigo A. Alfaro" <[email protected]>
To   <[email protected]>
Subject   st: Re: RE: Re: RE: RE: IQR
Date   Thu, 7 Jun 2007 09:16:55 -0400

It seems to be a 'common' practice when COMPUSTAT data are used. The dataset consists of the balance sheet reports of US firms. It is difficult to identify in the data mergers, splits, or any other change in ownership that implies a huge change in the composition of a firm (in terms of assets, fixed capital, etc.), so dropping extreme values of the change in assets allows you to 'delete' the firms with unexplained changes. A similar problem affects the price, where a change in dividend policy can sometimes produce a jump that makes sense only when the researcher knows about the change in policy. Usually researchers do not know about these policies, or it is a titanic (and maybe useless) job to try to include them in the analysis.

Rodrigo.

----- Original Message -----
From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Thursday, June 07, 2007 6:44 AM
Subject: st: RE: Re: RE: RE: IQR



I am shocked to find my good friend Kit Baum throwing away 20% of his data. No doubt this profligacy matches his research problem. In environmental science, which I know more about, throwing out the tails would lose all the bangs and leave mostly whimpers, but he is doing economics, where some of the extreme values may represent accountancy artefacts.

On -iqr-, since half the work is done, perhaps there is a case for a formal update. I will contact the author, Larry Hamilton, whose book's various editions have served so many Stata users so well. (It got me started.)

But -iqr-'s main function I see as reporting. Rodrigo's example of a -foreach- loop cycling over variables and -summarize- results is the way to go for selection of subsets of data.

Nick
[email protected]
Rodrigo A. Alfaro


Wow Nick, your translation from 'demotic' can only be compared with the work of Thomas Young. Just kidding, very good job indeed!! Returning to the problem, it would be nice to get a list of returned scalars in your new version. For the problem at hand, the limits beyond which observations are considered outliers can then be used for sample selection or to create new variables.
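
As a rough sketch of that idea: -summarize, detail- already leaves the quartiles in r(), so fences can be built from them and reused afterwards. The auto dataset shipped with Stata stands in for the real data here, and both the 1.5 multiplier and the _out suffix are assumptions made for illustration, not something -iqr- is known to return.

```stata
* Sketch only: auto.dta stands in for the real data (assumption)
sysuse auto, clear
foreach x of varlist price weight length {
    quietly summarize `x', detail
    * fences at 1.5 IQR beyond the quartiles (assumed multiplier)
    local lo = r(p25) - 1.5 * (r(p75) - r(p25))
    local hi = r(p75) + 1.5 * (r(p75) - r(p25))
    * hypothetical flag variable: 1 outside the fences, 0 inside
    generate byte `x'_out = (`x' < `lo' | `x' > `hi') if !missing(`x')
}
* the flags can then drive sample selection, for example
summarize price if !price_out
```

Keeping a flag rather than dropping observations leaves the original values intact, so the same fences can later be used either to restrict the sample or to build new variables.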

As an alternative to the procedure discussed so far, there is another way to 'deal' with outliers (if you want to), which is cutting the tails: "we trimmed firms whose total assets growth rate exceed the 90th percentile or fall short of the 10th percentile of the annual distribution" (page 6 of Baum, Caglayan, Ozkan (2003), Working Paper 566, Boston College). For example, tdavis could use the following code to 'drop' the outliers that lie below the 5th or above the 95th percentile of each variable:

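foreach x of varlist price total_assets inventories {
    * copy the variable, then blank out the tails of the copy
    gen double `x'_wo = `x'
    * -summarize, detail- leaves the percentiles in r()
    sum `x', d
    local u = r(p95)
    local l = r(p5)
    replace `x'_wo = . if `x'>`u'
    replace `x'_wo = . if `x'<`l'
}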
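
The Baum, Caglayan, Ozkan quote trims within the annual distribution, whereas the loop above uses percentiles of the whole sample. A minimal sketch of a by-year version follows, run on simulated data; year, growth, p10, p90 and growth_tr are hypothetical names, and this is not claimed to be the authors' own code.

```stata
* Sketch only: simulated panel, all variable names are hypothetical
clear
set seed 2007
set obs 500
generate int year = 1990 + mod(_n, 5)
generate double growth = rnormal()

* 10th and 90th percentiles of growth within each year
bysort year: egen double p10 = pctile(growth), p(10)
bysort year: egen double p90 = pctile(growth), p(90)

* keep values inside the band, set the tails to missing
generate double growth_tr = growth if inrange(growth, p10, p90)
```

Storing the cutoffs as variables rather than local macros is what lets them differ from year to year.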

From: "Nick Cox" <[email protected]>
To: <[email protected]>
Sent: Wednesday, June 06, 2007 6:53 PM
Subject: st: RE: RE: IQR


>I spent a while updating -iqr- to -iqr8-.
>
> This was unnecessary, because -iqr- works
> fine under version control. (How many programs
> would run without change in other software after 16 years?)
> Nevertheless, few Stata users will now be accustomed to reading
> or writing Stata like this:
>
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


