JIBONAYAN RAYCHAUDHURI <jibonayanrc@yahoo.com>

statalist@hsphsun2.harvard.edu

st: Fw: Standardizing values

Mon, 24 Aug 2009 12:46:45 -0700 (PDT)

Hi,

The reason I cannot use -egen- with -std()- is that it is not by-able, and I need the standardization by nic2. When I standardize overall, i.e., over the entire sample of firms, I do use -egen- with -std()-. In doing so, once again the OLS measure of TFP shows a correlation near 1, but this measure of TFP does not. In any case, once I standardize, the mean of the TFP measure should be zero instead of .54. Also, if I standardize by nic2, the OLS measure of TFP, as I mentioned in my earlier mail, gives a correlation of 1.000, which is very close to the correlation between the non-standardized values and the overall standardized values for the OLS measure; that is not the case for this measure.

J

--- On Mon, 8/24/09, JIBONAYAN RAYCHAUDHURI <jibonayanrc@yahoo.com> wrote:

> From: JIBONAYAN RAYCHAUDHURI <jibonayanrc@yahoo.com>
> Subject: Standardizing values
> To: statalist@hsphsun2.harvard.edu
> Date: Monday, August 24, 2009, 11:39 AM
>
> Hi Stata Users,
>
> I am trying to calculate total factor productivity (TFP) for a panel of
> firms. I have these firms classified by industry. I have a measure of TFP
> (tfplevpet_imp, which contains imputed values) that I am trying to
> standardize at the 2-digit industrial classification level (called nic2,
> which ranges from 15 to 35 with some gaps).
>
> I am using the following code to do so:
>
> #delimit;
> gen logtfplevpet_mean_imp=.;
> gen logtfplevpet_sd_imp=.;
> #delimit;
> gsort +nic2 +newyear;
> #delimit;
> foreach num of numlist 15 17 19 21 23 24 25 26 27 29 30 31 32 35 {;
>   by nic2 : egen logtfplevpet_mean_imp`num' = mean(logtfplevpet_imp) if nic2==`num';
>   by nic2 : egen logtfplevpet_sd_imp`num' = sd(logtfplevpet_imp) if nic2==`num';
> };
> #delimit;
> foreach num of numlist 15 17 19 21 23 24 25 26 27 29 30 31 32 35 {;
>   replace logtfplevpet_mean_imp=logtfplevpet_mean_imp`num' if nic2==`num';
>   replace logtfplevpet_sd_imp=logtfplevpet_sd_imp`num' if nic2==`num';
> };
>
> #delimit;
> gsort +compname +newyear;
> #delimit;
> gen logtfplevpet_stand_imp=
>   logtfplevpet_imp-logtfplevpet_mean_imp/logtfplevpet_sd_imp;
>
> The correlation between the standardized and the non-standardized values
> is very low, about 0.25. Also, the mean of this measure is .54. This
> measure of TFP is estimated with a semi-parametric technique. For another
> measure of TFP, which I get as the OLS residual in a simple regression,
> the exact same code gives a correlation between the standardized and
> non-standardized values of 0.99!!! Also, if instead of standardizing at
> 2-digit nic I standardize over the entire sample, i.e., the standardized
> values are computed from the overall mean and variance of all firms in
> all industries, the TFP measure shows a mean of nearly 0 and an sd of 1,
> but the correlation with the non-standardized measure is still low, 0.71.
> As a side note, when I impute values I am using only those nic
> observations that are used in the standardization. I am very puzzled as
> to why the correlation between the standardized and non-standardized
> values is so low, when it should always be close to 1. Any comments or
> suggestions will be highly appreciated.
>
> Jibonayan
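
One point worth checking in the quoted code, offered as an observation and not as a confirmed diagnosis: in Stata, / is evaluated before -, so an expression of the form x-m/s computes x-(m/s), not (x-m)/s. The sketch below illustrates this on a stand-in dataset. It assumes the shipped auto data, with rep78 playing the role of nic2 and price playing the role of logtfplevpet_imp; the names grp_mean, grp_sd, z_ok and z_bad are hypothetical. It also relies on the mean() and sd() functions of egen accepting the by prefix, so no loop over industry codes is needed.

```stata
* Sketch only: the auto data stands in for the firm panel.
* rep78 plays the role of nic2; price plays the role of the TFP measure.
sysuse auto, clear
drop if missing(rep78)

* mean() and sd() are by-able, so one pass per statistic is enough
bysort rep78: egen double grp_mean = mean(price)
bysort rep78: egen double grp_sd   = sd(price)

* With parentheses: the usual z-score within each group
gen double z_ok  = (price - grp_mean) / grp_sd

* Without parentheses: Stata computes price - (grp_mean/grp_sd)
gen double z_bad = price - grp_mean / grp_sd

* Compare group-wise means and sds of the two versions
tabstat z_ok z_bad, by(rep78) statistics(mean sd)

* Correlation of each version with the raw variable
correlate price z_ok z_bad
```

If the unparenthesized form is what was intended to be a z-score, the result would not be expected to have mean zero, and for a variable whose group means are already close to zero (such as an OLS residual) it would stay close to the raw values.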

