Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Mata: Calculating conditional expectation |
Date | Thu, 25 Apr 2013 12:13:30 +0100 |
su Y_T if inlist(Y_T, 0, 1), meanonly scalar fp = r(mean) scalar fn = 1 - fp Nick njcoxstata@gmail.com On 25 April 2013 12:09, Nick Cox <njcoxstata@gmail.com> wrote: > In this there is a lot of putting constants into variables, only to > drop them later. > > In addition, whenever you want only one mean, -egen-'s -mean()- > function is overkill. > > If you want to handle constants, that usually calls for scalars, not variables. > > -summarize, meanonly- leaves r(mean) behind in memory, which you can > exploit more than you do. > > Your first block I would rewrite as > > count if Y == 1 & Y_T == 0 > scalar fp = r(N)/_N > count if Y == 0 & Y_T == 1 > scalar fn = r(N)/_N > > Indeed > > su Y_T if inlist(Y_T, 0, 1), meanonly > scalar fp = r(mean) > scalar fn = 1 - fppct > > is better technique, I suggest. It does not presume there are no > missing values on -Y_T-. (In addition, you calculated fractions, not > percents.) > > The block > > local dummies > forvalues i = 1/1000 { > egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0 > gen idfp`i' = fppct * fpmean`i' > local dummies "`dummies' idfp`i'" > drop fpmean`i' > } > > could be simplified to > > local dummies > forvalues i = 1/1000 { > su id`i' if Y == 1 & Y_T == 0, meanonly > gen idfp`i' = fppct * r(mean) > local dummies "`dummies' idfp`i'" > } > > These are small changes. I've not tried to understand the whole of > what you are doing. > > Nick > njcoxstata@gmail.com > > > On 25 April 2013 11:22, Ivan Png <iplpng@gmail.com> wrote: >> Let me try to explain how I did it. I did not use Mata to direct >> computing the vector of conditional expectation of the explanatory >> variables. Rather, I constructed the conditional expectation of each >> explanatory variable separately, and then collected them into the >> vector. Not very elegant, but seems to work. >> >> == code fragment == >> >> . count if Y == 1 & Y_T == 0 /* Y = observed Y; Y_T = true Y */ >> . gen fppct = r(N)/total /* total = no. of observations */ >> /* fppct = percent false positives */ >> . count if Y == 0 & Y_T == 1 >> . gen fnpct = r(N)/total /* fppct = percent false negatives */ >> >> . gen const = 1 >> >> >> *** create X including dummy variables (fixed effects) *** >> >> . local dummies >> . forvalues i=1/1000 { >> local dummies "`dummies' id`i'" >> } >> >> . mata : st_view(X = .,., ("hhr", "const", "`dummies'")) /* hhr = >> higher degree */ >> . mata : rows(X), cols(X) /* check matrix */ >> . mata : X[|1,1 \ 7,7|] /* check matrix for empty rows */ >> >> >> ** construct conditional mean(X) ** >> . sort lower year >> >> . foreach var in const hhr { >> egen `var'_fpmean = mean(`var') if Y == 1 & Y_T == 0 >> gen `var'_fp = fppct * `var'_fpmean >> egen `var'_fnmean = mean(`var') if Y == 0 & Y_T == 1 >> gen `var'_fn = fnpct * `var'_fnmean >> drop `var'_fnmean >> } >> >> ** construct P = mean(X) conditional on false positive ** >> . local dummies >> . forvalues i = 1/1000 { >> egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0 >> gen idfp`i' = fppct * fpmean`i' >> local dummies "`dummies' idfp`i'" >> drop fpmean`i' >> } >> >> >> . mata : st_view(F = .,., ("hhr_fp", "const_fp", "`dummies'")) >> . mata : rows(F), cols(F) /* check matrix */ >> . mata : F[|1,1 \ 15,15|] /* check matrix for empty rows */ >> >> . mata : st_subview(P = . , F, 1 , .) /* choose first non-empty row */ >> . mata : rows(P), cols(P) >> . mata : P[|1,1 \ 1,20|] /* check vector */ >> >> >> ** construct N = mean(X) conditional on false negative ** >> . sort lower year >> >> . local dummies >> . forvalues i = 1/1000 { >> egen fnmean`i' = mean(id`i') if Y == 0 & Y_T == 1 >> gen idfn`i' = fnpct * fnmean`i' >> local dummies "`dummies' idfn`i'" >> drop fnmean`i' >> } >> >> . mata : st_view(G = .,., ("hhr_fn", "const_fn", "`dummies'")) >> . mata : rows(G), cols(G) /* check matrix */ >> . mata : G[|1,1 \ 20,20|] /* check matrix for empty rows */ >> >> . mata : st_subview(N = . , G, 3 , .) /* choose first non-empty row */ >> . mata : rows(N), cols(N) >> . mata : N[|1,1 \ 1,20|] /* check vector */ >> >> >> ** estimate bias ** >> . mata : D = invsym(cross(X,X))*(P' - N') >> . mata : rows(D), cols(D) >> . mata : D[|1,1 \ 9,1|] /* check vector */ >> >> . mata : st_subview(E = . , D, (1::5) , .) /* coefficients of focal >> variables */ >> . mata : E >> >> . mata : H = 9898 * E /* total = 9898 */ >> . mata : H >> >> === end fragment === >> >> >> On 23 April 2013 11:45, Ivan Png <iplpng@gmail.com> wrote: >>> >>> Many thanks to Statalist members for their previous help on >>> constructing the matrix. >>> >>> To recall, my dependent variable is categorical: Mobility = 1 if >>> inventor changed employer, else = 0. I'm investigating the effect of >>> classification error. Obviously, this cannot be classical. Let Y = >>> true mobility and Z = recorded mobility. If Y = 0 and Z = 1, then >>> error = -1, while if Y = 1 and Z = 0, error = +1. >>> >>> Meyer and Mittag, U of Chicago (2012) characterize the bias as >>> N(X'X)^{-1} [ Pr( Y = 0 & Z = 1) E(X : Y = 0 & Z = 1) - Pr(Y = 1 & Z >>> = 0) E(X : Y = 1 & Z = 0) ]. I have a benchmark data set with both >>> the true and inaccurate mobility data, and would like to compute the >>> bias. >>> >>> So, I need to compute the conditional expectations, E(X : Y = 0 & Z = >>> 1) and E(X : Y = 1 & Z = 0), and then weight by the probabilities, >>> take the difference, and pre-multiply by N(X'X)^{-1}. My idea: >>> >>> . keep if Y = 0 & Z = 1 >>> . mata : F = mean(X) >>> . mata : mata matsave filename >>> >>> and repeat for Y = 1 & Z = 0. >>> >>> But, sadly, I don't how to proceed. How to combine the original data >>> with the two new files containing the conditional expectations, and >>> then going back to MATA to calculate the bias. Grateful to >>> Statalisters for help. >>> >> >> >> >> -- >> Best wishes >> Ivan Png >> Skype: ipng00 >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/faqs/resources/statalist-faq/ >> * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/