

From: Ivan Png <iplpng@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Mata: Calculating conditional expectation
Date: Fri, 26 Apr 2013 13:02:47 +0800

Many thanks for the suggestions.

On 25 April 2013 19:13, Nick Cox <njcoxstata@gmail.com> wrote:
> su Y_T if inlist(Y_T, 0, 1), meanonly
> scalar fp = r(mean)
> scalar fn = 1 - fp
> Nick
> njcoxstata@gmail.com
>
>
> On 25 April 2013 12:09, Nick Cox <njcoxstata@gmail.com> wrote:
>> In this there is a lot of putting constants into variables, only to
>> drop them later.
>>
>> In addition, whenever you want only one mean, -egen-'s -mean()-
>> function is overkill.
>>
>> If you want to handle constants, that usually calls for scalars, not variables.
>>
>> -summarize, meanonly- leaves r(mean) behind in memory, which you can
>> exploit more than you do.
>>
>> Your first block I would rewrite as
>>
>> count if Y == 1 & Y_T == 0
>> scalar fp = r(N)/_N
>> count if Y == 0 & Y_T == 1
>> scalar fn = r(N)/_N
>>
>> Indeed
>>
>> su Y_T if inlist(Y_T, 0, 1), meanonly
>> scalar fp = r(mean)
>> scalar fn = 1 - fppct
>>
>> is better technique, I suggest. It does not presume there are no
>> missing values on -Y_T-. (In addition, you calculated fractions, not
>> percents.)
>>
>> The block
>>
>> local dummies
>> forvalues i = 1/1000 {
>>     egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
>>     gen idfp`i' = fppct * fpmean`i'
>>     local dummies "`dummies' idfp`i'"
>>     drop fpmean`i'
>> }
>>
>> could be simplified to
>>
>> local dummies
>> forvalues i = 1/1000 {
>>     su id`i' if Y == 1 & Y_T == 0, meanonly
>>     gen idfp`i' = fppct * r(mean)
>>     local dummies "`dummies' idfp`i'"
>> }
>>
>> These are small changes. I've not tried to understand the whole of
>> what you are doing.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 25 April 2013 11:22, Ivan Png <iplpng@gmail.com> wrote:
>>> Let me try to explain how I did it. I did not use Mata to compute
>>> the vector of conditional expectations of the explanatory variables
>>> directly. Rather, I constructed the conditional expectation of each
>>> explanatory variable separately, and then collected them into the
>>> vector. Not very elegant, but it seems to work.
>>>
>>> == code fragment ==
>>>
>>> . count if Y == 1 & Y_T == 0 /* Y = observed Y; Y_T = true Y */
>>> . gen fppct = r(N)/total /* total = no. of observations */
>>> /* fppct = percent false positives */
>>> . count if Y == 0 & Y_T == 1
>>> . gen fnpct = r(N)/total /* fnpct = percent false negatives */
>>>
>>> . gen const = 1
>>>
>>>
>>> *** create X including dummy variables (fixed effects) ***
>>>
>>> . local dummies
>>> . forvalues i=1/1000 {
>>>     local dummies "`dummies' id`i'"
>>> }
>>>
>>> . mata : st_view(X = .,., ("hhr", "const", "`dummies'")) /* hhr = higher degree */
>>> . mata : rows(X), cols(X) /* check matrix */
>>> . mata : X[|1,1 \ 7,7|] /* check matrix for empty rows */
>>>
>>>
>>> ** construct conditional mean(X) **
>>> . sort lower year
>>>
>>> . foreach var in const hhr {
>>>     egen `var'_fpmean = mean(`var') if Y == 1 & Y_T == 0
>>>     gen `var'_fp = fppct * `var'_fpmean
>>>     egen `var'_fnmean = mean(`var') if Y == 0 & Y_T == 1
>>>     gen `var'_fn = fnpct * `var'_fnmean
>>>     drop `var'_fnmean
>>> }
>>>
>>> ** construct P = mean(X) conditional on false positive **
>>> . local dummies
>>> . forvalues i = 1/1000 {
>>>     egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
>>>     gen idfp`i' = fppct * fpmean`i'
>>>     local dummies "`dummies' idfp`i'"
>>>     drop fpmean`i'
>>> }
>>>
>>>
>>> . mata : st_view(F = .,., ("hhr_fp", "const_fp", "`dummies'"))
>>> . mata : rows(F), cols(F) /* check matrix */
>>> . mata : F[|1,1 \ 15,15|] /* check matrix for empty rows */
>>>
>>> . mata : st_subview(P = . , F, 1 , .) /* choose first non-empty row */
>>> . mata : rows(P), cols(P)
>>> . mata : P[|1,1 \ 1,20|] /* check vector */
>>>
>>>
>>> ** construct N = mean(X) conditional on false negative **
>>> . sort lower year
>>>
>>> . local dummies
>>> . forvalues i = 1/1000 {
>>>     egen fnmean`i' = mean(id`i') if Y == 0 & Y_T == 1
>>>     gen idfn`i' = fnpct * fnmean`i'
>>>     local dummies "`dummies' idfn`i'"
>>>     drop fnmean`i'
>>> }
>>>
>>> . mata : st_view(G = .,., ("hhr_fn", "const_fn", "`dummies'"))
>>> . mata : rows(G), cols(G) /* check matrix */
>>> . mata : G[|1,1 \ 20,20|] /* check matrix for empty rows */
>>>
>>> . mata : st_subview(N = . , G, 3 , .) /* choose first non-empty row */
>>> . mata : rows(N), cols(N)
>>> . mata : N[|1,1 \ 1,20|] /* check vector */
>>>
>>>
>>> ** estimate bias **
>>> . mata : D = invsym(cross(X,X))*(P' - N')
>>> . mata : rows(D), cols(D)
>>> . mata : D[|1,1 \ 9,1|] /* check vector */
>>>
>>> . mata : st_subview(E = . , D, (1::5) , .) /* coefficients of focal variables */
>>> . mata : E
>>>
>>> . mata : H = 9898 * E /* total = 9898 */
>>> . mata : H
>>>
>>> === end fragment ===
>>>
>>>
>>> On 23 April 2013 11:45, Ivan Png <iplpng@gmail.com> wrote:
>>>>
>>>> Many thanks to Statalist members for their previous help on
>>>> constructing the matrix.
>>>>
>>>> To recall, my dependent variable is categorical: Mobility = 1 if
>>>> inventor changed employer, else = 0. I'm investigating the effect of
>>>> classification error. Obviously, this cannot be classical. Let Y =
>>>> true mobility and Z = recorded mobility. If Y = 0 and Z = 1, then
>>>> error = -1, while if Y = 1 and Z = 0, error = +1.
>>>>
>>>> Meyer and Mittag, U of Chicago (2012) characterize the bias as
>>>> N(X'X)^{-1} [ Pr( Y = 0 & Z = 1) E(X : Y = 0 & Z = 1) - Pr(Y = 1 & Z
>>>> = 0) E(X : Y = 1 & Z = 0) ]. I have a benchmark data set with both
>>>> the true and inaccurate mobility data, and would like to compute the
>>>> bias.
>>>>
>>>> So, I need to compute the conditional expectations, E(X : Y = 0 & Z =
>>>> 1) and E(X : Y = 1 & Z = 0), and then weight by the probabilities,
>>>> take the difference, and pre-multiply by N(X'X)^{-1}. My idea:
>>>>
>>>> . keep if Y == 0 & Z == 1
>>>> . mata : F = mean(X)
>>>> . mata : mata matsave filename
>>>>
>>>> and repeat for Y = 1 & Z = 0.
>>>>
>>>> But, sadly, I don't know how to proceed. How do I combine the
>>>> original data with the two new files containing the conditional
>>>> expectations, and then go back to Mata to calculate the bias?
>>>> Grateful to Statalisters for help.
>>>>
>>>
>>> --
>>> Best wishes
>>> Ivan Png
>>> Skype: ipng00

--
Best wishes
Ivan Png
Skype: ipng00
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
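
For readers following the thread, here is a minimal Mata sketch of the bias calculation done on one matrix instead of one variable at a time. It is only an illustration under stated assumptions, not the method anyone in the thread confirmed. It assumes the variable names used above (Y = recorded outcome, Y_T = true outcome, hhr, const, id1 to id1000); the names touse, xvars, fp, fn, P, Nv, and bias are hypothetical and introduced here. It relies on the fact that Pr(cell) * E(X : cell) equals the column sums of X within the cell divided by the number of observations, so no per-variable means or temporary variables are needed.

```stata
* Sketch only: variable names follow the thread; touse, xvars, fp, fn,
* P, Nv and bias are illustrative names, not from the original posts.

* Flag observations with both outcomes coded 0/1 and hhr observed
generate byte touse = inlist(Y, 0, 1) & inlist(Y_T, 0, 1) & !missing(hhr)

* Build the regressor list explicitly (const is assumed to exist already)
local xvars hhr const
forvalues i = 1/1000 {
    local xvars `xvars' id`i'
}

mata:
    st_view(X  = ., ., tokens(st_local("xvars")), "touse")
    st_view(y  = ., ., "Y",   "touse")    // recorded outcome
    st_view(yt = ., ., "Y_T", "touse")    // true outcome
    n  = rows(X)
    fp = (y :== 1) :& (yt :== 0)          // indicator: false positive
    fn = (y :== 0) :& (yt :== 1)          // indicator: false negative
    // Pr(cell) * E(X : cell) = (column sums of X within the cell) / n
    P  = cross(fp, X) / n                 // 1 x k row vector
    Nv = cross(fn, X) / n                 // 1 x k row vector
    bias = n * invsym(cross(X, X)) * (P - Nv)'
    bias[|1,1 \ 2,1|]                     // entries for hhr and const
end
```

Two caveats apply. If the constant and a full set of dummies are collinear, X'X is singular, and invsym() then returns a generalized inverse with the corresponding rows and columns set to zero. The touse flag above screens only Y, Y_T, and hhr, so missing values in any of the id variables would still need to be handled before this is run.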

