Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Mata: Calculating conditional expectation

 From Ivan Png To statalist@hsphsun2.harvard.edu Subject Re: st: Mata: Calculating conditional expectation Date Fri, 26 Apr 2013 13:02:47 +0800

```Many thanks for the suggestions.

On 25 April 2013 19:13, Nick Cox <njcoxstata@gmail.com> wrote:
> su Y_T if inlist(Y_T, 0, 1), meanonly
> scalar fp = r(mean)
> scalar fn = 1 - fp
> Nick
> njcoxstata@gmail.com
>
>
> On 25 April 2013 12:09, Nick Cox <njcoxstata@gmail.com> wrote:
>> In this there is a lot of putting constants into variables, only to
>> drop them later.
>>
>> In addition, whenever you want only one mean, -egen-'s -mean()-
>> function is overkill.
>>
>> If you want to handle constants, that usually calls for scalars, not variables.
>>
>> -summarize, meanonly- leaves r(mean) behind in memory, which you can
>> exploit more than you do.
>>
>> Your first block I would rewrite as
>>
>> count if Y == 1 & Y_T == 0
>> scalar fp = r(N)/_N
>> count if Y == 0 & Y_T == 1
>> scalar fn = r(N)/_N
>>
>> Indeed
>>
>> su Y_T if inlist(Y_T, 0, 1), meanonly
>> scalar fp = r(mean)
>> scalar fn = 1 - fppct
>>
>> is better technique, I suggest. It does not presume there are no
>> missing values on -Y_T-. (In addition, you calculated fractions, not
>> percents.)
>>
>> The block
>>
>> local dummies
>> forvalues i = 1/1000 {
>>      egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
>>      gen  idfp`i' = fppct * fpmean`i'
>>      local dummies "`dummies' idfp`i'"
>>      drop fpmean`i'
>> }
>>
>> could be simplified to
>>
>> local dummies
>> forvalues i = 1/1000 {
>>      su id`i' if Y == 1 & Y_T == 0, meanonly
>>      gen  idfp`i' = fppct * r(mean)
>>      local dummies "`dummies' idfp`i'"
>> }
>>
>> These are small changes. I've not tried to understand the whole of
>> what you are doing.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 25 April 2013 11:22, Ivan Png <iplpng@gmail.com> wrote:
>>> Let me try to explain how I did it.  I did not use Mata to  direct
>>> computing the vector of conditional expectation of the explanatory
>>> variables.  Rather, I constructed the conditional expectation of each
>>> explanatory variable separately, and then collected them into the
>>> vector.  Not very elegant, but seems to work.
>>>
>>> == code fragment ==
>>>
>>> . count if Y == 1 & Y_T == 0  /* Y = observed Y; Y_T = true Y */
>>> . gen fppct = r(N)/total      /* total = no. of observations */
>>>                                         /* fppct = percent false positives */
>>> . count if Y == 0 & Y_T == 1
>>> . gen fnpct = r(N)/total      /* fppct = percent false negatives */
>>>
>>> . gen const = 1
>>>
>>>
>>> *** create X including dummy variables (fixed effects) ***
>>>
>>> . local dummies
>>> . forvalues i=1/1000 {
>>>     local dummies "`dummies' id`i'"
>>>     }
>>>
>>> . mata : st_view(X = .,., ("hhr", "const", "`dummies'"))  /* hhr =
>>> higher degree */
>>> . mata : rows(X), cols(X)  /* check matrix */
>>> . mata : X[|1,1 \ 7,7|]    /* check matrix for empty rows */
>>>
>>>
>>> ** construct conditional mean(X) **
>>> . sort lower year
>>>
>>> . foreach var in const hhr {
>>>    egen `var'_fpmean = mean(`var') if Y == 1 & Y_T == 0
>>>    gen  `var'_fp = fppct * `var'_fpmean
>>>    egen `var'_fnmean = mean(`var') if Y == 0 & Y_T == 1
>>>    gen  `var'_fn = fnpct * `var'_fnmean
>>>    drop `var'_fnmean
>>>    }
>>>
>>> ** construct P = mean(X) conditional on false positive **
>>> . local dummies
>>> . forvalues i = 1/1000 {
>>>     egen fpmean`i' = mean(id`i') if Y == 1 & Y_T == 0
>>>     gen  idfp`i' = fppct * fpmean`i'
>>>     local dummies "`dummies' idfp`i'"
>>>     drop fpmean`i'
>>>     }
>>>
>>>
>>> . mata : st_view(F = .,., ("hhr_fp", "const_fp", "`dummies'"))
>>> . mata : rows(F), cols(F)  /* check matrix */
>>> . mata : F[|1,1 \ 15,15|]    /* check matrix for empty rows */
>>>
>>> . mata : st_subview(P = . , F, 1 , .)  /* choose first non-empty row */
>>> . mata : rows(P), cols(P)
>>> . mata : P[|1,1 \ 1,20|]    /* check vector */
>>>
>>>
>>> ** construct N = mean(X) conditional on false negative **
>>> . sort lower year
>>>
>>> . local dummies
>>> . forvalues i = 1/1000 {
>>>     egen fnmean`i' = mean(id`i') if Y == 0 & Y_T == 1
>>>     gen  idfn`i' = fnpct * fnmean`i'
>>>     local dummies "`dummies' idfn`i'"
>>>     drop fnmean`i'
>>>     }
>>>
>>> . mata : st_view(G = .,., ("hhr_fn", "const_fn", "`dummies'"))
>>> . mata : rows(G), cols(G)  /* check matrix */
>>> . mata : G[|1,1 \ 20,20|]    /* check matrix for empty rows */
>>>
>>> . mata : st_subview(N = . , G, 3 , .)  /* choose first non-empty row */
>>> . mata : rows(N), cols(N)
>>> . mata : N[|1,1 \ 1,20|]    /* check vector */
>>>
>>>
>>> ** estimate bias **
>>> . mata : D = invsym(cross(X,X))*(P' - N')
>>> . mata : rows(D), cols(D)
>>> . mata : D[|1,1 \ 9,1|]    /* check vector */
>>>
>>> . mata : st_subview(E = . , D, (1::5) , .)  /* coefficients of focal
>>> variables */
>>> . mata : E
>>>
>>> . mata : H = 9898 * E      /* total = 9898 */
>>> . mata : H
>>>
>>> === end fragment ===
>>>
>>>
>>> On 23 April 2013 11:45, Ivan Png <iplpng@gmail.com> wrote:
>>>>
>>>> Many thanks to Statalist members for their previous help on
>>>> constructing the matrix.
>>>>
>>>> To recall, my dependent variable is categorical: Mobility = 1 if
>>>> inventor changed employer, else = 0.  I'm investigating the effect of
>>>> classification error.  Obviously, this cannot be classical.  Let Y =
>>>> true mobility and Z = recorded mobility. If Y = 0 and Z = 1, then
>>>> error = -1, while if Y = 1 and Z = 0, error = +1.
>>>>
>>>> Meyer and Mittag, U of Chicago (2012) characterize the bias as
>>>> N(X'X)^{-1} [ Pr( Y = 0 & Z = 1) E(X :  Y = 0 & Z = 1) - Pr(Y = 1 & Z
>>>> = 0) E(X :  Y = 1 & Z = 0) ].  I have a benchmark data set with both
>>>> the true and inaccurate mobility data, and would like to compute the
>>>> bias.
>>>>
>>>> So, I need to compute the conditional expectations, E(X :  Y = 0 & Z =
>>>> 1) and E(X :  Y = 1 & Z = 0), and then weight by the probabilities,
>>>> take the difference, and pre-multiply by N(X'X)^{-1}.  My idea:
>>>>
>>>> . keep if Y = 0 & Z = 1
>>>> . mata : F = mean(X)
>>>> . mata : mata matsave filename
>>>>
>>>> and repeat for Y = 1 & Z = 0.
>>>>
>>>> But, sadly, I don't how to proceed.  How to combine the original data
>>>> with the two new files containing the conditional expectations, and
>>>> then going back to MATA to calculate the bias.  Grateful to
>>>> Statalisters for help.
>>>>
>>>
>>>
>>>
>>> --
>>> Best wishes
>>> Ivan Png
>>> Skype: ipng00
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

--
Best wishes
Ivan Png
Skype: ipng00
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```