# Re: st: Difference in Difference for Proportions

From: Misha Spisok
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Difference in Difference for Proportions
Date: Tue, 22 Sep 2009 14:49:40 -0700

Many thanks, Austin and Jeph!

The Norton, Wang, and Ai SJ article was very informative.  Also, the
code examples clarified some things and, of course, raised more
questions.

If it is not kosher to post follow-up questions on the same thread,
please let me know and I will re-post as new questions.  Otherwise, my
follow-up questions are below.

The short version is, what's the difference between -blogit- and
-logit-?  Or, more accurately, in the context of grouped data, which
standard error estimate is correct?

If, after using Austin's example, I run the following:

logit union t south txsouth [pw=f]

and

blogit y pop t south txsouth

I get, as expected (or hoped, in my case), the same coefficients.  The
standard errors are smaller in -blogit- because, as I understand it,
-blogit- treats pop as the number of observations per row, so the
number of "effective" observations is the sum of pop.
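A minimal sketch of one way to check this reading. It assumes that the variables from Austin's example (union, f, y, pop, t, south, txsouth) are still in memory and that f can be treated as a frequency weight; the latter is an assumption made here for illustration, not something his code states. With [fw=f], -logit- counts each row f times, so its likelihood matches the grouped one that -blogit- fits. With [pw=f], -logit- reports robust standard errors based on the number of rows.

```stata
* Assumes Austin's example has been run, so that union, f, y, pop,
* t, south, and txsouth exist.

* pweights: robust standard errors, N = number of rows
logit union t south txsouth [pw=f]
display "se(txsouth), logit with pweights: " _se[txsouth]

* fweights (an assumption about f): each row counts as f observations
logit union t south txsouth [fw=f]
display "se(txsouth), logit with fweights: " _se[txsouth]

* grouped data: y successes out of pop trials per cell
blogit y pop t south txsouth
display "se(txsouth), blogit:              " _se[txsouth]
```

If the fweighted and grouped standard errors agree, the gap relative to the pweighted run comes from how the weights are interpreted rather than from the commands themselves.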

I think this explains the difference in the standard errors.
Specifically, with some minor adjustment for the "robustified" -logit-
standard errors, the relationship between -logit- and -blogit-
standard errors is something like the following:

s_blogit = sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))

where s_blogit is the se from -blogit-, s_logit is the se from
-logit-, n_logit is the number of observations from -logit-, n_blogit
is the number of observations from -blogit-, and k is the number of
estimated coefficients (the independent variables plus the constant).
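The relationship above can be checked numerically with a short do-file. This is only a sketch: it assumes Austin's example data are in memory, that e(N) after -blogit- is the sum of pop, and that k = 4 for this model. It prints the two quantities side by side and makes no claim that they match exactly.

```stata
* Assumes union, f, y, pop, t, south, and txsouth from Austin's example.
quietly logit union t south txsouth [pw=f]
scalar s_logit = _se[txsouth]
scalar n_logit = e(N)            // number of rows used by -logit-

quietly blogit y pop t south txsouth
scalar s_blogit = _se[txsouth]
scalar n_blogit = e(N)           // assumed here to equal the sum of pop

scalar k = 4                     // t, south, txsouth, and the constant

display "se from -blogit-:         " s_blogit
display "rescaled se from -logit-: " sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))
```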

It strikes me that the standard errors from -blogit- are more
reasonable, given the actual number of observations that lie behind
the summarized data.  Thus, it seems that the standard errors from
using -inteff- will be as incorrect as those from -logit- for
summarized data.  While I could use the formula from Ai and Norton
(2003) to calculate the standard error for the interaction term using
the variance-covariance matrix returned after -blogit-, would this be
making a mistake?
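For the two-by-two case, one hedged way to get a delta-method standard error from the -blogit- variance-covariance matrix is -nlcom-. The sketch below assumes Austin's example data, where t takes the values 6 and 7 after -keep if t>5-; the name did is just a label chosen here. It computes the difference in differences of predicted probabilities. Whether this matches the Ai and Norton (2003) formula in other settings is not something this sketch establishes.

```stata
* Assumes y, pop, t, south, and txsouth from Austin's example,
* with t equal to 6 or 7. Run from a do-file so that /// continues lines.
blogit y pop t south txsouth

* (p[south=1,t=7] - p[south=1,t=6]) - (p[south=0,t=7] - p[south=0,t=6])
nlcom (did: invlogit(_b[_cons] + 7*_b[t] + _b[south] + 7*_b[txsouth]) ///
          - invlogit(_b[_cons] + 6*_b[t] + _b[south] + 6*_b[txsouth]) ///
          - invlogit(_b[_cons] + 7*_b[t])                             ///
          + invlogit(_b[_cons] + 6*_b[t]))
```

Because the model has four cells and four coefficients, the predicted probabilities equal the cell proportions y/pop, so the point estimate should equal the raw difference in differences of proportions.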

My data are not survey data.  They are "actual" data, in the sense
that f is the true number of people with the condition and pop is the
true population.

Thanks again,

Misha
(Using Stata 10.1)

On Fri, Sep 18, 2009 at 7:00 AM, Jeph Herrin <junk@spandrel.net> wrote:
>
> Thanks Austin,
>
> Yes, I should have specified the -rd- option, I meant
> the linear link function. I've become a fan of using
> binary (and binomial) linear regression for testing
> hypotheses.
>
> cheers,
> Jeph
>
>
> Austin Nichols wrote:
>>
>> Jeph--
>> Doesn't the interaction problem discussed in
>> http://www.stata-journal.com/sjpdf.html?articlenum=st0063
>> also rear its ugly head here?
>>
>> Probably also have to be careful of SEs--if the total populations are
>> summed weights from a survey, significance will likely be overstated.
>>
>> I'd probably go to -svy:tab- first in that case...
>>
>> sysuse psidextract, clear
>> keep if t>5
>> set seed 1
>> g f=ceil(uniform()*1000)
>> egen pop=total(f), by(south t)
>> svyset [pw=f], strata(t)
>> egen gp=group(t south), lab
>> svy:tab gp union if t>5, row ci
>> lincom _b[p42]-_b[p22]-(_b[p32]-_b[p12])
>> g txsouth=t*south
>> egen y=total(union*f), by(gp)
>> bys gp: replace y=. if _n<_N
>> li y t south pop if y<.
>> binreg y t south txsouth, n(pop)
>> binreg union t south txsouth [pw=f]
>> logit union t south txsouth [pw=f]
>> findit inteff
>>
>> On Thu, Sep 17, 2009 at 4:53 PM, Jeph Herrin <junk@spandrel.net> wrote:
>>>
>>> Not sure whether this helps you, but I would normally test this
>>> with an interaction term in a model. For instance
>>>
>>>  gen txsouth=t*south
>>>  binreg f t south txsouth, n(pop)
>>>
>>> Then testing the coefficient on -txsouth- is the same as
>>> testing whether there is a significant difference in differences.
>>>
>>> hth,
>>> Jeph
>>>
>>> Misha Spisok wrote:
>>>>
>>>> Hello, Statalist,
>>>>
>>>> In brief, how does one test a difference in difference of proportions?
>>>>  My question is re-stated briefly at the end with reference to the
>>>> variables I present.  A formula and/or reference would be appreciated
>>>> if no command exists.
>>>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
