Statalist



Re: st: Difference in Difference for Proportions


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Difference in Difference for Proportions
Date   Tue, 22 Sep 2009 20:25:41 -0400

Misha Spisok <misha.spisok@gmail.com> :
If the variable f records the true number of observations with that
covariate pattern, then
logit union t south txsouth [fw=f]
would be the right code (see -help weight-).
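The log likelihoods show why [fw=f] reproduces -blogit-. The derivation below uses notation introduced only for illustration; none of it comes from the thread. Row i lies in covariate cell g(i), has outcome u_i (union) and weight f_i, and p_g = Lambda(x_g'b) is the fitted logit probability for cell g.

```latex
\begin{aligned}
\ell_{\mathrm{fw}}(\beta) &= \sum_i f_i\big[u_i \log p_{g(i)} + (1-u_i)\log\big(1-p_{g(i)}\big)\big]
  = \sum_g \big[y_g \log p_g + (n_g-y_g)\log(1-p_g)\big],\\
\ell_{\mathrm{blogit}}(\beta) &= \ell_{\mathrm{fw}}(\beta) + \sum_g \log\binom{n_g}{y_g},
\qquad y_g = \sum_{i \in g} f_i u_i,\quad n_g = \sum_{i \in g} f_i,\quad
p_g = \frac{1}{1+e^{-x_g'\beta}} .
\end{aligned}
```

The binomial-coefficient term does not involve beta. Both fits therefore give the same coefficients and the same inverse-Hessian standard errors, provided y_g and n_g are built the way the -egen y=total(union*f)- and -egen pop=total(f)- lines quoted below build them.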

You can also use the -svy- commands I outlined, starting with the command
svyset, srs
or
svyset _n
to declare the data as not coming from a complex survey.

On Tue, Sep 22, 2009 at 5:49 PM, Misha Spisok <misha.spisok@gmail.com> wrote:
> Many Thanks, Austin and Jeph!
>
> The Norton, Wang, and Ai SJ article was very informative.  Also, the
> code examples clarified some things and, of course, raised more
> questions.
>
> If it is not kosher to post follow-up questions on the same thread,
> please let me know and I will re-post as new questions.  Otherwise, my
> follow-up questions are below.
>
> The short version is, what's the difference between -blogit- and
> -logit-?  Or, more accurately, in the context of grouped data, which
> standard error estimate is correct?
>
> If, after using Austin's example, I run the following:
>
> logit union t south txsouth [pw=f]
>
> and
>
> blogit y pop t south txsouth
>
> I get, as expected (or hoped, in my case), the same coefficients.  The
> standard errors are smaller in -blogit- because, as I understand it,
> -blogit- treats pop as the number of observations per row, so the
> number of "effective" observations is the sum of pop.
>
> I think this explains the difference in the standard errors.
> Specifically, with some minor adjustment for the "robustified" -logit-
> standard errors, the relationship between -logit- and -blogit-
> standard errors is something like the following:
>
> s_blogit = sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))
>
> where s_blogit is the se from -blogit-, s_logit is the se from
> -logit-, n_logit is the number of observations from -logit-, n_blogit
> is the number of observations from -blogit-, and k is the number of
> independent variables, including the constant.
>
> It strikes me that the standard errors from -blogit- are more
> reasonable, given the actual number of observations that lie behind
> the summarized data.  Thus, it seems that the standard errors from
> using -inteff- will be as incorrect as those from -logit- for
> summarized data.  While I could use the formula from Ai and Norton
> (2003) to calculate the standard error for the interaction term using
> the variance-covariance matrix returned after -blogit-, would that be
> a mistake?
>
> My data are not survey data.  They are "actual" data, in the sense
> that f is the true number of people with the condition and pop is the
> true population.
>
> Thanks again,
>
> Misha
> (Using Stata 10.1)
>
>
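The derivation below is a sketch of where the gap between the [pw=f] and -blogit- standard errors comes from. Its assumptions are stated here, not in the thread. u_i, x_i and the fitted p_i come from a -logit- of union on t, south and txsouth. f_i is the weight and N is the number of rows. Under fw (equivalently -blogit-), the variance is the inverse of the weighted information matrix A. Under pw, Stata uses a sandwich estimator whose finite-sample factor counts N rows rather than the sum of the weights.

```latex
\begin{aligned}
A &= \sum_i f_i\, p_i(1-p_i)\, x_i x_i', \qquad
B = \sum_i f_i^{2}\,(u_i-p_i)^{2}\, x_i x_i',\\[4pt]
\widehat V_{\mathrm{fw}} &= A^{-1}, \qquad
\widehat V_{\mathrm{pw}} = \frac{N}{N-1}\, A^{-1} B\, A^{-1}.
\end{aligned}
```

Multiplying every f_i by a constant c turns A into cA and B into c^2 B. V_fw therefore shrinks by a factor 1/c, while V_pw is unchanged. Pweights carry only relative information; only fw (or -blogit-) lets the total of the f_i set the precision.

Now suppose every f_i equals the same c and the model is correctly specified. Then B is close to cA in large samples, so V_pw is roughly c times V_fw. The variance ratio is roughly 1/c = N / sum(f_i), which is close to the (n_logit - k)/(n_blogit - k) factor in the formula above when N and sum(f_i) are large relative to k. With unequal weights, as in the simulated f below, the relationship is only approximate.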
> On Fri, Sep 18, 2009 at 7:00 AM, Jeph Herrin <junk@spandrel.net> wrote:
>>
>> Thanks Austin,
>>
>> Yes, I should have specified the -rd- option; I meant
>> the linear (identity) link function. I've become a fan of using
>> binary (and binomial) linear regression for testing
>> hypotheses.
>>
>> cheers,
>> Jeph
>>
>>
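With the identity link (-binreg ..., rd-), the interaction coefficient is itself the difference in differences of proportions. To check this, write p(t,s) for the modelled proportion at time t with south = s, where t takes two values one unit apart (as with t = 6, 7 in the example below). The notation is introduced here for illustration.

```latex
\begin{aligned}
p(t,s) &= \beta_0 + \beta_t\, t + \beta_s\, s + \beta_{ts}\, t s,\\
\mathrm{DiD} &= \big[p(t_0{+}1,1) - p(t_0,1)\big] - \big[p(t_0{+}1,0) - p(t_0,0)\big]
  = (\beta_t + \beta_{ts}) - \beta_t = \beta_{ts}.
\end{aligned}
```

Without -rd-, -binreg- uses its default logit link (the -or- option). The same algebra then makes the txsouth coefficient a difference in differences of log odds rather than of proportions.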
>> Austin Nichols wrote:
>>>
>>> Jeph--
>>> Doesn't the interaction problem discussed in
>>> http://www.stata-journal.com/sjpdf.html?articlenum=st0063
>>> also rear its ugly head here?
>>>
>>> Probably also have to be careful of SEs--if the total populations are
>>> summed weights from a survey, significance will likely be overstated.
>>>
>>> I'd probably go to -svy:tab- first in that case...
>>>
>>> sysuse psidextract, clear
>>> keep if t>5
>>> set seed 1
>>> g f=ceil(uniform()*1000)
>>> egen pop=total(f), by(south t)
>>> svyset [pw=f], strata(t)
>>> egen gp=group(t south), lab
>>> svy:tab gp union if t>5, row ci
>>> lincom _b[p42]-_b[p22]-(_b[p32]-_b[p12])
>>> g txsouth=t*south
>>> egen y=total(union*f), by(gp)
>>> bys gp: replace y=. if _n<_N
>>> li y t south pop if y<.
>>> binreg y t south txsouth, n(pop)
>>> binreg union t south txsouth [pw=f]
>>> logit union t south txsouth [pw=f]
>>> findit inteff
>>>
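The SJ article linked above (see also Ai and Norton 2003) makes the following point. In a logit model, the interaction effect on the probability scale is a double difference in predicted probabilities, not the txsouth coefficient. Below is a sketch for two binary-valued regressors, using notation introduced here. Lambda is the logistic CDF, eta(t,s) is the linear index, and t0, t1 are the two values of t.

```latex
\begin{aligned}
\eta(t,s) &= \beta_0 + \beta_t\, t + \beta_s\, s + \beta_{ts}\, t s,
\qquad \Lambda(z) = \frac{1}{1+e^{-z}},\\
\Delta^2 &= \big[\Lambda(\eta(t_1,1)) - \Lambda(\eta(t_0,1))\big]
          - \big[\Lambda(\eta(t_1,0)) - \Lambda(\eta(t_0,0))\big].
\end{aligned}
```

Delta^2 generally differs from beta_ts and can even have the opposite sign. The model with t, south and txsouth is saturated for the 2x2 design. As long as no cell proportion is 0 or 1, its four fitted probabilities therefore equal the (weighted) cell proportions, and Delta^2 reproduces the raw difference in differences of proportions. A standard error for Delta^2 would come from the delta method, for example via -nlcom-.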
>>> On Thu, Sep 17, 2009 at 4:53 PM, Jeph Herrin <junk@spandrel.net> wrote:
>>>>
>>>> Not sure whether this helps you, but I would normally test this
>>>> with an interaction term in a model. For instance
>>>>
>>>>  gen txsouth=t*south
>>>>  binreg f t south txsouth, n(pop)
>>>>
>>>> Then testing the coefficient on -txsouth- is the same as
>>>> testing whether there is a significant difference in differences.
>>>>
>>>> hth,
>>>> Jeph
>>>>
>>>> Misha Spisok wrote:
>>>>>
>>>>> Hello, Statalist,
>>>>>
>>>>> In brief, how does one test a difference in difference of proportions?
>>>>>  My question is re-stated briefly at the end with reference to the
>>>>> variables I present.  A formula and/or reference would be appreciated
>>>>> if no command exists.
>>>>>
>>>
>>>
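For the question as originally posed, one textbook option is a large-sample (Wald) test of a difference in differences of proportions. It assumes four independent binomial samples, a simplifying assumption not made in the thread that ignores any survey design. Here p_j = y_j / n_j for cells j = 1..4, numbered as in -egen gp=group(t south)- above: 1 = (t6, non-south), 2 = (t6, south), 3 = (t7, non-south), 4 = (t7, south).

```latex
\begin{aligned}
\widehat D &= (\hat p_4 - \hat p_2) - (\hat p_3 - \hat p_1), \qquad \hat p_j = y_j / n_j,\\
\widehat{\mathrm{se}}(\widehat D) &= \sqrt{\sum_{j=1}^{4} \frac{\hat p_j\,(1-\hat p_j)}{n_j}},
\qquad z = \frac{\widehat D}{\widehat{\mathrm{se}}(\widehat D)} \;\approx\; N(0,1)
\ \text{under}\ H_0\colon D = 0 .
\end{aligned}
```

In this saturated grouped setting, D-hat equals the txsouth coefficient from a -binreg ..., n(pop) rd- fit on t, south and txsouth. Its model-based standard error should also agree with the formula above.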



