
Re: st: Difference in Difference for Proportions

From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: Difference in Difference for Proportions
Date   Tue, 22 Sep 2009 20:25:41 -0400

Misha Spisok <[email protected]> :
If the variable f records the true number of observations with that
covariate pattern, then
logit union t south txsouth [fw=f]
would be the right code (see -help weight-).

You can also use the -svy- commands I outlined, starting with the command
svyset, srs
svyset _n
to declare the data as not coming from a complex survey.
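Why frequency weights line up with -blogit-: here is a minimal sketch, written in Python for illustration rather than Stata. It assumes grouped data with a number of successes out of a number of trials for each covariate pattern. Frequency weights replicate each record, so the fweighted Bernoulli log likelihood equals the grouped binomial log likelihood up to the constant log C(n, y). The two therefore share the same maximizer and the same curvature, and the effective N is the sum of the weights. By contrast, -logit ... [pw=f]- reports a robust (sandwich) variance computed from the rows in memory, which is roughly why its standard errors differ. The helper names and the cell counts below are hypothetical.

```python
import math

def logistic(z):
    """Inverse logit: Pr(y = 1) for a linear index z."""
    return 1.0 / (1.0 + math.exp(-z))

def index(beta, x):
    """Linear predictor x'beta."""
    return sum(b * xi for b, xi in zip(beta, x))

def grouped_binomial_ll(rows, beta):
    """Grouped (blogit-style) log likelihood for rows (x, successes, trials).
    The log C(trials, successes) term is dropped: it does not involve beta."""
    ll = 0.0
    for x, y, n in rows:
        p = logistic(index(beta, x))
        ll += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return ll

def expand(rows):
    """Individual-level records (x, outcome, fweight): one record for the
    successes and one for the failures in each covariate pattern."""
    records = []
    for x, y, n in rows:
        records.append((x, 1, y))
        records.append((x, 0, n - y))
    return records

def fweighted_bernoulli_ll(records, beta):
    """Bernoulli log likelihood with frequency weights (cf. logit ... [fw=f])."""
    ll = 0.0
    for x, outcome, w in records:
        p = logistic(index(beta, x))
        ll += w * math.log(p if outcome == 1 else 1.0 - p)
    return ll

# Hypothetical cells: x = (t, south, t*south, constant), successes, trials.
rows = [((0, 0, 0, 1), 20, 100),
        ((1, 0, 0, 1), 30, 100),
        ((0, 1, 0, 1), 25, 100),
        ((1, 1, 1, 1), 50, 100)]
beta = [0.4, 0.2, 0.5, -1.0]  # arbitrary trial values, not estimates

print(grouped_binomial_ll(rows, beta))
print(fweighted_bernoulli_ll(expand(rows), beta))
```

The two printed values should agree up to floating-point rounding for any choice of beta. That agreement is the sense in which fw and grouped estimation describe the same likelihood.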

On Tue, Sep 22, 2009 at 5:49 PM, Misha Spisok <[email protected]> wrote:
> Many thanks, Austin and Jeph!
> The Norton, Wang, and Ai SJ article was very informative.  Also, the
> code examples clarified some things and, of course, raised more
> questions.
> If it is not kosher to post follow-up questions on the same thread,
> please let me know and I will re-post as new questions.  Otherwise, my
> follow-up questions are below.
> The short version is, what's the difference between -blogit- and
> -logit-?  Or, more accurately, in the context of grouped data, which
> standard error estimate is correct?
> If, after using Austin's example, I run the following:
> logit union t south txsouth [pw=f]
> and
> blogit y pop t south txsouth
> I get, as expected (or hoped, in my case), the same coefficients.  The
> standard errors are smaller in -blogit- because, as I understand it,
> -blogit- treats pop as the number of observations per row, so the
> number of "effective" observations is the sum of pop.
> I think this explains the difference in the standard errors.
> Specifically, with some minor adjustment for the "robustified" -logit-
> standard errors, the relationship between -logit- and -blogit-
> standard errors is something like the following:
> s_blogit = sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))
> where s_blogit is the se from -blogit-, s_logit is the se from
> -logit-, n_logit is the number of observations from -logit-, n_blogit
> is the number of observations from -blogit-, and k is the number of
> regressors, including the constant.
> It strikes me that the standard errors from -blogit- are more
> reasonable, given the actual number of observations that lie behind
> the summarized data.  Thus, it seems that the standard errors from
> using -inteff- will be as incorrect as those from -logit- for
> summarized data.  While I could use the formula from Ai and Norton
> (2003) to calculate the standard error for the interaction term using
> the variance-covariance matrix returned after -blogit-, would this be
> a mistake?
> My data are not survey data.  They are "actual" data, in the sense
> that f is the true number of people with the condition and pop is the
> true population.
> Thanks again,
> Misha
> (Using Stata 10.1)
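The Ai and Norton (2003) computation asked about above can be sketched for the special case where both interacted variables are 0/1 dummies. In that case the interaction effect is the discrete double difference in predicted probabilities, and a delta-method standard error follows from whatever coefficient covariance matrix is supplied, for example the one reported after -blogit-. This Python sketch only does the arithmetic. It assumes t and south are coded 0/1 (in Austin's example t takes the values 6 and 7, so it would need recoding) and that coefficients are ordered as in Stata's e(b), with the constant last. The function name is hypothetical, and the sketch does not settle which VCE is the right one to feed it.

```python
import math

def logistic(z):
    """Inverse logit (logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-z))

def discrete_interaction_effect(beta, vcov):
    """Double difference in predicted probabilities for two 0/1 regressors.

    Assumes the logit index is b_t*t + b_s*s + b_ts*t*s + b_0, with beta
    ordered (b_t, b_s, b_ts, b_0) (constant last) and vcov the matching
    4x4 covariance matrix as a list of lists.
    Returns (effect, delta-method standard error).
    """
    # (sign in the double difference, design row for the cell)
    cells = [(+1, (1, 1, 1, 1)),   # t=1, s=1
             (-1, (1, 0, 0, 1)),   # t=1, s=0
             (-1, (0, 1, 0, 1)),   # t=0, s=1
             (+1, (0, 0, 0, 1))]   # t=0, s=0
    effect = 0.0
    grad = [0.0] * len(beta)
    for sign, x in cells:
        p = logistic(sum(b * xi for b, xi in zip(beta, x)))
        effect += sign * p
        # For the logistic CDF, dp/dbeta_j = p * (1 - p) * x_j.
        for j, xj in enumerate(x):
            grad[j] += sign * p * (1.0 - p) * xj
    k = len(beta)
    var = sum(grad[i] * vcov[i][j] * grad[j] for i in range(k) for j in range(k))
    return effect, math.sqrt(max(var, 0.0))

def logit(p):
    """Log odds of a probability p."""
    return math.log(p / (1.0 - p))

# Hypothetical saturated fit reconstructed from made-up cell proportions.
p00, p10, p01, p11 = 0.20, 0.30, 0.25, 0.50
beta = [logit(p10) - logit(p00),
        logit(p01) - logit(p00),
        logit(p11) - logit(p10) - logit(p01) + logit(p00),
        logit(p00)]
# Placeholder covariance matrix, NOT a real VCE from any estimation.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
effect, se = discrete_interaction_effect(beta, identity)
print(effect, se)
```

Because this toy logit is saturated, the double difference reproduces the difference in differences of the cell proportions, (0.50 - 0.25) - (0.30 - 0.20) = 0.15. The logit coefficient on the interaction term, by contrast, is a difference in differences of log odds.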
> On Fri, Sep 18, 2009 at 7:00 AM, Jeph Herrin <[email protected]> wrote:
>> Thanks Austin,
>> Yes, I should have specified the -rd- option; I meant
>> the identity (linear) link function. I've become a fan of using
>> binary (and binomial) linear regression for testing
>> hypotheses.
>> cheers,
>> Jeph
>> Austin Nichols wrote:
>>> Jeph--
>>> Doesn't the interaction problem discussed in
>>> also rear its ugly head here?
>>> Probably also have to be careful of SEs--if the total populations are
>>> summed weights from a survey, significance will likely be overstated.
>>> I'd probably go to -svy:tab- first in that case...
>>> sysuse psidextract, clear
>>> keep if t>5
>>> set seed 1
>>> g f=ceil(uniform()*1000)
>>> egen pop=total(f), by(south t)
>>> svyset [pw=f], strata(t)
>>> egen gp=group(t south), lab
>>> svy:tab gp union if t>5, row ci
>>> lincom _b[p42]-_b[p22]-(_b[p32]-_b[p12])
>>> g txsouth=t*south
>>> egen y=total(union*f), by(gp)
>>> bys gp: replace y=. if _n<_N
>>> li y t south pop if y<.
>>> binreg y t south txsouth, n(pop)
>>> binreg union t south txsouth [pw=f]
>>> logit union t south txsouth [pw=f]
>>> findit inteff
>>> On Thu, Sep 17, 2009 at 4:53 PM, Jeph Herrin <[email protected]> wrote:
>>>> Not sure whether this helps you, but I would normally test this
>>>> with an interaction term in a model. For instance
>>>>  gen txsouth=t*south
>>>>  binreg f t south txsouth, n(pop)
>>>> Then testing the coefficient on -txsouth- is the same as
>>>> testing whether there is a significant difference in differences.
>>>> hth,
>>>> Jeph
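Why the coefficient on -txsouth- is the difference in differences: here is a short derivation. It assumes t and south (written s) are 0/1 indicators and the link is the identity (the -rd- option Jeph mentions in his follow-up), so the model is saturated in the four cells. Write p_{ts} for the proportion in the cell with time t and region s:

```latex
\begin{aligned}
\Pr(y = 1 \mid t, s) &= \beta_0 + \beta_t\, t + \beta_s\, s + \beta_{ts}\, t s \\
p_{00} &= \beta_0, \qquad p_{10} = \beta_0 + \beta_t, \\
p_{01} &= \beta_0 + \beta_s, \qquad p_{11} = \beta_0 + \beta_t + \beta_s + \beta_{ts} \\
(p_{11} - p_{01}) - (p_{10} - p_{00}) &= (\beta_t + \beta_{ts}) - \beta_t = \beta_{ts}
\end{aligned}
```

With a logit link the same algebra holds for the log odds instead of the proportions. That is the interaction problem Austin raises.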
>>>> Misha Spisok wrote:
>>>>> Hello, Statalist,
>>>>> In brief, how does one test a difference in difference of proportions?
>>>>>  My question is re-stated briefly at the end with reference to the
>>>>> variables I present.  A formula and/or reference would be appreciated
>>>>> if no command exists.
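A direct formula for the original question is the Wald z test sketched below in Python. It assumes the four cells are independent binomial samples whose counts are true counts, as Misha later says his are, and not survey-weighted totals (Austin warns that significance would be overstated in that case). The estimate is (p11 - p01) - (p10 - p00), with variance equal to the sum of p(1 - p)/n over the four cells. In a saturated identity-link model, this should in principle match the -txsouth- estimate and standard error from Jeph's -binreg ..., n(pop)- approach. The function name and the counts are hypothetical.

```python
import math

def did_proportions(cells):
    """Difference in differences of four independent binomial proportions.

    cells maps (t, south) in {0,1}^2 to (successes, trials).
    Returns (estimate, standard error, z statistic) for
    (p11 - p01) - (p10 - p00), using Var(p_hat) = p(1 - p)/n per cell.
    """
    p = {k: y / n for k, (y, n) in cells.items()}
    est = (p[1, 1] - p[0, 1]) - (p[1, 0] - p[0, 0])
    var = sum(p[k] * (1.0 - p[k]) / cells[k][1] for k in cells)
    se = math.sqrt(var)
    return est, se, est / se

# Hypothetical counts (successes, trials) for each (t, south) cell.
cells = {(0, 0): (20, 100), (1, 0): (30, 100),
         (0, 1): (25, 100), (1, 1): (50, 100)}
est, se, z = did_proportions(cells)
# Two-sided p-value from the standard normal: P(|Z| > |z|).
p_value = math.erfc(abs(z) / math.sqrt(2.0))
print(est, se, z, p_value)
```

With these made-up counts the estimate is 0.15 and the variance is (0.16 + 0.21 + 0.1875 + 0.25)/100 = 0.008075. A cell whose proportion is exactly 0 or 1 contributes no variance under this formula, so very sparse cells deserve care.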
