From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Difference in Difference for Proportions
Date: Tue, 22 Sep 2009 20:25:41 -0400

Misha Spisok <misha.spisok@gmail.com> :
If the variable f records the true number of observations with that
covariate pattern, then

logit union t south txsouth [fw=f]

would be the right code (see -help weight-). You can also use the
-svy- commands I outlined, starting with the command

svyset, srs

or

svyset _n

to declare the data as not coming from a complex survey.

On Tue, Sep 22, 2009 at 5:49 PM, Misha Spisok <misha.spisok@gmail.com> wrote:
> Many Thanks, Austin and Jeph!
>
> The Norton, Wang, and Ai SJ article was very informative. Also, the
> code examples clarified some things and, of course, raised more
> questions.
>
> If it is not kosher to post follow-up questions on the same thread,
> please let me know and I will re-post as new questions. Otherwise, my
> follow-up questions are below.
>
> The short version is, what's the difference between -blogit- and
> -logit-? Or, more accurately, in the context of grouped data, which
> standard error estimate is correct?
>
> If, after using Austin's example, I run the following:
>
> logit union t south txsouth [pw=f]
>
> and
>
> blogit y pop t south txsouth
>
> I get, as expected (or hoped, in my case), the same coefficients. The
> standard errors are smaller in -blogit- because, as I understand
> it, -blogit- is considering pop to be the number of
> observations per row, so the number of "effective" observations is the
> sum of pop.
>
> I think this explains the difference in the standard errors.
> Specifically, with some minor adjustment for the "robustified" -logit-
> standard errors, the relationship between -logit- and -blogit-
> standard errors is something like the following:
>
> s_blogit = sqrt(s_logit^2*(n_logit - k)/(n_blogit - k))
>
> where s_blogit is the se from -blogit-, s_logit is the se from
> -logit-, n_logit is the number of observations from -logit-, n_blogit
> is the number of observations from -blogit-, and k is the number of
> independent variables, including the constant.
>
> It strikes me that the standard errors from -blogit- are more
> reasonable, given the actual number of observations that lie behind
> the summarized data. Thus, it seems that the standard errors from
> using -inteff- will be as incorrect as those from -logit- for
> summarized data. While I could use the formula from Ai and Norton
> (2003) to calculate the standard error for the interaction term using
> the variance-covariance matrix returned after -blogit-, would this be
> making a mistake?
>
> My data are not survey data. They are "actual" data, in the sense
> that f is the true number of people with the condition and pop is the
> true population.
>
> Thanks again,
>
> Misha
> (Using Stata 10.1)
>
>
> On Fri, Sep 18, 2009 at 7:00 AM, Jeph Herrin <junk@spandrel.net> wrote:
>>
>> Thanks Austin,
>>
>> Yes, I should have specified the -rd- option, I meant
>> the linear link function. I've become a fan of using
>> binary (and binomial) linear regression for testing
>> hypotheses.
>>
>> cheers,
>> Jeph
>>
>>
>> Austin Nichols wrote:
>>>
>>> Jeph--
>>> Doesn't the interaction problem discussed in
>>> http://www.stata-journal.com/sjpdf.html?articlenum=st0063
>>> also rear its ugly head here?
>>>
>>> Probably also have to be careful of SEs--if the total populations are
>>> summed weights from a survey, significance will likely be overstated.
>>>
>>> I'd probably go to -svy:tab- first in that case...
>>>
>>> webuse psidextract, clear
>>> keep if t>5
>>> set seed 1
>>> g f=ceil(uniform()*1000)
>>> egen pop=total(f), by(south t)
>>> svyset [pw=f], strata(t)
>>> egen gp=group(t south), lab
>>> svy:tab gp union if t>5, row ci
>>> lincom _b[p42]-_b[p22]-(_b[p32]-_b[p12])
>>> g txsouth=t*south
>>> egen y=total(union*f), by(gp)
>>> bys gp: replace y=. if _n<_N
>>> li y t south pop if y<.
>>> binreg y t south txsouth, n(pop)
>>> binreg union t south txsouth [pw=f]
>>> logit union t south txsouth [pw=f]
>>> findit inteff
>>>
>>> On Thu, Sep 17, 2009 at 4:53 PM, Jeph Herrin <junk@spandrel.net> wrote:
>>>>
>>>> Not sure whether this helps you, but I would normally test this
>>>> with an interaction term in a model. For instance
>>>>
>>>> gen txsouth=t*south
>>>> binreg f t south txsouth, n(pop)
>>>>
>>>> Then testing the coefficient on -txsouth- is the same as
>>>> testing whether there is a significant difference in differences.
>>>>
>>>> hth,
>>>> Jeph
>>>>
>>>> Misha Spisok wrote:
>>>>>
>>>>> Hello, Statalist,
>>>>>
>>>>> In brief, how does one test a difference in difference of proportions?
>>>>> My question is re-stated briefly at the end with reference to the
>>>>> variables I present. A formula and/or reference would be appreciated
>>>>> if no command exists.
>>>>>
>>>
>>>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
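
The original question in this thread asked for a formula for testing a difference in differences of proportions. A minimal sketch of the usual Wald-type test follows, assuming the four cells are independent and each count is binomial with a known denominator (as when f is a true count and pop a true population, not summed survey weights). The notation is illustrative and does not come from the posts: s indexes south (0/1), t indexes the two periods (0/1), y_{st} is the cell count, and n_{st} is the cell population.

```latex
\begin{align*}
\hat p_{st} &= \frac{y_{st}}{n_{st}}, \qquad s \in \{0,1\},\; t \in \{0,1\} \\
\hat\Delta &= (\hat p_{11} - \hat p_{10}) - (\hat p_{01} - \hat p_{00}) \\
\widehat{\operatorname{Var}}(\hat\Delta) &= \sum_{s=0}^{1}\sum_{t=0}^{1}
    \frac{\hat p_{st}\,(1 - \hat p_{st})}{n_{st}} \\
z &= \frac{\hat\Delta}{\sqrt{\widehat{\operatorname{Var}}(\hat\Delta)}}
    \;\overset{\text{approx.}}{\sim}\; N(0,1)
    \quad \text{under } H_0\colon \Delta = 0
\end{align*}
```

The point estimate is the same contrast targeted by the -lincom- line after -svy:tab- and by the coefficient on -txsouth- when the model uses an identity (risk difference) link. Under a logit link the interaction coefficient is on the log-odds scale and is not this quantity, which is the issue raised in the Stata Journal article linked in the thread.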

**Follow-Ups**:
- **Re: st: Difference in Difference for Proportions** *From:* Misha Spisok <misha.spisok@gmail.com>

**References**:
- **st: Difference in Difference for Proportions** *From:* Misha Spisok <misha.spisok@gmail.com>
- **Re: st: Difference in Difference for Proportions** *From:* Jeph Herrin <junk@spandrel.net>
- **Re: st: Difference in Difference for Proportions** *From:* Austin Nichols <austinnichols@gmail.com>
- **Re: st: Difference in Difference for Proportions** *From:* Jeph Herrin <junk@spandrel.net>
- **Re: st: Difference in Difference for Proportions** *From:* Misha Spisok <misha.spisok@gmail.com>

