
# st: different n's using "if"

From: "Sarah A. Mustillo"

To: statalist@hsphsun2.harvard.edu

Subject: st: different n's using "if"

Date: Thu, 31 Oct 2002 16:26:20 -0500

Hi -

This question is probably an easy one, but I am baffled...

I am trying to run GEE models separately for subsets of my sample: white girls, black girls, white boys, and black boys. I have been using an "if" statement before the comma in the regression command, e.g., if sex==0&race==1, etc. I should also mention that I am limiting the sample by time of observation as well, so what I really have is, for example: if sex==0&race==1&period==1|2. I became suspicious that this was not doing what I wanted when my sample size stayed large. So I tried preserving the data set, then dropping the male and black observations and running the model again, and the n was much smaller.

My question:

Shouldn't using "if" before the comma accomplish the same thing as dropping those people from the sample? What am I missing?
Below are my examples:

xi: xtgee pul per2Xlagccm period2 lagccm age if sex==0&racewh==1&period==1|2 [pweight=wt], robust corr(exch)

Iteration 1: tolerance = .00166807
Iteration 2: tolerance = 7.878e-11

GEE population-averaged model Number of obs = 2723
Group variable: id Number of groups = 390
Link: identity Obs per group: min = 6
Family: Gaussian avg = 7.0
Correlation: exchangeable max = 7
Wald chi2(4) = 150.68
Scale parameter: 210.9225 Prob > chi2 = 0.0000

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
| Semi-robust
pul | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
per2Xlagccm | -1.332656 1.009294 -1.32 0.187 -3.310835 .645523
period2 | -5.228774 .6043163 -8.65 0.000 -6.413212 -4.044336
lagccm | -.0626779 2.276359 -0.03 0.978 -4.52426 4.398904
age | -.6751086 .3574791 -1.89 0.059 -1.375755 .0255375
_cons | 84.87411 1.566771 54.17 0.000 81.80329 87.94492
------------------------------------------------------------------------------

versus:

. keep if sex==0
. keep if race==1

xi: xtgee pul per2Xlagccm period2 lagccm age if period==1|2 [pweight=wt], robust corr(exch)

Iteration 1: tolerance = .00331609
Iteration 2: tolerance = 1.041e-09

GEE population-averaged model Number of obs = 468
Group variable: id Number of groups = 67
Link: identity Obs per group: min = 6
Family: Gaussian avg = 7.0
Correlation: exchangeable max = 7
Wald chi2(4) = 87.58
Scale parameter: 191.5843 Prob > chi2 = 0.0000

(standard errors adjusted for clustering on id)
------------------------------------------------------------------------------
| Semi-robust
pul | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
per2Xlagccm | -7.502376 2.768465 -2.71 0.007 -12.92847 -2.076284
period2 | -6.324729 .8571098 -7.38 0.000 -8.004634 -4.644825
lagccm | -.8279901 1.800555 -0.46 0.646 -4.357013 2.701033
age | -.3743277 .6457195 -0.58 0.562 -1.639915 .8912592
_cons | 87.05734 3.265192 26.66 0.000 80.65768 93.457
------------------------------------------------------------------------------
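
One way to see what each "if" condition actually selects is to count observations with it before fitting the model. The sketch below is not part of the original session: it uses made-up toy data (one row per sex, race, and period combination) and only borrows the variable names sex, racewh, and period from the commands above. It relies on Stata's documented operator precedence, in which & is evaluated before |, and on the rule that any nonzero value counts as true, so a trailing |2 makes the whole condition true for every observation.

```stata
* Toy data (hypothetical): 2 sexes x 2 races x 3 periods, one row each
clear
set obs 12
generate sex    = mod(_n, 2)
generate racewh = mod(floor((_n - 1)/2), 2)
generate period = 1 + mod(floor((_n - 1)/4), 3)

* Condition as written in the post; Stata reads it as
* (sex==0 & racewh==1 & period==1) | 2, and 2 is nonzero, hence always true
count if sex==0 & racewh==1 & period==1|2
* counts all 12 rows

* Period restriction spelled out in full and parenthesized
count if sex==0 & racewh==1 & (period==1 | period==2)
* counts 2 rows

* Same restriction written with inlist()
count if sex==0 & racewh==1 & inlist(period, 1, 2)
* counts 2 rows
```

If that reading is right, it would account for the first model using what looks like the whole sample, and the period==1|2 left in the second command would likewise keep every period rather than only periods 1 and 2.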

Thanks!

Sarah

Sarah A. Mustillo, Ph.D
Center for Developmental Epidemiology
Department of Psychiatry and Behavioral Sciences
Duke University School of Medicine
Box 3454
Durham NC 27710

919 687-4686 x234

