

**From:** Richard Williams <richardwilliams.ndu@gmail.com>

**To:** statalist@hsphsun2.harvard.edu, statalist@hsphsun2.harvard.edu

**Subject:** Re: st: svy subpop option and e(sample)

**Date:** Wed, 25 May 2011 09:56:30 -0500

At 06:20 PM 5/24/2011, Steven Samuels wrote:

Just to elaborate: with sub-populations, the ratio estimator of a mean, with every sample member in the numerator and denominator, is necessary because the sample size of the subpopulation is random, not fixed. This extends to the regression estimators, as they are functions of means. If you had used an -if- qualifier to restrict the analysis to black==1, e(sample) would work as you expect; the estimates would be the same, but the standard errors would be different.

Steve
sjsamuels@gmail.com

This is expected behavior, Richard. Everyone contributes to the standard error, whether in the subpopulation or not. For example, with n = 3 and X1, X2 in the subpopulation and X3 not, let Z = 1 for those in the subpopulation and 0 for those not. Then the mean of X is estimated by (W1 + W2 + W3)/(Z1 + Z2 + Z3), a ratio estimate, with Wi = Xi*Zi, and variation in the mean is measured by variation between the W's, which includes W3 = 0.

Steve

On May 24, 2011, at 4:02 PM, Richard Williams wrote:

I've just noticed that e(sample) does not work the way I expect it to when using svy with the subpop option. Specifically, e(sample) codes everyone in the estimation sample as 1, whether they were in the specified subpopulation or not. I guess I can sort of see a rationale for doing this (the whole sample is used to compute the standard errors), but it has the potential to screw up your post-estimation analysis if you only wanted to work with what you thought was the subpopulation.

The following illustrates this. There are only 1086 cases in the selected subpopulation, but probabilities are computed for all 10,000 cases. That is, coefficients computed using only the black subpopulation are used to compute probabilities for the entire estimation sample:

```
. webuse nhanes2f, clear

. svy, subpop(black): ologit health age female
(running ologit on estimation sample)

Survey: Ordered logistic regression

Number of strata   =        30                  Number of obs      =     10000
Number of PSUs     =        60                  Population size    = 113285074
                                                Subpop. no. of obs =      1086
                                                Subpop. size       =  11189236
                                                Design df          =        30
                                                F(   2,     29)    =     29.87
                                                Prob > F           =    0.0000

------------------------------------------------------------------------------
             |             Linearized
      health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0452349   .0063482    -7.13   0.000    -.0581996   -.0322703
      female |  -.3975887   .1336441    -2.97   0.006    -.6705263   -.1246511
-------------+----------------------------------------------------------------
       /cut1 |  -4.427029   .2976634   -14.87   0.000    -5.034939   -3.819119
       /cut2 |   -2.97326   .2848889   -10.44   0.000    -3.555081   -2.391439
       /cut3 |  -1.347426   .2497407    -5.40   0.000    -1.857465   -.8373876
       /cut4 |   -.214417   .2857434    -0.75   0.459    -.7979829    .3691488
------------------------------------------------------------------------------
Note: 1 stratum omitted because it contains no subpopulation members.

. predict p1 p2 p3 p4 p5 if e(sample)
(option pr assumed; predicted probabilities)
(337 missing values generated)

. sum p1 p2 p3 p4 p5

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
          p1 |     10000    .1362895    .0853605   .0286835   .3358028
          p2 |     10000    .2341256    .0876537   .0835067   .3482424
          p3 |     10000    .3367498    .0391877   .2327486   .3854614
          p4 |     10000    .1639109    .0712129   .0549047   .2749383
          p5 |     10000    .1289243    .0859123   .0284552   .3339704
```

I think one solution is to change the predict command to something like

```
predict p51 p52 p53 p54 p55 if e(sample) & black
predict p61 p62 p63 p64 p65 if e(sample) & `=e(subpop)'
```

But are there others? Preferably simpler ones? And is this good behavior for e(sample) in the first place?

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW: http://www.nd.edu/~rwilliam

*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
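Steve's n = 3 example can be written out explicitly. The derivation below is a simplified sketch, assuming simple random sampling with equal weights and ignoring the finite population correction; it is not the stratified, clustered variance formula that svy applies to nhanes2f. Under those assumptions the subpopulation mean is a ratio estimator, and its standard textbook linearized variance sums over all n sample members, including those outside the subpopulation:

$$
Z_i = \begin{cases} 1 & \text{if case } i \text{ is in the subpopulation} \\ 0 & \text{otherwise} \end{cases}, \qquad W_i = X_i Z_i, \qquad \hat{R} = \frac{\sum_{i=1}^{n} W_i}{\sum_{i=1}^{n} Z_i}
$$

$$
\widehat{V}(\hat{R}) \approx \frac{1}{n \bar{Z}^2} \cdot \frac{1}{n-1} \sum_{i=1}^{n} \left(W_i - \hat{R} Z_i\right)^2, \qquad \bar{Z} = \frac{1}{n} \sum_{i=1}^{n} Z_i
$$

With n = 3, Z_1 = Z_2 = 1 and Z_3 = 0 (so W_3 = 0):

$$
\hat{R} = \frac{X_1 + X_2 + 0}{1 + 1 + 0} = \frac{X_1 + X_2}{2}, \qquad \widehat{V}(\hat{R}) \approx \frac{1}{3 \cdot (2/3)^2} \cdot \frac{1}{2} \left[ \left(\tfrac{X_1 - X_2}{2}\right)^2 + \left(\tfrac{X_2 - X_1}{2}\right)^2 + 0^2 \right] = \frac{3\,(X_1 - X_2)^2}{16}
$$

Restricting the sample with -if- instead (n = 2, all Z_i = 1) gives the same point estimate but a different variance:

$$
\bar{X} = \frac{X_1 + X_2}{2}, \qquad \widehat{V}(\bar{X}) = \frac{1}{2} \cdot \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2}{2 - 1} = \frac{(X_1 - X_2)^2}{4}
$$

Case 3 changes the answer only through n, \bar{Z}, and its zero residual, which is the sense in which everyone in the sample contributes to the standard error.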


**References:**

- **st: svy subpop option and e(sample)**, *From:* Richard Williams <richardwilliams.ndu@gmail.com>
- **Re: st: svy subpop option and e(sample)**, *From:* Steven Samuels <sjsamuels@gmail.com>
