

From: Steven Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: svy subpop option and e(sample)
Date: Fri, 27 May 2011 11:08:10 -0400

Hitesh,

After reading Section 5.4 of Korn and Graubard (1999), I return to Stas's advice: you need a good reason not to do the correct analysis. Here lack of memory won't be a reason, for, as you have apparently surmised, you don't need to load the entire original data set. Instead, create _one_ dummy observation for each PSU that contains no members of the subpopulation. For this observation, set the value of all the analysis variables to zero or to some other convenient value.

There is one more thing to do: in the -svyset- statement, use the -dof()- option to set the degrees of freedom to the number of PSUs with members of the subpopulation minus the number of strata with observations in the subpopulation (Korn & Graubard, 1999, p. 209).

Ref: Korn, Edward Lee, and Barry I. Graubard. 1999. Analysis of Health Surveys. New York: Wiley.

Steve
sjsamuels@gmail.com

On May 27, 2011, at 12:20 AM, Hitesh Chandwani wrote:

Steve,

300,000 is not the number of PSUs. One PSU has multiple observations: approximately 1800 PSUs account for the 300,000 observations. These are nationwide hospital billing records data for 3 years. It is a 20% stratified sample of state hospital data. Also, the subpopulation is defined by characteristics of observations within PSUs (more specifically, the observations are hospital events related to a specific diagnosis).

So in the scenario I have presented, is 300,000 large enough?

Regards,
Hitesh

On Thu, May 26, 2011 at 10:40 PM, Steven Samuels <sjsamuels@gmail.com> wrote:

> Hitesh,
>
> The relevant number would be the number of PSUs. If that is 300,000, I would think that it's much more than enough. If you don't mind my asking, what kind of sample had 75 million observations? I usually encounter numbers like that only in census data.
>
> Steve
> sjsamuels@gmail.com
>
> Steve,
>
> You said in an earlier message: For a large enough subpopulation, the correct standard error for the ratio is indistinguishable from the standard error that assumes that the sample size was fixed (Lohr, 2009, p. 135, shows the formula for a SRS).
>
> How large is large enough? I am facing a similar problem. I extracted my subpopulation of interest and have 300,000 observations. My original data had 75 million observations with 61 variables. I cannot use the entire data due to insufficient RAM on my computer (I will need about 30-odd GB of RAM to analyze the data as a whole). I had to ask someone with access to such a powerful machine to extract the data for me.
>
> If the standard errors for data this large are not going to be very biased, I can report the variance estimation issue as a limitation of the analysis. If the data are not large enough, then I will need to compute dummy variables for all PSUs not represented in the extracted data.
>
> I would appreciate any help on the matter.
>
> Regards,
> --
> Hitesh S. Chandwani
> University of Texas at Austin

--
Hitesh S. Chandwani
University of Texas at Austin

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
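The bookkeeping behind the advice at the top of this message (one dummy observation per PSU with no subpopulation members, plus the Korn and Graubard degrees of freedom) can be sketched in a few lines. This is only an illustration in Python, not Stata code and not something posted in the thread. The function `pad_and_dof`, the record layout, and the names `stratum`, `psu`, `in_subpop`, and `cost` are all hypothetical. Sampling weights for the dummy rows are not handled here, since the message does not discuss them.

```python
def pad_and_dof(design_psus, subpop_rows, analysis_vars, fill=0):
    """Add one dummy row per PSU lacking subpopulation members and
    compute the degrees of freedom described in the message.

    design_psus   : iterable of (stratum, psu) pairs, one per PSU in the
                    full design (hypothetical input, e.g. from a PSU list)
    subpop_rows   : list of dicts holding the extracted subpopulation records
    analysis_vars : names of analysis variables to fill in dummy rows
    """
    # PSUs that actually contain subpopulation members
    present = {(r["stratum"], r["psu"]) for r in subpop_rows}

    # One dummy row for every design PSU with no subpopulation members;
    # analysis variables get a convenient constant (zero by default)
    dummies = [
        {"stratum": s, "psu": p, "in_subpop": 0,
         **{v: fill for v in analysis_vars}}
        for (s, p) in sorted(set(design_psus) - present)
    ]

    # Flag real records so an analysis can restrict to the subpopulation
    padded = [dict(r, in_subpop=1) for r in subpop_rows] + dummies

    # dof = (# PSUs with subpop members) - (# strata with subpop members)
    dof = len(present) - len({s for s, _ in present})
    return padded, dof


# Toy design: 3 strata, 7 PSUs; the subpopulation appears in only 3 PSUs
design = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5), ("C", 6), ("C", 7)]
extract = [
    {"stratum": "A", "psu": 1, "cost": 10.0},
    {"stratum": "A", "psu": 1, "cost": 12.5},
    {"stratum": "A", "psu": 2, "cost": 8.0},
    {"stratum": "B", "psu": 4, "cost": 20.0},
]
padded, dof = pad_and_dof(design, extract, ["cost"])
n_dummies = sum(1 for r in padded if r["in_subpop"] == 0)
```

In the toy data the subpopulation occupies three PSUs spread over two strata, so the computed degrees of freedom are 3 - 2 = 1, and four dummy rows are added for the PSUs that would otherwise be missing from the design. The resulting `dof` is the value one would pass to -dof()-, with `in_subpop` playing the role of the subpopulation indicator.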
