Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: svy subpop option and e(sample)

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: svy subpop option and e(sample)
Date	Fri, 27 May 2011 11:08:10 -0400

Hitesh

After reading Section 5.4 of Korn and Graubard (1999), I return to Stas's advice: you need a good reason not to do the correct analysis. Lack of memory is not such a reason here because, as you have apparently surmised, you don't need to load the entire original data set. Instead, create _one_ dummy observation for each PSU that contains no members of the subpopulation. In that observation, set all the analysis variables to zero or to some other convenient value.
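A minimal Python sketch of the padding step, for illustration only. It assumes the extract is a list of records and that the full list of (stratum, PSU) pairs in the design is available separately. The field names stratum, psu, weight and subpop, the analysis variable los, and the dummy weight of 1 are all hypothetical choices; the weight is assumed not to matter because the row is flagged as outside the subpopulation.

```python
# Minimal sketch (hypothetical field names): pad a subpopulation extract with
# one dummy row for every PSU in the design that has no subpopulation members.


def pad_with_dummy_psus(extract, all_psus, analysis_vars):
    """Return a new list: the extract plus one dummy row per missing PSU.

    extract       -- list of dicts, each with "stratum", "psu", "weight",
                     "subpop" and the analysis variables (assumed layout)
    all_psus      -- iterable of (stratum, psu) pairs for the full design
    analysis_vars -- names of the analysis variables to fill in
    """
    # PSUs are keyed by (stratum, psu) in case PSU ids repeat across strata.
    present = {(row["stratum"], row["psu"]) for row in extract}
    padded = list(extract)
    for stratum, psu in all_psus:
        if (stratum, psu) in present:
            continue
        dummy = {
            "stratum": stratum,
            "psu": psu,
            "weight": 1.0,  # assumed arbitrary positive value; row is outside the subpop
            "subpop": 0,    # flags the row as not in the subpopulation
        }
        for var in analysis_vars:
            dummy[var] = 0  # "zero or some other convenient value"
        padded.append(dummy)
        present.add((stratum, psu))  # guard against duplicates in all_psus
    return padded


# Toy data: "los" (length of stay) is a hypothetical analysis variable.
extract = [
    {"stratum": 1, "psu": 101, "weight": 5.0, "subpop": 1, "los": 3},
    {"stratum": 1, "psu": 101, "weight": 5.0, "subpop": 1, "los": 7},
    {"stratum": 2, "psu": 201, "weight": 5.0, "subpop": 1, "los": 2},
]
all_psus = [(1, 101), (1, 102), (2, 201), (2, 202), (3, 301)]

padded = pad_with_dummy_psus(extract, all_psus, ["los"])
print(len(padded))  # 3 original rows + 3 dummy rows = 6
```

Only the PSU identifiers are needed from the full file, so the padded data set stays close to the size of the extract.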

There is one more thing to do: in the -svyset- statement, use the -dof()- option to set the degrees of freedom to the number of PSUs with members of the subpopulation minus the number of strata with observations in the subpopulation (Korn & Graubard, 1999, p. 209).
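The number that goes into -dof()- can be counted directly from the data. Here is a minimal Python sketch of that count, assuming the same hypothetical stratum, psu and subpop fields; dummy rows carry subpop == 0 and so are not counted. The result would then be supplied to the -dof()- option of -svyset-.

```python
# Minimal sketch: design degrees of freedom for a subpopulation analysis,
# following the rule stated in the message (Korn & Graubard, 1999, p. 209):
#   dof = (# PSUs containing subpopulation members)
#         - (# strata containing subpopulation members)


def subpop_design_dof(records):
    """Count PSUs and strata that contain at least one subpopulation member."""
    in_subpop = [row for row in records if row["subpop"] == 1]
    psus = {(row["stratum"], row["psu"]) for row in in_subpop}
    strata = {row["stratum"] for row in in_subpop}
    return len(psus) - len(strata)


# Toy data with hypothetical field names.
records = [
    {"stratum": 1, "psu": 101, "subpop": 1},
    {"stratum": 1, "psu": 102, "subpop": 1},
    {"stratum": 1, "psu": 102, "subpop": 1},
    {"stratum": 2, "psu": 201, "subpop": 1},
    {"stratum": 2, "psu": 202, "subpop": 0},
    {"stratum": 3, "psu": 301, "subpop": 0},
]

print(subpop_design_dof(records))  # 3 PSUs - 2 strata = 1
```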

Ref: Korn, Edward Lee, and Barry I Graubard. 1999. Analysis of Health Surveys. New York: Wiley.

Steve
[email protected]

On May 27, 2011, at 12:20 AM, Hitesh Chandwani wrote:

Steve,

300,000 is not the number of PSUs. Each PSU has multiple
observations; approximately 1,800 PSUs account for the 300,000
observations.

These are nationwide hospital billing records covering 3 years. They
are a 20% stratified sample of state hospital data.

Also, the subpopulation is defined by characteristics of observations
within PSUs (more specifically, the observations are hospital events
related to a specific diagnosis).

So in the scenario I have presented, is 300,000 large enough?

Regards,
Hitesh

On Thu, May 26, 2011 at 10:40 PM, Steven Samuels <[email protected]> wrote:
> 
> Hitesh,
> 
> The relevant number would be the number of PSUs. If that is 300,000, I would think that it's much more than enough. If you don't mind my asking, what kind of sample had 75 million observations? I usually encounter numbers like that only in census data.
> 
> Steve
> [email protected]
> 
> 
> 
> Steve,
> 
> You said in an earlier message: For a large enough subpopulation, the
> correct standard error for the ratio is indistinguishable from the
> standard error that assumes that the sample size was fixed (Lohr,
> 2009, p. 135, shows the formula for a SRS).
> 
> How large is large enough? I am facing a similar problem. I extracted
> my subpopulation of interest and have 300,000 observations. My
> original data had 75 million observations with 61 variables. I cannot
> use the entire data set because my computer lacks sufficient RAM (I
> would need roughly 30 GB of RAM to analyze the data as a whole). I had to
> ask someone with access to such a powerful machine to extract the data
> for me.
> 
> If the standard errors for data this large are not going to be very
> biased, I can report the variance estimation issue as a limitation of
> the analysis. If the data are not large enough, then I will need to
> compute dummy variables for all PSUs not represented in the extracted
> data.
> 
> I would appreciate any help on the matter.
> 
> Regards,
> --
> Hitesh S. Chandwani
> University of Texas at Austin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 

-- 
Hitesh S. Chandwani
University of Texas at Austin


