



Re: st: svy subpop option and e(sample)


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: svy subpop option and e(sample)
Date   Fri, 27 May 2011 15:51:34 -0400

Steve--
Just do MI on the complete dataset, wind up with 20 times as much
data, or 100 times as much, then sample from that larger dataset.
OK, maybe that does not help make the dataset small and manageable
after all...

It *is* feasible to use 30-some GB of data, though maybe not on one's
laptop--time to find a better computer!

On Fri, May 27, 2011 at 3:37 PM, Steven Samuels <sjsamuels@gmail.com> wrote:
>
> Austin--
>
> Like Richard, I forgot about your post and about the need to pool singleton strata. Your "better" estimation procedure is a complete solution.
>
> In Hitesh's case, keeping all data in memory isn't feasible. For dealing with missing data, what do you think about MI restricted to the subpopulation?
>
> Steve
> sjsamuels@gmail.com
>
> On May 27, 2011, at 12:13 PM, Austin Nichols wrote:
>
> Richard--
> I claimed in http://www.stata.com/statalist/archive/2007-11/msg00810.html
> that "It is tempting to write a -svysubset- package
> to automate this subsetting procedure, but for any given model, the
> pattern of missing values might be different, which means the
> automatic-subsetting package could offer no savings in general over
> keeping all the data in memory."  Maybe a bit strong, but the general point is
> that the ad hoc solution is not straightforward to generalize in the presence
> of missing data.
>
> On Fri, May 27, 2011 at 12:25 PM, Richard Williams
> <richardwilliams.ndu@gmail.com> wrote:
>> At 10:08 AM 5/27/2011, Steven Samuels wrote:
>>>
>>> Hitesh
>>>
>>> After reading Section 5.4 of Korn and Graubard (1999), I return to Stas's
>>> advice: you need a good reason not to do the correct analysis. Here lack of
>>> memory won't be a reason, for, as you have apparently surmised, you don't
>>> need to load the entire original data set. Instead create _one_ dummy
>>> observation for each PSU that contains no members of the sub-population. For
>>> this observation, set the value of all the analysis variables to zero or to
>>> some other convenient value.
>>
>> Interesting. Would it be fairly straightforward to create an -svyextract-
>> command then? It seems like such a command could be quite useful for those
>> who would otherwise have to deal with massive data sets. Maybe even add a
>> property to the svysettings so the dof would be right when analyzing the
>> extract. This might be a good wish list item for Stata 12.
>>
>>> There is one more thing to do: in the -svyset- statement, use the -dof()-
>>> option to set the degrees of freedom to: number of PSUs with members of the
>>> subpopulation minus number of strata with observations in the
>>> sub-population (Korn & Graubard, 1999, p. 209).
>>>
>>> Ref: Korn, Edward Lee, and Barry I Graubard. 1999. Analysis of Health
>>> Surveys. New York: Wiley.
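
The extract procedure Steve describes in the quoted message (keep the subpopulation members, add one dummy observation for each PSU with no members, and compute the degrees of freedom as PSUs with members minus strata with members) can be sketched in a few lines. This is only an illustration in Python, not Stata code and not anything posted in the thread. The function name build_extract and the field names stratum, psu, insub, weight, and y are hypothetical. Letting each dummy row keep the design fields of the first row seen in its PSU is an assumption of this sketch; the thread only says to set the analysis variables to zero or another convenient value.

```python
def build_extract(rows, analysis_vars, fill=0):
    """Return (extract, dof) from an iterable of row dicts.

    Each row needs the keys 'stratum', 'psu' and 'insub' (1 if the row is in
    the subpopulation). These names are hypothetical, chosen for this sketch.
    Rows are read one at a time, so `rows` can be a generator over a big file.
    """
    first_row = {}      # first row seen in each PSU, kept as a template
    has_member = set()  # (stratum, psu) pairs with at least one member
    extract = []

    for r in rows:
        key = (r["stratum"], r["psu"])
        first_row.setdefault(key, r)
        if r["insub"]:
            has_member.add(key)
            extract.append(dict(r))

    # One dummy observation for each PSU that has no subpopulation members.
    for key in sorted(first_row):
        if key not in has_member:
            dummy = dict(first_row[key])   # assumption: reuse design fields
            dummy["insub"] = 0
            for v in analysis_vars:
                dummy[v] = fill            # "zero or some other convenient value"
            extract.append(dummy)

    # dof = PSUs with members minus strata with members
    # (the rule quoted from Korn & Graubard, 1999, p. 209).
    strata_with_members = {stratum for stratum, _ in has_member}
    dof = len(has_member) - len(strata_with_members)
    return extract, dof


# Tiny made-up design: 2 strata, 2 PSUs each; PSU 4 has no members.
sample = [
    {"stratum": 1, "psu": 1, "insub": 1, "weight": 2.0, "y": 5},
    {"stratum": 1, "psu": 1, "insub": 0, "weight": 2.0, "y": 7},
    {"stratum": 1, "psu": 2, "insub": 1, "weight": 1.5, "y": 3},
    {"stratum": 2, "psu": 3, "insub": 1, "weight": 3.0, "y": 4},
    {"stratum": 2, "psu": 4, "insub": 0, "weight": 2.5, "y": 9},
    {"stratum": 2, "psu": 4, "insub": 0, "weight": 2.5, "y": 8},
]
extract, dof = build_extract(sample, analysis_vars=["y"])
print(len(extract), dof)
```

The resulting dof value is what would be passed to the -dof()- option of -svyset- when analyzing the extract. As Austin notes, the set of usable observations can change from model to model when values are missing, so an extract built this way is tied to one pattern of missing data.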
