Statalist


Re: st: Weights


From: "Austin Nichols" <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Weights
Date: Wed, 30 Apr 2008 11:26:50 -0400

Martin Weiss <martin.weiss@uni-tuebingen.de>:

SPSS's -weight cases- treats the weights as frequency (replication)
weights rather than sampling weights, and therefore will give you
incorrect standard errors.  See -help weights- and -help svy- and the
manuals for more.
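
For the -svy- route, a minimal sketch on the auto data shipped with
Stata. Here wgt is a made-up variable standing in for the sampling
weights in the first column of your data, and I am assuming a design
with no strata or clusters, which may not match your survey:

```stata
sysuse auto, clear
* made-up sampling weights, for illustration only
generate wgt = 1 + mod(_n, 3)
* declare the weights once; _n as the PSU means no clustering (an assumption)
svyset _n [pweight = wgt]
* every svy: command now picks up the declared weights
svy: mean price
```

This is the closest Stata analogue to setting the weight once per
session. If the data provider documents strata or PSUs, they belong in
the -svyset- call as well.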

Perhaps the large size of the Stata file is due to all variables being
stored as doubles?  Try -compress- on an extract and see -help
datatypes-.
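
A small sketch of what -compress- does, on generated data rather than
your file (the variable x is made up for illustration):

```stata
clear
set obs 1000
* integers 0..99 stored as 8-byte doubles
generate double x = mod(_n, 100)
describe x
* -compress- demotes x to the smallest type that holds it without loss
compress
describe x
```

After -compress- the second -describe- should report x as a byte, one
eighth of the original storage for that variable.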

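Note that -mean- uses only the observations where all variables in the
varlist are nonmissing (casewise deletion), so one missing value in
any variable drops that observation from every mean.  So instead of
e.g.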

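* list all numeric variables and store their names in local macro num
ds, has(type numeric)
loc num `r(varlist)'
* one joint call: uses only obs complete on every variable in num
mean `num'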

try

ds, has(type numeric)
loc num `r(varlist)'
* one call per variable: each mean uses all nonmissing obs of that variable
foreach v of loc num {
    mean `v'
}

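or just use -summarize- with aweights (-summarize- does not accept
pweights).  In estimation commands such as -mean-, pweights amount to
aweights plus robust variance estimation, so point estimates are
identical, but variance estimates differ.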
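
Both points can be checked on the auto data shipped with Stata, where
rep78 is missing for 5 of the 74 cars. This is only a sketch: wgt is a
made-up weight variable, not anything from your data.

```stata
sysuse auto, clear
* made-up weights, for illustration only
generate wgt = 1 + mod(_n, 3)

* joint call: casewise deletion keeps only the 69 cars with rep78 observed
mean price rep78 [aweight = wgt]
scalar n_joint = e(N)

* separate call: all 74 cars contribute to the mean of price
mean price [aweight = wgt]
scalar n_own = e(N)
matrix b_aw = e(b)

* pweights: same point estimate as aweights, different standard error
mean price [pweight = wgt]
matrix b_pw = e(b)

* -summarize- with aweights reproduces the same weighted mean
summarize price [aweight = wgt]
display n_joint " " n_own " " b_aw[1,1] " " b_pw[1,1] " " r(mean)
```

The two sample sizes differ (69 versus 74), which is the mechanism
behind -mean- and -summarize- disagreeing when many variables are
listed at once, while the three weighted means of price agree.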

On Wed, Apr 30, 2008 at 10:57 AM, Martin Weiss
<martin.weiss@uni-tuebingen.de> wrote:
> Dear Statalisters,
>
> can anybody give me a clue as to the array of weighting options in Stata? I
> have an important project where I would really like to make headway...
>
> My dataset features a size of 2.4 GB as .csv. When I translate this into
> SPSS, it ends up with 2.7 GB while the equivalent Stata dataset has 5.5 GB
> (!). Anyway, I usually pick out the interesting variables beforehand because
> Stata is unable to open the entire dataset. The first column of the data
> contains sampling weights. The data provider ships a PDF with the descriptives
> for the marginal distributions of the variables in the population so I know
> the true values.
>
> Now here lies the rub: when I weight -summarize- with analytic weights, the
> approximately correct mean and standard deviation pop out. When I let Stata
> estimate the mean with the -mean- command, with analytic weights attached in
> the same fashion, I get widely differing results for the point estimate of
> the mean, far from the true values. In SPSS, I simply go to -weight cases-
> and everything comes out correct.
>
> Do I have to -svyset- the data? When I try to -frequency weight- the data,
> Stata complains that non-integers are not allowed while SPSS seems to not
> quarrel with them. Why is it that SPSS needs one command at the beginning of
> the session while Stata has a (differing) tab dedicated to weighting for
> every single command?
>