
Re: st: Weights

From   "Austin Nichols" <>
Subject   Re: st: Weights
Date   Wed, 30 Apr 2008 11:26:50 -0400

Martin Weiss <>

SPSS is using the wrong type of weight, and therefore will give you
incorrect standard errors.  See -help weights- and -help svy- and the
manuals for more.

Perhaps the large size of the Stata file is due to all variables being
stored as doubles?  Try -compress- on an extract and see -help

Note that -mean- restricts to obs where all vars are nonmissing, so
instead of e.g.

ds, has(type numeric)
loc num `r(varlist)'
mean `num'

try

ds, has(type numeric)
loc num `r(varlist)'
foreach v of loc num {
 mean `v'
}

or just use -summarize- with aweights (-summarize- does not accept
pweights; in commands that do, such as -mean-,
pweights=aweights+_robust, so point estimates are identical, but
variance estimates differ).
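
To see the casewise deletion described above, here is a small sketch. It
assumes the auto dataset that ships with Stata, and it uses that dataset's
weight variable as a stand-in for a sampling weight; neither comes from
this thread, so treat it as an illustration rather than a recipe for the
original data.

```stata
* Illustration only: auto.dta ships with Stata; it is not the poster's data
sysuse auto, clear

* rep78 has missing values, so a joint call drops those cars for price too
mean price rep78

* One variable at a time: each mean uses all nonmissing obs of that variable
foreach v of varlist price rep78 {
    mean `v'
}

* Hypothetical weight: car weight stands in for a sampling weight here.
* The two point estimates should match; compare the reported standard errors.
mean price [aweight=weight]
mean price [pweight=weight]
```

Comparing the number of observations reported by the joint call with those
from the loop shows how many cases the joint call discards.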

On Wed, Apr 30, 2008 at 10:57 AM, Martin Weiss
<> wrote:
> Dear Statalisters,
> can anybody give me a clue as to the array of weighting options in Stata? I
> have an important project where I would really like to make headway...
> My dataset features a size of 2.4 GB as .csv. When I translate this into
> SPSS, it ends up with 2.7 GB while the equivalent Stata dataset has 5.5 GB
> (!). Anyway, I usually pick out the interesting variables beforehand because
> Stata is unable to open the entire dataset. The first column of the data
> contains sampling weights. The data provider ships a pdf with the descriptives
> for the marginal distributions of the variables in the population so I know
> the true values.
> Now here lies the rub: when I weight -summarize- with analytic weights, the
> approximately correct mean and standard deviation pop out. When I let Stata
> estimate the mean with the -mean- command, with analytic weights attached in
> the same fashion, I get widely differing results for the point estimate of
> the mean, far from the true values. In SPSS, I simply go to -weight cases-
> and everything comes out correct.
> Do I have to -svyset- the data? When I try to -frequency weight- the data,
> Stata complains that non-integers are not allowed while SPSS seems to not
> quarrel with them. Why is it that SPSS needs one command at the beginning of
> the session while Stata has a (differing) tab dedicated to weighting for
> every single command?