
RE: st: Weights

From   "Martin Weiss" <>
To   <>
Subject   RE: st: Weights
Date   Wed, 30 Apr 2008 17:41:47 +0200


If only I could open the file and compress it... I have the latest gear in
terms of hardware and software (MP/2 10.0 64 bit, 4 GB RAM, Vista Business 64
bit, ...), but it is next to impossible to open the 5.5 GB file. Virtual memory
makes things so slow that it takes all the fun out of it... So I am stuck in a
bit of a quandary.

Martin Weiss

Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen

Fon: 0049-7071-2978184




-----Original Message-----
[] On Behalf Of Austin Nichols
Sent: Wednesday, April 30, 2008 5:27 PM
Subject: Re: st: Weights

Martin Weiss <>

SPSS is using the wrong type of weight, and therefore will give you
incorrect standard errors.  See -help weights- and -help svy- and the
manuals for more.
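A rough sketch of the -svy- route, not taken from the original exchange: it uses the auto data shipped with Stata and a made-up non-integer weight named wgt, both of which are stand-ins for the real file and its sampling-weight column.

```stata
* Sketch only: auto.dta and the variable wgt are stand-ins, not the poster's data
sysuse auto, clear
generate double wgt = 1 + weight/1000      // made-up non-integer sampling weight

* Declare the sampling weight once; later -svy:- commands pick it up
svyset [pweight = wgt]
svy: mean price mpg
```

Once -svyset- has been run, the weight does not need to be repeated on each -svy:- command, which is the closest Stata analogue to declaring weights once per session.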

Perhaps the large size of the Stata file is due to all variables being
stored as doubles?  Try -compress- on an extract and see -help

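One way to try -compress- on an extract, sketched under assumptions: the auto data are recast to double and saved to a temporary file to imitate an oversized dataset, so the file and the variable choice here are purely illustrative.

```stata
* Sketch only: build a stand-in "big" file in which everything is a double
sysuse auto, clear
recast double price mpg weight
tempfile big
save `big'

* Read just a few variables and observations, then shrink the storage types
use price mpg in 1/50 using `big', clear
describe
compress
describe
```

Comparing the two -describe- listings shows which variables were demoted to smaller types, which hints at how much the full file might shrink.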
Note that -mean- restricts to obs where all vars are nonmissing, so
instead of e.g.

ds, has(type numeric)
loc num `r(varlist)'
mean `num'

try

ds, has(type numeric)
loc num `r(varlist)'
foreach v of loc num {
 mean `v'
 }

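The effect of the casewise restriction can be seen in a small sketch using the auto data shipped with Stata (a stand-in, not the poster's data), where rep78 has some missing values:

```stata
* Sketch only: auto.dta stands in for the real data
sysuse auto, clear

* Observations with both variables nonmissing
count if !missing(price, rep78)
local n_both = r(N)

* Joint call: the sample is restricted to complete cases
mean price rep78
display "joint call uses " e(N) " obs; complete cases = `n_both'"

* Single-variable call: every nonmissing value of price is used
mean price
display "single-variable call uses " e(N) " of " _N " obs"
```

The estimated mean of price differs between the two calls because the joint call silently drops the observations where rep78 is missing.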
or just use -summarize- with aweights (it does not accept pweights, but
pweights=aweights+_robust, so point estimates are identical and only the
variance estimates differ).
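The aweight/pweight relationship can be checked with a constant-only regression, whose intercept is the weighted mean. This is a sketch under assumptions: the auto data and the made-up weight wgt are stand-ins for the real file.

```stata
* Sketch only: auto.dta and wgt are stand-ins, not the poster's data
sysuse auto, clear
generate double wgt = 1 + weight/1000      // made-up non-integer weight

* Weighted mean from -summarize- with aweights
summarize price [aweight = wgt]
scalar m_sum = r(mean)

* aweights plus robust standard errors
regress price [aweight = wgt], vce(robust)
scalar se_awr = _se[_cons]

* pweights: same point estimate, robust standard errors by construction
regress price [pweight = wgt]
display m_sum, _b[_cons], se_awr, _se[_cons]
```

Dropping vce(robust) from the aweighted regression leaves the point estimate unchanged but gives a different standard error.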

On Wed, Apr 30, 2008 at 10:57 AM, Martin Weiss
<> wrote:
> Dear Statalisters,
> can anybody give me a clue as to the array of weighting options in Stata? I
> have an important project where I would really like to make headway...
> My dataset features a size of 2.4 GB as .csv. When I translate this into
> SPSS, it ends up with 2.7 GB while the equivalent Stata dataset has 5.5 GB
> (!). Anyway, I usually pick out the interesting variables beforehand as
> Stata is unable to open the entire dataset. The first column of the data
> contains sampling weights. The data provider ships a pdf with the values
> for the marginal distributions of the variables in the population so I know
> the true values.
> Now here lies the rub: when I weight -summarize- with analytic weights,
> approximately correct mean and standard deviation pop out. When I let Stata
> estimate the mean with the -mean- command, with analytic weights attached in
> the same fashion, I get widely differing results for the point estimate of
> the mean, far from the true values. In SPSS, I simply go to -weight cases-
> and everything comes out correct.
> Do I have to -svyset- the data? When I try to -frequency weight- the data,
> Stata complains that non-integers are not allowed while SPSS seems to not
> quarrel with them. Why is it that SPSS needs one command at the beginning of
> the session while Stata has a (differing) tab dedicated to weighting for
> every single command?
*   For searches and help try:
