Statalist



RE: st: Weights


From   "Martin Weiss" <martin.weiss@uni-tuebingen.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Weights
Date   Wed, 30 Apr 2008 17:41:47 +0200

Austin,

if only I could open the file and compress it... I have up-to-date hardware
and software (Stata MP/2 10.0 64-bit, 4 GB RAM, Vista Business 64-bit, ...),
but the 5.5 GB file does not fit into 4 GB of RAM, so it is next to impossible
to open. Virtual memory makes things so slow that it takes all the fun out of
it, so I am in a bit of a quandary.

Martin Weiss
_________________________________________________________________

Diplom-Kaufmann Martin Weiss
Mohlstrasse 36
Room 415
72074 Tuebingen
Germany

Fon: 0049-7071-2978184

Home: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1130

Publications: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1131

SSRN: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=669945


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
Sent: Wednesday, April 30, 2008 5:27 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Weights

Martin Weiss <martin.weiss@uni-tuebingen.de>

SPSS is using the wrong type of weight: its -weight cases- treats the weights
as frequency weights rather than sampling weights, and therefore it will give
you incorrect standard errors. See -help weights-, -help svy-, and the
manuals for more.
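A minimal Stata sketch of why the weight type matters, on made-up toy data (the variables x and w are hypothetical). The same integer-valued weights are used first as frequency weights, the way SPSS's -weight cases- is generally understood to use them, and then as sampling weights. The point estimate is the same, but with fweights Stata sets e(N) to the sum of the weights, so the standard error comes out far too small.

```stata
* Toy data (hypothetical): outcome x and integer-valued weights w
clear
input x w
1 100
2 300
3 400
4 200
end

* Weights as frequency weights: each obs counts as w replicated obs
mean x [fweight=w]
scalar n_fw  = e(N)      // sum of the weights (1000), not 4
scalar se_fw = _se[x]

* The same weights as sampling (probability) weights
mean x [pweight=w]
scalar n_pw  = e(N)      // the 4 actual observations
scalar se_pw = _se[x]

* Point estimates agree (2.7); the fweight standard error is far smaller
display "mean = " _b[x] ",  SE (fweight) = " se_fw ",  SE (pweight) = " se_pw
```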

Perhaps the large size of the Stata file is due to all variables being
stored as doubles? A double takes 8 bytes per value, whereas a byte takes 1,
an int 2, and a long or float 4. Try -compress- on an extract and see -help
datatypes-.
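A minimal sketch of that suggestion, assuming the needed variables can be read selectively: -use- accepts a varlist and an -in- range, so only part of the file is loaded, and -compress- then stores each variable in the smallest type that holds its values exactly. To keep it self-contained, the sketch first builds a small hypothetical file (variables id, small, w in a temporary file) with everything stored as double; substitute the real file name and variables.

```stata
* Build a small hypothetical file in which every variable is a double
clear
set obs 1000
generate double id    = _n              // integers 1..1000
generate double small = mod(_n, 5)      // integers 0..4
generate double w     = 1 + runiform()  // non-integer sampling weight
tempfile demo
save `demo'

* Load only the needed variables and observations, not the whole file
use id small w in 1/500 using `demo', clear
describe

* Let Stata pick the smallest storage type that loses no precision
compress
describe
```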

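Note that -mean- restricts the sample to obs where all listed vars are
nonmissing (whereas -summarize- uses each variable's own nonmissing obs,
which may explain the differing point estimates), so instead of e.g.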

ds, has(type numeric)
loc num `r(varlist)'
mean `num'

try

ds, has(type numeric)
loc num `r(varlist)'
foreach v of loc num {
 mean `v'
}
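A toy illustration of that casewise deletion, using made-up variables x and y with y missing in two of four observations: -mean x y- computes the mean of x from the two complete observations only, whereas -mean x- alone uses all four.

```stata
* Toy data (hypothetical): y is missing in obs 1 and 4
clear
input x y
1 .
2 10
3 20
6 .
end

* Both variables at once: only obs 2 and 3 are complete, so x averages 2.5
mean x y
scalar mx_joint = _b[x]
scalar n_joint  = e(N)

* x on its own: all four obs are used, so x averages 3
mean x
scalar mx_alone = _b[x]
scalar n_alone  = e(N)
```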

or just use -summarize- with aweights (it does not accept pweights), or
-mean- with pweights one variable at a time (pweights = aweights + _robust,
so point estimates are identical, but variance estimates differ).
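A minimal sketch of the aweight/pweight point, again on hypothetical toy data: -summarize- with aweights and -mean- with aweights or pweights all return the weighted mean sum(w*x)/sum(w), here 13.5/5 = 2.7, while the two -mean- runs report different standard errors.

```stata
* Toy data (hypothetical): outcome x and non-integer weights w
clear
input x w
1 0.5
2 1.5
3 2.0
4 1.0
end

* Weighted mean via -summarize- with analytic weights
summarize x [aweight=w]
scalar m_sum = r(mean)

* Same weights in -mean-: analytic weights...
mean x [aweight=w]
scalar m_aw  = _b[x]
scalar se_aw = _se[x]

* ...and sampling weights (linearized, robust-type variance)
mean x [pweight=w]
scalar m_pw  = _b[x]
scalar se_pw = _se[x]

* All three point estimates equal 2.7; the two standard errors differ
display m_sum "  " m_aw "  " m_pw
display se_aw "  " se_pw
```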

On Wed, Apr 30, 2008 at 10:57 AM, Martin Weiss
<martin.weiss@uni-tuebingen.de> wrote:
> Dear Statalisters,
>
> Can anybody give me a clue as to the array of weighting options in Stata? I
> have an important project where I would really like to make headway...
>
> My dataset is 2.4 GB as a .csv file. When I translate it into SPSS, it
> ends up at 2.7 GB, while the equivalent Stata dataset is 5.5 GB (!). Anyway,
> I usually pick out the interesting variables beforehand because Stata is
> unable to open the entire dataset. The first column of the data contains
> sampling weights. The data provider ships a PDF with the descriptive
> statistics for the marginal distributions of the variables in the
> population, so I know the true values.
>
> Now here lies the rub: when I weight -summarize- with analytic weights, the
> approximately correct mean and standard deviation pop out. When I let Stata
> estimate the mean with the -mean- command, with analytic weights attached in
> the same fashion, I get widely differing point estimates of the mean, far
> from the true values. In SPSS, I simply go to -weight cases- and everything
> comes out correct.
>
> Do I have to -svyset- the data? When I try to use frequency weights, Stata
> complains that non-integers are not allowed, while SPSS does not seem to
> quarrel with them. Why is it that SPSS needs one command at the beginning of
> the session, while Stata has a (differing) tab dedicated to weighting for
> every single command?
>
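On the -svyset- question, a minimal sketch assuming a single-stage design with sampling weights only (no strata or clusters) and hypothetical variables x and w: -svyset- declares the weight once, the setting is saved with the dataset, and every command run with the -svy:- prefix picks it up, which comes close to the set-it-once behavior of SPSS.

```stata
* Toy data (hypothetical): outcome x and sampling weights w
clear
input x w
1 0.5
2 1.5
3 2.0
4 1.0
end

* Declare the design once: sampling weights only, no strata or clusters
svyset [pweight=w]

* Every svy:-prefixed command now uses w without repeating it
svy: mean x
scalar m_svy = _b[x]

svy: total x        // weighted total sum(w*x) = 13.5
```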
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP