Statalist



st: RE: Weights


From   "Mustillo, Sarah A" <smustill@purdue.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Weights
Date   Wed, 30 Apr 2008 11:52:42 -0400

Martin - 

It sounds like you have probability (sampling) weights, in which case
you may want to -svyset- your data.  Some commands will take pweights
without svysetting your data (like -regress- and -mean-), but others
won't (like -tabulate-).  Try either -mean x [pw=wtvar]- or
-svy: mean x- and see if your estimate is closer to what you were
expecting.  Remember that Stata allows you to specify several different
kinds of weights, including analytic weights, sampling weights, and
frequency weights, which is a strength rather than a weakness.  Some
programs treat a probability weight and a frequency weight as the same.
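
To make the two suggestions concrete, here is a minimal sketch on a toy
dataset.  The variable names x and wtvar and the three data rows are
made up for illustration; substitute your own variable and the weight
column from your data provider.  With no strata or PSUs declared, both
commands should return the same point estimate.

```stata
* Toy data: x is the analysis variable, wtvar a hypothetical
* non-integer sampling weight (both names are assumptions).
clear
input x wtvar
1 1.5
2 2.5
4 4.0
end

* Option 1: pass the probability weight directly to -mean-
mean x [pweight = wtvar]
scalar m_pw = _b[x]

* Option 2: declare the weight once, then use the svy: prefix
svyset [pweight = wtvar]
svy: mean x
scalar m_svy = _b[x]

* -summarize- does not accept pweights, but it does accept aweights
summarize x [aweight = wtvar]
scalar m_aw = r(mean)

display "pw: " m_pw "   svy: " m_svy "   aw: " m_aw
```

If your design also has strata or clusters, they would go into the
-svyset- call as well; that affects the standard errors, not the point
estimate of the mean.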

As for why you got the error message when you tried to use frequency
weights, they do indeed need to be whole numbers, as each weight
represents the number of duplicated observations.
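
A small sketch of what "number of duplicated observations" means in
practice, again on made-up data (x, n, and w are hypothetical names):
weighting by an integer count n gives the same result as physically
duplicating each row n times with -expand-, which is why a fractional
count is rejected.

```stata
* Toy data: n is an integer count of identical observations
clear
input x n
1 2
2 3
4 1
end

* Frequency-weighted summary
summarize x [fweight = n]
scalar m_fw = r(mean)
scalar N_fw = r(N)

* A non-integer frequency weight is refused; -capture noisily-
* shows the error message without stopping the do-file
generate w = n + 0.5
capture noisily summarize x [fweight = w]

* Same answer after duplicating each row n times
expand n
summarize x
scalar m_exp = r(mean)
scalar N_exp = r(N)

display "fweight: " m_fw " (N = " N_fw ")   expanded: " m_exp " (N = " N_exp ")"
```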

Sarah 

Sarah A. Mustillo, Ph.D 
Associate Professor of Sociology
Faculty Associate, Center on Aging and the Life Course
Purdue University
700 W. State St.
West Lafayette IN 47907-2059

765-496-2226

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
Sent: Wednesday, April 30, 2008 10:58 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Weights

Dear Statalisters,

can anybody give me a clue as to the array of weighting options in
Stata? I have an important project where I would really like to make
headway...

My dataset is 2.4 GB as a .csv file. When I translate this into SPSS, it
ends up at 2.7 GB, while the equivalent Stata dataset is 5.5 GB (!).
Anyway, I usually pick out the interesting variables beforehand because
Stata is unable to open the entire dataset. The first column of the data
contains sampling weights. The data provider ships a pdf with the
descriptives for the marginal distributions of the variables in the
population, so I know the true values.

Now here lies the rub: when I weight -summarize- with analytic weights,
the approximately correct mean and standard deviation pop out. When I
let Stata estimate the mean with the -mean- command, with analytic
weights attached in the same fashion, I get widely differing results for
the point estimate of the mean, far from the true values. In SPSS, I
simply go to -weight cases- and everything comes out correct.

Do I have to -svyset- the data? When I try to -frequency weight- the
data, Stata complains that non-integers are not allowed while SPSS seems
to not quarrel with them. Why is it that SPSS needs one command at the
beginning of the session while Stata has a (differing) tab dedicated to
weighting for every single command?


Martin Weiss



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



