"Mustillo, Sarah A" <smustill@purdue.edu>

<statalist@hsphsun2.harvard.edu>

st: RE: Weights

Wed, 30 Apr 2008 11:52:42 -0400

Martin,

It sounds like you have probability weights, in which case you may want to -svyset- your data. Some commands will take pweights without svysetting your data (like -regress- and -mean-), but others won't (like -tabulate-). Try either -mean x [pw=wtvar]- or -svy: mean x- and see if your estimate is closer to what you were expecting.

Remember that Stata allows you to specify several different kinds of weights, including analytic weights, sampling weights, and frequency weights, which is a strength rather than a weakness. Some programs treat a probability weight and a frequency weight as the same.

As for why you got the error message when you tried to use frequency weights, they do indeed need to be whole numbers, as they represent the number of duplicated observations.

Sarah

Sarah A. Mustillo, Ph.D
Associate Professor of Sociology
Faculty Associate, Center on Aging and the Life Course
Purdue University
700 W. State St.
West Lafayette IN 47907-2059
765-496-2226

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
Sent: Wednesday, April 30, 2008 10:58 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Weights

Dear Statalisters,

can anybody give me a clue as to the array of weighting options in Stata? I have an important project where I would really like to make headway...

My dataset features a size of 2.4 GB as .csv. When I translate this into SPSS, it ends up with 2.7 GB while the equivalent Stata dataset has 5.5 GB (!). Anyway, I usually pick out the interesting variables beforehand because Stata is unable to open the entire dataset.

The first column of the data contains sampling weights. The data provider ships a PDF with the descriptives for the marginal distributions of the variables in the population, so I know the true values. Now here lies the rub: when I weight -summarize- with analytic weights, the approximately correct mean and standard deviation pop out.
When I let Stata estimate the mean with the -mean- command, with analytic weights attached in the same fashion, I get widely differing results for the point estimate of the mean, far from the true values. In SPSS, I simply go to -weight cases- and everything comes out correct. Do I have to -svyset- the data?

When I try to -frequency weight- the data, Stata complains that non-integers are not allowed while SPSS seems to not quarrel with them. Why is it that SPSS needs one command at the beginning of the session while Stata has a (differing) tab dedicated to weighting for every single command?

Martin Weiss

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
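
For anyone reading this thread in the archive, the commands suggested in the reply can be tried side by side with something like the sketch below. It is only an illustration, not output from Martin's data: it uses Stata's shipped auto dataset, and wtvar is a made-up, non-integer weight generated purely as a stand-in for a real sampling weight.

```stata
* Sketch only: wtvar is a hypothetical weight, not from the original data
sysuse auto, clear
set seed 12345
generate double wtvar = 1 + runiform()    // non-integer stand-in for a sampling weight

* Weighted mean with the probability weight given directly
mean price [pweight = wtvar]

* Same point estimate after declaring the weight once with svyset
svyset [pweight = wtvar]
svy: mean price

* -summarize- accepts aweights (it does not accept pweights)
summarize price [aweight = wtvar]

* Frequency weights must be integers, so this call is expected to fail
capture noisily summarize price [fweight = wtvar]
display "return code from the fweight call: " _rc
```

The first two commands should report the same point estimate for the mean. The -svyset- route has the practical advantage that the weight is declared once and then picked up by every -svy:- command, which is the closest Stata equivalent to a session-wide weighting setting.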

