Re: st: RE: using egen, total() with weights

From   Steve Samuels
Subject   Re: st: RE: using egen, total() with weights
Date   Thu, 9 Feb 2012 19:48:52 -0500

Sheera Joy Olasky:

Because you are ignoring the survey design, you will be unable to provide estimates of uncertainty, e.g. confidence intervals. So though you report that you "don't do -svy-", you should.

What is the connection of the list of crimes to the original survey? If the crimes were reported by the people in the original survey, then it is doubly important that you "do -svy-". Otherwise you are effectively treating the list as a simple random sample. But the reported crimes inherit the study weights of the people who reported them. Thus crimes of the same type reported by different people should receive different weights. The external inflation factors should be applied to those weights, a process called post-stratification. If this isn't done, the estimated totals will be biased.
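
As a rough illustration of the adjustment described above, here is a minimal Python sketch (Python is used only as a neutral notation; it is not Stata code). It assumes the simplest form of post-stratification, a ratio adjustment in which each crime's inherited weight is scaled so that the weighted total for its crime type matches an external control total. The records, weights, and control totals are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (crime type, weight inherited from the reporting person).
crimes = [
    ("burglary", 120.0),
    ("burglary", 80.0),
    ("assault", 50.0),
    ("assault", 150.0),
    ("assault", 100.0),
]

# Hypothetical external control totals, one per crime type.
controls = {"burglary": 400.0, "assault": 450.0}

# Weighted total of each crime type under the inherited weights.
weighted = defaultdict(float)
for crime_type, w in crimes:
    weighted[crime_type] += w

# One inflation factor per crime type: control total / weighted total.
factors = {t: controls[t] / weighted[t] for t in weighted}

# Each crime keeps its own weight, scaled by the factor for its type.
adjusted = [(t, w * factors[t]) for t, w in crimes]

# The adjusted weights reproduce the control totals within each type.
adjusted_totals = defaultdict(float)
for crime_type, w in adjusted:
    adjusted_totals[crime_type] += w
```

Crimes of the same type still carry different weights after the adjustment; only their sum is forced to agree with the external figure.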



On Feb 9, 2012, at 6:47 PM, Nick Cox wrote:

Thanks for the extra detail. This is survey data but not, it seems, -svy- data, so belay that advice. It now sounds most like a problem for -collapse- to me.


Sheera Joy Olasky

Thanks for this insight. I think that I may not have stated the case
correctly, which I know is not particularly helpful on a listserve.

I have a second data set of criminal events. Each entry corresponds to
one crime, and it is given a weight to account for the fact that many
crimes are not reported. Each incident is weighted up, so the annual
state total of crimes will correspond with other estimates. I would
like to use the individual crimes to create a count for each

It was here that I had the brilliant idea to try egen total. Still wrong?

Many thanks.

On Thu, Feb 9, 2012 at 6:00 PM, Nick Cox wrote:
> I am a fan of -egen- when it's the right tool but I wouldn't start there at all.
> As you imply, -egen- can lose precision if you use the default variable type of -float-; the remedy is not to do that, but that's not the crux here.
> -total- offers direct support for pweights. I don't do -svy- but it sounds exactly the right place to start.
> Frankly, from your report you are getting some rather strange advice.
> Nick
> Sheera Joy Olasky
> I have a set of individual level survey data, which includes
> person-weights. I would like to create population totals by year and
> state. I am using Stata 11.2.
> Originally I had thought to use bysort id: egen pop=total(weight)
> where id is the state-year.
> However, it was then suggested to me that I should be using sum
> [aweight=weight]. This seems more complicated to me, since I'm not
> sure how/if I could make new variables with the sum output in the same
> way that I get a new variable with egen total (weights). Use of
> scalars was recommended, but I have no experience with them.
> Initially, when I compared the values I got with egen total(weight)
> and sum [aweight=weight], they were very close--maybe off by about 4
> people out of over 80,000,000. This imprecision is okay in this
> scenario, but it got me concerned. I thought that perhaps there was
> too much rounding happening with egen, so I generated the
> total(weight) as double. The increased precision seems to have helped,
> and now egen total(weight) and sum [aweight=weight] appear to give me
> the same results when I spot check.
> I don't feel completely confident, though. Before I go ahead and use
> egen, I'd like to know if this is okay or ill-advised. I'd be curious
> to know others' preferred way of handling this.
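
One plausible reading of the discrepancy of about 4 in 80,000,000 reported above is that the total was stored in a 4-byte float. Between 2^26 (about 67 million) and 2^27 (about 134 million), adjacent 32-bit floats are 8 apart, so a stored total of that size can be off by up to 4. The Python sketch below shows only this storage effect; it does not reproduce Stata's internals, and the weights are hypothetical.

```python
import struct

def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Hypothetical person weights for one state-year, summing to 80,000,003.
weights = [26_666_667.5, 26_666_667.5, 26_666_668.0]

total_double = sum(weights)             # total kept at 64-bit precision
total_float = to_float32(total_double)  # same total stored in 32 bits

gap = total_double - total_float
print(total_double, total_float, gap)   # 80000003.0 80000000.0 3.0
```

Storing the result as double, as was done in the message above, removes this source of error for totals of this size.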
