



RE: st: RE: using egen, total() with weights


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: using egen, total() with weights
Date   Thu, 9 Feb 2012 23:47:48 +0000

Thanks for the extra detail. This is survey data but not, it seems, -svy- data, so belay that advice. It now sounds most like a problem for -collapse- to me.
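
A minimal sketch of how -collapse- might be applied here, assuming one observation per crime and hypothetical variable names state, year and weight (the thread does not give the actual names). The toy data are invented purely for illustration.

```stata
* Toy data: one row per crime, with a reporting weight (all values invented)
clear
input str2 state year weight
"NY" 2010 1.5
"NY" 2010 2.5
"NY" 2011 3
"CA" 2010 4
end

* Replace the data in memory with one row per state-year;
* crimes holds the sum of the weights, i.e. the weighted count
collapse (sum) crimes = weight, by(state year)

list, sepby(state)
```

Note that -collapse- replaces the dataset in memory, so the original crime-level data should be saved first, or the command wrapped in -preserve- and -restore-, if they are still needed.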

Nick 
n.j.cox@durham.ac.uk 

Sheera Joy Olasky

Thanks for this insight. I think that I may not have stated the case
correctly, which I know is not particularly helpful on a listserv.

I have a second data set of criminal events. Each entry corresponds to
one crime, and it is given a weight to account for the fact that many
crimes are not reported. Each incident is weighted up, so the annual
state total of crimes will correspond with other estimates. I would
like to use the individual crimes to create a count for each
state-year.

It was here that I had the brilliant idea to try egen total. Still wrong?

Many thanks.

On Thu, Feb 9, 2012 at 6:00 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I am a fan of -egen- when it's the right tool but I wouldn't start there at all.
>
> As you imply, -egen- can lose precision if you use the default variable type of -float-; the remedy is not to do that, but that's not the crux here.
>
> -total- offers direct support for pweights. I don't do -svy- but it sounds exactly the right place to start.
>
> Frankly, from your report you are getting some rather strange advice.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Sheera Joy Olasky
>
> I have a set of individual level survey data, which includes
> person-weights. I would like to create population totals by year and
> state. I am using Stata 11.2.
>
> Originally I had thought to use bysort id: egen pop=total(weight)
> where id is the state-year.
>
> However, it was then suggested to me that I should be using sum
> [aweight=weight]. This seems more complicated to me, since I'm not
> sure how/if I could make new variables with the sum output in the same
> way that I get a new variable with egen total (weights). Use of
> scalars was recommended, but I have no experience with them.
>
> Initially, when I compared the values I got with egen total(weight)
> and sum [aweight=weight], they were very close--maybe off by about 4
> people out of over 80,000,000. This imprecision is okay in this
> scenario, but it got me concerned. I thought that perhaps there was
> too much rounding happening with egen, so I generated the
> total(weight) as double. The increased precision seems to have helped,
> and now egen total(weight) and sum [aweight=weight] appear to give me
> the same results when I spot check.
>
> I don't feel completely confident, though. Before I go ahead and use
> egen, I'd like to know if this is okay or ill-advised. I'd be curious
> to know others' preferred way of handling this.
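
The precision point in the quoted post can be checked directly. The sketch below assumes, as in the post, that id identifies the state-year and weight is the person weight; the names pop_f, diff and one are hypothetical. A -float- holds only about 7 significant digits, which is consistent with totals near 80,000,000 being off by a few units.

```stata
* Group totals stored as double (recommended) and as float (for comparison)
bysort id: egen double pop   = total(weight)
bysort id: egen float  pop_f = total(weight)

* Any nonzero difference is rounding introduced by the float storage type
generate double diff = pop - pop_f
summarize diff

* Cross-check with -total-, which accepts pweights directly:
* the weighted total of a constant 1 is the estimated population
generate byte one = 1
total one [pweight = weight], over(id)
```

With many state-years the -total- output will be long, so this is better suited to spot checks than to building a new variable.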

