Re: st: RE: using egen, total() with weights


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: using egen, total() with weights
Date   Thu, 9 Feb 2012 20:47:04 -0500

I apologize to Sheera. But I think that, in this situation, she should be using the -svy- commands.

Steve



On Feb 9, 2012, at 8:27 PM, Nick Cox wrote:

It was me that said "I don't do -svy-" meaning not that I do not
believe in it but that I do not practise it.

Nick

On Fri, Feb 10, 2012 at 12:48 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> 
> Sheera Joy Olasky:
> 
> Because you are ignoring the survey design, you will be unable to provide estimates of uncertainty, e.g. confidence intervals. So though you report that you "don't do -svy-", you should.
> 
> What is the connection of the list of crimes to the original survey? If the crimes were reported by the people in the original survey, then it is doubly important that you "do -svy-". Otherwise you are effectively treating the list as a simple random sample. But the reported crimes inherit the study weights of the people who reported them. Thus crimes of the same type reported by different people should receive different weights. The external inflation factors should be applied to those weights, a process called post-stratification. If this isn't done, the estimated totals will be biased.
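
A minimal sketch of what "doing -svy-" might look like here, assuming the crime records carry the design information of the person who reported them. Every variable name below (psu, stratum, weight, poststratum, postweight, state, year) is hypothetical; the thread does not describe the actual design, so the -svyset- line would need to be adapted to it.

```stata
* Hypothetical variable names throughout; adapt to the real survey design.
* psu, stratum: sampling unit and stratum of the reporting person.
* weight: study weight inherited from the person who reported the crime.
* poststratum, postweight: post-stratum identifier and its external total.
svyset psu [pweight=weight], strata(stratum) poststrata(poststratum) postweight(postweight)

* One record per crime, so the total of a constant estimates the crime count.
generate byte one = 1
svy: total one, over(state year)
```

-svy: total- reports a standard error and confidence interval for each state-year total, which is the uncertainty estimate that a plain sum of weights cannot provide.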
> 
> 
> Steve
> sjsamuels@gmail.com
> 
> 
> 
> On Feb 9, 2012, at 6:47 PM, Nick Cox wrote:
> 
> Thanks for the extra detail. This is survey data but not, it seems, -svy- data, so belay that advice. This now sounds most like a problem for -collapse- to me.
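
A minimal sketch of the -collapse- approach, assuming the crime-level file holds one row per crime with hypothetical variables weight, state, and year. Summing the weights within each state-year gives the weighted count of crimes.

```stata
* Hypothetical variable names: weight, state, year.
* collapse replaces the data in memory, so save the crime-level file first.
collapse (sum) crimes = weight, by(state year)
```

The result is one observation per state-year, which could then be merged onto other data by state and year.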
> 
> Nick
> n.j.cox@durham.ac.uk
> 
> Sheera Joy Olasky
> 
> Thanks for this insight. I think that I may not have stated the case
> correctly, which I know is not particularly helpful on a listserv.
> 
> I have a second data set of criminal events. Each entry corresponds to
> one crime, and it is given a weight to account for the fact that many
> crimes are not reported. Each incident is weighted up, so the annual
> state total of crimes will correspond with other estimates. I would
> like to use the individual crimes to create a count for each
> state-year.
> 
> It was here that I had the brilliant idea to try egen total. Still wrong?
> 
> Many thanks.
> 
> On Thu, Feb 9, 2012 at 6:00 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
>> I am a fan of -egen- when it's the right tool but I wouldn't start there at all.
>> 
>> As you imply, -egen- can lose precision if you use the default variable type of -float-; the remedy is not to do that, but that's not the crux here.
>> 
>> -total- offers direct support for pweights. I don't do -svy- but it sounds exactly the right place to start.
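
A minimal sketch of the -total- suggestion, assuming hypothetical variables weight (the person weight), state, and year. Under pweights, the total of a constant equal to 1 is the sum of the weights, that is, the estimated population in each state-year.

```stata
* Hypothetical variable names: weight (person weight), state, year.
generate byte one = 1
total one [pweight=weight], over(state year)
```

Without -svyset- information, the standard errors reported this way ignore any clustering or stratification in the survey design.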
>> 
>> Frankly, from your report you are getting some rather strange advice.
>> 
>> Nick
>> n.j.cox@durham.ac.uk
>> 
>> Sheera Joy Olasky
>> 
>> I have a set of individual level survey data, which includes
>> person-weights. I would like to create population totals by year and
>> state. I am using Stata 11.2.
>> 
>> Originally I had thought to use bysort id: egen pop=total(weight)
>> where id is the state-year.
>> 
>> However, it was then suggested to me that I should be using sum
>> [aweight=weight]. This seems more complicated to me, since I'm not
>> sure how/if I could make new variables with the sum output in the same
>> way that I get a new variable with egen total (weights). Use of
>> scalars was recommended, but I have no experience with them.
>> 
>> Initially, when I compared the values I got with egen total(weight)
>> and sum [aweight=weight], they were very close--maybe off by about 4
>> people out of over 80,000,000. This imprecision is okay in this
>> scenario, but it got me concerned. I thought that perhaps there was
>> too much rounding happening with egen, so I generated the
>> total(weight) as double. The increased precision seems to have helped,
>> and now egen total(weight) and sum [aweight=weight] appear to give me
>> the same results when I spot check.
>> 
>> I don't feel completely confident, though. Before I go ahead and use
>> egen, I'd like to know if this is okay or ill-advised. I'd be curious
>> to know others' preferred way of handling this.
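
A minimal sketch of the double-precision fix described above, assuming hypothetical variables id (a numeric state-year identifier) and weight. A float stores 24 significant bits, so it cannot hold every integer above 16,777,216; near 80,000,000 adjacent floats are 8 apart, which is consistent with a discrepancy of about 4.

```stata
* Hypothetical variable names: id (state-year identifier), weight.
* Storing the result as double avoids rounding the total to float precision.
bysort id: egen double pop = total(weight)

* A float cannot hold every integer above 16,777,216 (2^24):
display %12.0f float(80000003)    // shows 80000000

* Cross-check one group (assumes id takes the value 1);
* r(sum) is the plain sum of weight, returned in double precision.
quietly summarize weight if id == 1
display %15.2f r(sum)
```

The displayed sum should match the value of pop for the observations in that group.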
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index