Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Michael Goodwin <mgoodwin@poverty-action.org> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: Dropping observations so sample is proportionate to population |
Date | Wed, 7 Sep 2011 12:38:07 -0500 |
Hi,

I will be working with a new sample dataset, and I would like to drop observations from it so that the proportions of a particular categorical variable (in this case "type") are roughly equal to those in the population dataset. The goal of this exercise is to make the distribution of "type" as similar as possible to the population dataset. The population has the following proportions of type:

*********************************************************************
Proportion estimation             Number of obs   =        353

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
type         |
           1 |    .082153    .0146361     .0533678    .1109382
           2 |   .0509915     .011725     .0279316    .0740514
           3 |   .1104816     .016709     .0776195    .1433437
           4 |   .0764873    .0141659     .0486268    .1043477
           5 |   .1586402    .0194727     .1203428    .1969377
           6 |   .1643059    .0197505      .125462    .2031499
           7 |   .1529745    .0191861     .1152407    .1907083
           8 |    .203966     .021477     .1617267    .2462054
--------------------------------------------------------------
*********************************************************************

I am not particularly experienced with weighting (nor am I even sure that this is where I would want to begin). This may end up being somewhat complex, because I want to minimize the number of observations dropped. Moreover, each time an observation is dropped, the proportion of every type in the sample dataset changes, because the denominator shrinks.

The command I'm conceptualizing would require Stata to recognize the desired proportions for each of the 8 types, and drop observations until those proportions have been more or less achieved in the sample dataset.
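To make the goal concrete, here is a rough sketch of the arithmetic, not a tested solution. If n_k is the number of sample observations of type k and p_k is the population share, the retained sample size N cannot exceed n_k / p_k for any type, so the largest usable N is the floor of the smallest of those ratios. Keeping a random round(N * p_k) observations of each type then drops as few observations as that constraint allows. The sketch assumes the sample is in memory, that "type" takes the values 1-8 with no missing values, and that dropping at random within type is acceptable. The names target, n_k, cap, u and n_keep are made up for illustration.

```stata
* Sketch only: assumes the sample is in memory and -type- is 1-8, nonmissing
set seed 20110907    // arbitrary seed so the random selection is reproducible

* Population shares copied from the table above
generate double target = .
replace target = .082153  if type == 1
replace target = .0509915 if type == 2
replace target = .1104816 if type == 3
replace target = .0764873 if type == 4
replace target = .1586402 if type == 5
replace target = .1643059 if type == 6
replace target = .1529745 if type == 7
replace target = .203966  if type == 8

* Largest N with N * p_k <= n_k for every type k:
* N = floor(min over k of n_k / p_k), where n_k is the sample count of type k
bysort type: generate long n_k = _N
generate double cap = n_k / target
summarize cap, meanonly
local n_keep = floor(r(min))

* Sort randomly within type and keep the first round(N * p_k) observations
generate double u = runiform()
bysort type (u): keep if _n <= round(`n_keep' * target)

* Compare the resulting shares with the targets
tabulate type
```

Because of rounding, the retained total can differ from N by a few observations, and the shares will match the targets only approximately. The helper variables are left in place for inspection and can be dropped afterwards.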
In a mixture of Stata command and plain English:

*********************************************************************
drop in 1-n if r(sample proportion of type) > r(population proportion type)
*********************************************************************

The only other way I can think of doing this is to look at the data and manually drop observations until the desired proportions are achieved. That code would look something like this:

*********************************************************************
#delimit ;
bysort type: gen tempCount=_n;
gen tempPercent=tempCount/_N;
drop if type==1 & tempPercent>.0822;
replace tempPercent=tempCount/_N;
drop if type==2 & tempPercent>.0510;
replace tempPercent=tempCount/_N;
drop if type==3 & tempPercent>.1105;
replace tempPercent=tempCount/_N;
drop if type==4 & tempPercent>.0765;
replace tempPercent=tempCount/_N;
drop if type==5 & tempPercent>.1586;
replace tempPercent=tempCount/_N;
drop if type==6 & tempPercent>.1643;
replace tempPercent=tempCount/_N;
drop if type==7 & tempPercent>.1530;
replace tempPercent=tempCount/_N;
drop if type==8 & tempPercent>.2040;
#delimit cr
*********************************************************************

Any advice would be most appreciated.

Thanks,
Mike

--
Mike Goodwin
Innovations for Poverty Action