
# st: Dropping observations so sample is proportionate to population

| Field | Value |
|---|---|
| From | Michael Goodwin |
| To | statalist@hsphsun2.harvard.edu |
| Subject | st: Dropping observations so sample is proportionate to population |
| Date | Wed, 7 Sep 2011 12:38:07 -0500 |

```
Hi,

I will be working with a new sample dataset and I would like to drop
observations in this new dataset so that the proportions of a
particular categorical variable (in this case "type") are roughly equal
to those present in the population dataset. The goal of this exercise is
to have the distribution of "type" be as similar as possible to that of
the population dataset.

The population has the following proportions of type:

*********************************************************************
Proportion estimation               Number of obs    =     353

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
type         |
           1 |    .082153   .0146361      .0533678    .1109382
           2 |   .0509915    .011725      .0279316    .0740514
           3 |   .1104816    .016709      .0776195    .1433437
           4 |   .0764873   .0141659      .0486268    .1043477
           5 |   .1586402   .0194727      .1203428    .1969377
           6 |   .1643059   .0197505       .125462    .2031499
           7 |   .1529745   .0191861      .1152407    .1907083
           8 |    .203966    .021477      .1617267    .2462054
--------------------------------------------------------------
*********************************************************************

I am not particularly experienced with weighting (nor am I even sure
that this is where I would want to begin). It's possible that this
will end up being somewhat complex, given that I would want to
minimize the number of observations being dropped. Moreover, as a
given observation is dropped, the proportions of each type present in
the sample dataset change with the decrease in the denominator.

The command I'm conceptualizing would require Stata to recognize the
desired proportions for each of the 8 types, and drop observations
until those proportions have been more or less achieved in the sample
dataset. In a mixture of Stata command and plain English:

*********************************************************************
drop in 1-n if r(sample proportion of type) > r(population proportion type)

*********************************************************************

The only other way I can think of doing this is to look at the data,
and manually drop observations until the desired proportions are
achieved. That code would look something like this:

*********************************************************************
* The semicolon-terminated lines below assume -#delimit ;- is in effect.
#delimit ;
bysort type: gen tempCount=_n;
gen tempPercent=tempCount/_N;

drop if type==1 & tempPercent>.0822;
replace tempPercent=tempCount/_N;
drop if type==2 & tempPercent>.0510;
replace tempPercent=tempCount/_N;
drop if type==3 & tempPercent>.1105;
replace tempPercent=tempCount/_N;
drop if type==4 & tempPercent>.0765;
replace tempPercent=tempCount/_N;
drop if type==5 & tempPercent>.1586;
replace tempPercent=tempCount/_N;
drop if type==6 & tempPercent>.1643;
replace tempPercent=tempCount/_N;
drop if type==7 & tempPercent>.1530;
replace tempPercent=tempCount/_N;
drop if type==8 & tempPercent>.2040;
#delimit cr
*********************************************************************

Any advice would be most appreciated.

Thanks,

Mike

--
Mike Goodwin
Innovations for Poverty Action
```
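One possible way to approach the question, offered as a minimal sketch rather than a definitive answer: the worry about the denominator changing after each drop can be avoided by fixing the final sample size first. If type k has n_k observations in the sample and target proportion p_k, the largest total that every type can support is the minimum over k of floor(n_k / p_k); each type is then cut back at random to round(p_k times that total). The sketch assumes `type` is coded 1 to 8 and uses the proportions from the table in the post. The first four lines build a simulated stand-in dataset purely so the example runs on its own; the names `target`, `Nmax`, `u`, and `pick` and the seed are illustrative choices, not anything from the original post.

```stata
* Hypothetical stand-in for the real sample; delete these four lines and
* load your own data instead.
clear
set obs 1000
set seed 12345
generate byte type = ceil(8 * runiform())

* Target (population) proportions copied from the table in the post.
matrix target = (.082153, .0509915, .1104816, .0764873, ///
                 .1586402, .1643059, .1529745, .203966)

* Largest total N such that every type can supply target[k] * N observations.
local Nmax = _N
forvalues k = 1/8 {
    quietly count if type == `k'
    local Nmax = min(`Nmax', floor(r(N) / target[1, `k']))
}

* Put observations in random order within type, then drop the surplus.
generate double u = runiform()
bysort type (u): generate long pick = _n
forvalues k = 1/8 {
    drop if type == `k' & pick > round(`Nmax' * target[1, `k'])
}
drop u pick

* Check the resulting proportions against the target.
proportion type
```

Under these assumptions, the type that is scarcest relative to its target share is the binding one and keeps essentially all of its observations, so no more observations are dropped than the target proportions require. Because the surplus is removed at random, setting the seed matters if the subsample has to be reproducible.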