Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Dropping observations so sample is proportionate to population

From	Michael Goodwin <[email protected]>
To	[email protected]
Subject	st: Dropping observations so sample is proportionate to population
Date	Wed, 7 Sep 2011 12:38:07 -0500

Hi,

I will be working with a new sample dataset, and I would like to drop
observations from it so that the proportions of a particular
categorical variable (in this case "type", which has 8 categories) are
roughly equal to those in the population dataset. The goal of this
exercise is to make the distribution of "type" in the sample as similar
as possible to its distribution in the population dataset.

The population has the following proportions of type:

*********************************************************************
Proportion estimation               Number of obs    =     353

--------------------------------------------------------------
             | Proportion   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
type         |
           1 |    .082153   .0146361      .0533678    .1109382
           2 |   .0509915    .011725      .0279316    .0740514
           3 |   .1104816    .016709      .0776195    .1433437
           4 |   .0764873   .0141659      .0486268    .1043477
           5 |   .1586402   .0194727      .1203428    .1969377
           6 |   .1643059   .0197505       .125462    .2031499
           7 |   .1529745   .0191861      .1152407    .1907083
           8 |    .203966    .021477      .1617267    .2462054
--------------------------------------------------------------
*********************************************************************
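Each share in the table is a whole-number count divided by 353 (for example, .082153 = 29/353 and .203966 = 72/353), so the population distribution of type can be rebuilt exactly from the table. A minimal Stata sketch of that reconstruction; the file name popshares.dta is a hypothetical choice, and with the population data loaded, -contract type, freq(pop_n)- would give the same counts directly.

```stata
* Rebuild the population distribution of type from the table above:
* each share times 353 is a whole number of observations.
clear
input byte type int pop_n
1 29
2 18
3 39
4 27
5 56
6 58
7 54
8 72
end

* Population share of each type (reproduces the Proportion column)
gen double pop_p = pop_n / 353
list, noobs

* Hypothetical file name, kept for merging onto the sample later
save popshares, replace
```

Storing the counts rather than the rounded shares avoids carrying rounding error into later calculations.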

I am not particularly experienced with weighting (nor am I even sure
that weighting is where I should begin). This may end up being
somewhat complex, because I want to minimize the number of
observations dropped. Moreover, each time an observation is dropped,
the proportion of every type in the sample dataset changes, because
the denominator (the total sample size) shrinks.
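If dropping observations is not essential, the weighting route mentioned above avoids losing data. A minimal sketch of post-stratification weighting (a standard technique swapped in here for illustration, not the poster's proposal): each observation of type k gets weight p_k / s_k, where p_k is the population share and s_k the sample share, so the weighted shares of type equal the population shares. It assumes every type 1-8 appears in the sample, since a type with no sample observations cannot be reweighted. The simulated data, the seed, and the variable names s_k and w are illustrative assumptions.

```stata
* Sketch: post-stratification weights instead of dropping observations.
* Assumes type is coded 1-8 and every type occurs in the sample.

* --- Illustration only: simulated sample; replace with your own data ---
clear
set obs 400
set seed 20110907
gen byte type = 1 + floor(8 * runiform())

* Population shares of type 1-8, copied from the -proportion- table
local shares .082153 .0509915 .1104816 .0764873 .1586402 .1643059 .1529745 .203966

* Sample share s_k of each type
quietly count
local N = r(N)
bysort type: gen double s_k = _N / `N'

* Weight = population share / sample share
gen double w = .
forvalues k = 1/8 {
    local p : word `k' of `shares'
    replace w = `p' / s_k if type == `k'
}

* The weighted shares of type now reproduce the population shares
proportion type [pweight = w]
```

Weights of this kind also change the standard errors of later estimates, so whether to weight or to drop depends on the analysis that follows.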

The command I have in mind would take the desired proportion for each
of the 8 types and drop observations until those proportions are more
or less achieved in the sample dataset. In a mixture of Stata command
and plain English:

*********************************************************************
drop in 1-n if r(sample proportion of type) > r(population proportion of type)

*********************************************************************
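The bookkeeping such a command needs can be worked out before any observation is dropped, which sidesteps the shrinking-denominator problem. A short derivation, assuming the goal is to match the population shares exactly up to rounding while keeping as many observations as possible (the symbols n_k, p_k, m_k, and M are introduced here for illustration):

```latex
n_k = \text{number of sample observations of type } k, \qquad
p_k = \text{population share of type } k, \qquad \textstyle\sum_k p_k = 1

\text{keep } m_k \le n_k \text{ observations of type } k, \qquad
m_k = p_k M, \qquad M = \textstyle\sum_k m_k

p_k M \le n_k \;\; \text{for all } k
\;\;\Longrightarrow\;\;
M \le M^{*} = \min_k \frac{n_k}{p_k},
\qquad
m_k = \left\lfloor p_k M^{*} \right\rfloor
```

The type that attains the minimum keeps all of its observations, and every other type k drops n_k - m_k of them. Rounding down guarantees m_k does not exceed n_k, so the achieved shares are close to, but not always exactly equal to, p_k.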

The only other way I can think of is to look at the data and manually
drop observations until the desired proportions are achieved. That
code would look something like this:

*********************************************************************
#delimit ;
/* tempCount numbers observations within each type;
   tempPercent divides it by the current total number of observations */
bysort type: gen tempCount=_n;
gen tempPercent=tempCount/_N;

drop if type==1 & tempPercent>.0822;
replace tempPercent=tempCount/_N;
drop if type==2 & tempPercent>.0510;
replace tempPercent=tempCount/_N;
drop if type==3 & tempPercent>.1105;
replace tempPercent=tempCount/_N;
drop if type==4 & tempPercent>.0765;
replace tempPercent=tempCount/_N;
drop if type==5 & tempPercent>.1586;
replace tempPercent=tempCount/_N;
drop if type==6 & tempPercent>.1643;
replace tempPercent=tempCount/_N;
drop if type==7 & tempPercent>.1530;
replace tempPercent=tempCount/_N;
drop if type==8 & tempPercent>.2040;
#delimit cr
*********************************************************************
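The manual approach above has two side effects worth noting. Because _N is re-evaluated after every -drop-, each type's cutoff is computed against a different total, so the final shares drift away from the targets. And because tempCount follows the sort order, it keeps whichever observations happen to come first within each type rather than a controlled random subset. A minimal sketch of an alternative, assuming type is coded 1-8 with no missing values: fix every type's kept count from one total M* = min over k of n_k/p_k (the largest sample that can match the population shares), then drop each type's excess at random. The simulated data, the seed, and the variable names u, p_target, n_k, ratio, and m_k are illustrative assumptions, not part of the original post.

```stata
* Sketch: drop observations at random within each type so the kept sample
* matches the population shares of type while dropping as few as possible.
* Assumes type is coded 1-8 with no missing values.

* --- Illustration only: simulated sample; replace with your own data ---
clear
set obs 400
set seed 20110907
gen byte type = 1 + floor(8 * runiform())

* Population shares of type 1-8, copied from the -proportion- table
local shares .082153 .0509915 .1104816 .0764873 .1586402 .1643059 .1529745 .203966

assert inrange(type, 1, 8)

* Random number used to choose which observations to drop; drawn before
* any sorting so the result is reproducible for a given seed
gen double u = runiform()

* Attach each observation's target share p_k
gen double p_target = .
forvalues k = 1/8 {
    local p : word `k' of `shares'
    replace p_target = `p' if type == `k'
}

* n_k = number of sample observations of each type
bysort type: gen long n_k = _N

* Largest kept sample size M* = min over types of n_k / p_k
gen double ratio = n_k / p_target
summarize ratio, meanonly
scalar Mstar = r(min)

* Number to keep per type; the small tolerance guards against round-off
* when p_k * M* should equal n_k exactly for the binding type
gen long m_k = min(n_k, floor(p_target * scalar(Mstar) + 1e-6))

* Within each type, keep the first m_k observations in random order
bysort type (u): drop if _n > m_k

drop u p_target n_k ratio m_k
scalar drop Mstar
proportion type
```

With the real sample, replace the simulated-data lines with -use- of the sample file. The final -proportion type- should show shares close to the population table; exact equality is generally not attainable because kept counts must be whole numbers.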

Any advice would be most appreciated.

Thanks,

Mike

--
Mike Goodwin
Innovations for Poverty Action

Follow-Ups:
- Re: st: Dropping observations so sample is proportionate to population
  - From: Maarten Buis <[email protected]>
