The Stata listserver

Re: st: Select sample


From   jpitblado@stata.com (Jeff Pitblado, Stata Corp.)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Select sample
Date   Thu, 12 Sep 2002 16:47:13 -0500

Augusto Hoszowski <ahosz@indec.mecon.gov.ar> wrote:

> I need to select one sample stratified. My  file contains the id_strata, the
> size of the sample in the strata and size_sample. What is wrong with  the
> syntax ?
> 
> program define seleccio
>    local tamanio size_sample[`1']
>    by id_strata: sample `tamanio', count
> end
> seleccio 1
> 
> Sincerely yours,

The -sample- command requires an explicit number, not an expression, hence the
error message:

	. by id : sample sample_size[1], count
	'size' found where number expected
	r(7);

If the variable sample_size contains the same value for all observations, then
use

	. local n = sample_size[1]
	. by id : sample `n', count
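
A minimal sketch of that constant-size case, run on made-up data.  The toy
data, the seed, and the -assert- guard are assumptions added for illustration;
only the last two commands come from the lines above.

```stata
clear
set seed 12345
set obs 20
* toy data (assumed): two strata of 10 observations each
gen id = mod(_n, 2) + 1
gen sample_size = 3
sort id
* guard: stop with an error unless sample_size really is constant
assert sample_size == sample_size[1]
* copy the number into a local macro so -sample- sees a literal
local n = sample_size[1]
by id : sample `n', count
```

With these toy values, 3 observations are kept in each of the 2 strata.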

If sample_size is constant within id_strata, but different for different
values of id_strata, then some programming is required.  In the following
example, I generate some data, define my sample selector program, run it on
the data, then summarize the results using -tabulate- and -summarize-.  The
-sel1- program requires 3 arguments:

arg 1: id     -- a stratum id variable

arg 2: size   -- a sample size variable containing the sample size for each
stratum.  Note that I do not check that this variable is constant within the
id variable; I'll leave that as an exercise. :)

arg 3: tokeep -- a valid name to use to generate a variable that indicates
which observations were selected for the sample.  Note that I just -drop- this
variable if it exists, then generate my own at the end.
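
One possible way to do the constancy check left as an exercise is sketched
here on made-up data.  It is not part of -sel1-; the names id and size match
the example, and the toy values are assumptions.

```stata
clear
* toy data (assumed): size is constant within each id
input id size
1 2
1 2
1 2
2 1
2 1
end
* sort on size within id; if the first and last values in a group agree,
* every value in that group agrees
bysort id (size) : assert size[1] == size[_N]
```

If size varies within any id, -assert- stops with r(9).  Inside a program such
as -sel1-, the same line could be written with `id' and `size' in place of the
variable names.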

***** BEGIN mysam.do

cap log close
* generate some data
clear
local obs 100
set obs `obs'
set seed 92507
* the strata id variable
gen id = int(5*uniform()) + 1
sort id
* the size for each stratum
gen size = .
by id : replace size = cond(_n==1,int(_N*(1 + uniform())/2), size[_n-1] )
* some measurement
by id : gen y = id*( 1 + invnorm(uniform()) )

* my program that indicates the sampled observations
cap program drop sel1
program define sel1
	args id size tokeep
	/* id     : group id
	 * size   : sample size (NOTE:  assumed constant within -id-)
	 * tokeep : name of var to indicate sampled obs
	 */
	confirm var `id'
	confirm numeric var `size'
	confirm name `tokeep'
	/* replace with my own sample indicator var */
	cap drop `tokeep'
	/* randomly order the obs */
	tempvar r
	gen `r' = uniform()
	sort `id' `r'
	/* generate sample indicator */
	by `id' : gen `tokeep' = _n<=`size'
end

qui log using mysam.log, replace
sel1 id size kept
tab id, sum(size) mean obs
tab id if kept, sum(size) mean obs
sum y
sum y if kept
qui log close

***** END mysam.do

Here is the log produced by the above -do- file.

***** BEGIN mysam.log

. sel1 id size kept

. tab id, sum(size) mean obs

            |     Summary of size
         id |        Mean        Obs.
------------+------------------------
          1 |          14          20
          2 |          16          19
          3 |          14          16
          4 |          12          22
          5 |          21          23
------------+------------------------
      Total |       15.55         100

. tab id if kept, sum(size) mean obs

            |     Summary of size
         id |        Mean        Obs.
------------+------------------------
          1 |          14          14
          2 |          16          16
          3 |          14          14
          4 |          12          12
          5 |          21          21
------------+------------------------
      Total |   16.012987          77

. sum y

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           y |     100    3.421033   3.903548  -2.579357   16.88143

. sum y if kept

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
           y |      77    3.598924   4.157545  -2.140999   16.88143

. qui log close

***** END mysam.log
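
To go from the indicator variable to an actual sample, something like the
following sketch may help.  It repeats the random-order-and-flag idea from
-sel1- on made-up data, checks the number flagged in each stratum, then drops
the rest.  The toy data and the variable names r and nkept are assumptions;
runiform() is the current name for uniform().

```stata
clear
set seed 92507
set obs 12
* toy data (assumed): stratum 1 has 7 obs, stratum 2 has 5 obs
gen id = cond(_n <= 7, 1, 2)
* requested sample size per stratum: 3 and 2
gen size = cond(id == 1, 3, 2)
* randomly order the obs within stratum and flag the first size of them
gen double r = runiform()
sort id r
by id : gen kept = _n <= size
* check: the number flagged in each stratum equals the requested size
by id : egen nkept = total(kept)
assert nkept == size
* keep only the sampled observations
keep if kept
```

The -assert- fails if size exceeds the number of observations in a stratum,
since then every observation in that stratum is flagged and the count falls
short of size.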

--Jeff 
  jpitblado@stata.com



