
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: i-1 in forvalues loop


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   Re: st: RE: i-1 in forvalues loop
Date   Wed, 19 Oct 2011 08:27:38 -0400


You are using a fractional interval in your example because 15 does not divide evenly into the total number of students (which must be 15 x 3166.467 = 47,497). Leslie Kish (Survey Sampling, 1965, Wiley, page 115) presents better solutions to this problem. A simple one is to enlarge the total to the next higher multiple of 15, or 3167 x 15 = 47,505. This adds eight fictitious students, an increase of less than 0.02%, and makes the sampling interval 3167. You could, if you wish, assign the extra students to the largest schools; this will alter the selection probabilities for individual schools only slightly.
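The arithmetic above can be checked with a few scalars. This is only a sketch: the scalar names (tot, nsel, step, padded) are illustrative and do not come from the thread.

```stata
* Round the frame total up to the next multiple of the sample size.
scalar tot    = 47497                          // students in the stratum
scalar nsel   = 15                             // schools to select
scalar step   = ceil(scalar(tot)/scalar(nsel)) // integer interval: 3167
scalar padded = scalar(step)*scalar(nsel)      // enlarged total: 47505

display "interval = " scalar(step) ", padded total = " scalar(padded)
display "fictitious students = " scalar(padded) - scalar(tot)
display "relative increase (%) = " %6.4f 100*(scalar(padded) - scalar(tot))/scalar(tot)
```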

I think it a waste of effort to make the intervals too exact, because the advance counts are likely to be out of date when you do the study. See WE Deming, Sample Design in Business Research, Wiley, 1960, Chapter Six, and elsewhere, for examples of rough counts. If you will be sub-sampling students, then you will need Deming's methods to keep the probabilities of selection constant across schools in the same stratum. 

A specific mistake in your code: the starting value is not a single random draw. The -egen- function mean() averages the uniform draws over all observations, which pulls the result toward a constant (close to half the interval, about 1583). Assuming that you take the advice above, the command should be:

. gen bal1= ceil(3167*runiform())

But I agree with Nick's thoughts about style. I would use scalars, not variables, to hold the "ball" choices.
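One way the scalar approach might look is sketched below. It assumes the stratum file from the thread is already in memory, that the interval is the integer 3167 suggested above, and that any fictitious students have already been added to studentsj3. The names step, ball1 to ball15 and cum are illustrative.

```stata
* Sketch only: hold the interval and the 15 "balls" in scalars, not variables.
scalar step  = 3167                                   // integer sampling interval
scalar ball1 = ceil(scalar(step)*runiform())          // random start in 1..3167
forvalues i = 2/15 {
    local j = `i' - 1                                 // index of the previous ball
    scalar ball`i' = scalar(ball`j') + scalar(step)   // add the interval (do not multiply)
}

gen long cum = sum(studentsj3)                        // cumulative size
gen byte winnaar = 0
forvalues i = 1/15 {
    * a school is hit when the ball falls in (previous cum, cum]
    replace winnaar = 1 if scalar(ball`i') > cum - studentsj3 & scalar(ball`i') <= cum
}
```

The half-open interval is a design choice: it avoids selecting two adjacent schools when a ball lands exactly on a boundary. Calling -set seed- beforehand makes the draw reproducible.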

Steve

On Oct 18, 2011, at 10:52 AM, Viktor Emonds wrote:

Let me try to explain again: I have a file with schools and I know the number of students in the target population. I have everything neatly sorted and, for sampling, I just need to give each school a chance of being selected that is proportional to the number of students in the target year in that school, determine a random starting point to make my first pick, and add a constant interval.

In the loop with bal, I basically try to draw the 'winning numbers' by taking the constant (bal1) and adding the interval (3166.467), storing the new constants in bal2, bal3, ..., bal15. The way I envisioned doing this was by running a loop for bal2-bal15, adding the interval to the value of the previous bal (the (i-1)th bal). Is there a way to do so?

Best,

Viktor

______________________________
I don't understand what you are trying to do. I comment only on obvious Stata problems. 
`i-1' would only work if "i-1" were the name of a local macro, but it can't be such a name, as minus signs are not allowed in Stata names. 
gen bal`i' = bal`=`i'-1' * 3166.467
would at first sight work as then Stata knows to evaluate the expression 
`i' - 1 
on the fly. 
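A small illustration of the two ways to get "i minus 1" inside a loop: inline evaluation with `=exp', or an explicit local macro. The macro name prev is hypothetical.

```stata
forvalues i = 2/4 {
    local prev = `i' - 1                              // explicit local macro
    display "i = `i'   inline: `=`i'-1'   via local: `prev'"
}
```

Both forms expand to the same number; the explicit local is often easier to read when the index is used more than once.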
Your code largely consists of putting constants into variables, which is legal but not especially good style. 
Note that 
gen lotto=sum(studentsj3)     //sum of target population
produces a _cumulative_ sum: only in the last observation will this be the actual sum, as your comment implies. Whether the comment or the code is what you want only you can say. 
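The difference is easy to see on a toy dataset. The three values below are made up for illustration, and the variable names cum and tot are not from the thread.

```stata
clear
input studentsj3
120
80
200
end

gen  cum = sum(studentsj3)        // running sum: 120, 200, 400
egen tot = total(studentsj3)      // grand total repeated in every row: 400
list
display cum[_N]                   // the last running sum equals the total
```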
Nick

________________________________________
From: Viktor Emonds
Sent: Tuesday, 18 October 2011 15:55
To: [email protected]
Subject: i-1 in forvalues loop

Hey,

I am trying to draw a sample using a random-start, fixed-interval systematic sampling procedure in each explicit stratum. The data in each stratum are already sorted by all the implicit stratifiers, with serpentine sorting for a variable of particular interest. Now I just need to do the actual sampling and tried to start by doing the following:

use ethnicframe31                         //the specific explicit stratum
gen lotto=sum(studentsj3)                 //sum of target population
egen bal1= mean(3166.4667*runiform())     //random starting point
forvalues i=2/15 {                        //Draw 'lotto balls' by adding the fixed interval
    gen bal`i'= bal`i-1'*3166.4667
}

gen winnaar=0                             //Identify 'winning' schools
forvalues i=1/15{
    replace winnaar=1 if inrange(bal`i',lotto[_n-1],lotto)
}

Apparently, the `i-1'  in the first loop is not understood. What am I doing wrong here? Thanks in advance!
Best,
Viktor
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



