Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: general simulation program (postfile)


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: general simulation program (postfile)
Date   Fri, 15 Aug 2003 18:01:26 +0100

Benoit Dulong
 
> The following simulation program will be the first Stata program
> presented to my director at University of Montreal.
> It works, but it is slow, and it feels bizarre...
> (I have to drop _all two times)
> 
> Question-1
> In general, is that how a simulation should be done with Stata
> (postfile, post, postclose) ?

There are lots of ways you could simulate. You need not 
use -postfile-. Personally, I always try to do stuff in place
whenever possible without masses of file I/0. 
 
> Question-2
> To generate a random number n1, I use two syntax lines:
> --------------------
> rndpoi 1 `lambda1'
> local n1 = xp
> --------------------
> Is it possible to do this in only one syntax line ?
> (pass directly the random number to local n1)

Not with -rndpoi-. But then -rndpoi- enforces its way 
of seeing things upon you. It does not offer the flexibility 
you need here. Avoid it and use -genpoisson-. 
 
> Question-3
> It it possible not to drop _all two times ?
> (use subroutine ?)

Yes, you can avoid -drop-. 

> ------------------------------------------------------------
> ----------
> MY ALGORITHM
> ------------------------------------------------------------
> ----------
> 
> [1]
> generate random number n1
> from a Poisson distribution with parameter lambda1
> 
> generate random number n2
> from a Poisson distribution with parameter lambda2
> 
> n = n1+n1
> 
> [2]
> generate random vector x (n component)
> from uniform distribution (generate x=uniform())
> 
> generate random vector y (n component)
> from uniform distribution (generate y=uniform())
> 
> [3]
> for each (x[i],y[i]), i=1...n,
> calculate the distance to the nearest neighbor
> using -- nearest -- from N.J COX (NJC 1.1.0 10 January 2003)
> the syntax
> nearest x y, id(id) dist(h)
> gives a file with n rows and 4 columns (x y id h)
> 
> [4]
> from the file obtained in [3]
> calculate the mean (mh) and var (vh) of the n values of h
> 
> [5]
> repeat [1] to [4] 1000 times
> 
> ------------------------------------------------------------
> ----------
> MY PROGRAM
> ------------------------------------------------------------
> ----------
> 
> program define myprog1
> version 7.0
> args lambda1 lambda2
> 
> postfile mysim1 n1 n2  n mh vh using "myresults1", replace
> 
> forvalues i=1(1)1000{
> 
> drop _all
> 
> display "i= " `i'
> 
> rndpoi 1 `lambda1'
> local n1 = xp
> 
> rndpoi 1 `lambda2'
> local n2 = xp
> 
> local n = `n1' + `n2'
> 
> drop _all
> set obs `n'
> 
> gen x = uniform()
> gen y = uniform()
> nearest x y, id(id) dist(h)
> 
> summarize h
> local mh = r(mean)
> local vh = r(Var)
> 
> post mysim1 (`n1') (`n2') (`n') (`mh') (`vh')
> 
> }
> 
> postclose mysim1
> 
> end
> 

Various thoughts: 

1. You are using -post-. Not essential. 

2. You are invoking a program -rndpoi- to 
generate 2000 randoms from a Poisson 2000 
times. You can call -genpoisson- twice. 

3. 2000 -drop _all- can be avoided. 

4. -display- serves no purpose beyond showing 
progress of program. Once it's working, you 
can cut that. 

5. Statements like 

local n1 = xp

hinge on the accident that you have just 
one observation in memory. Dangerous style. 

6. Invoking -nearest- 1000 times imposes
an overhead of interpretation. Also you 
are forcing it to generate identifiers
which you never use. Read the code into 
your program and cut all surplus. 

Here the result of a bit of hacking around. 

program define myprog2
	version 7.0
	args lambda1 lambda2

	clear 
	set obs 1000 

        genpoisson n1, mu(`lambda1') 
        genpoisson n2, mu(`lambda2') 
	gen n = n1 + n2 

	qui { 
		gen which = _n 
		expand n 
		gen x = uniform() 
		gen y = uniform()
		gen dist = . 
		gen mh = . 
		gen vh = . 
		sort which
		gen long obs = _n 
	
        	forval i = 1/1000 {
			noi su obs if which == `i', meanonly 
			local imin = r(min)
			local imax = r(max) 

			forval j = `imin'/`imax' { 
				scalar mind = . 
				forval k = `imin'/`imax' { 
					if (`j' != `k') { 
						scalar d = /* 
			*/ (x[`j'] - x[`k'])^2 + (y[`j'] - y[`k'])^2 
						scalar mind = /* 
			*/ min(scalar(d), mind)
					}
				}	
				replace dist = sqrt(mind) in `j' 
			}
				
			summarize dist in `imin'/`imax', detail  
			replace mh = r(mean) in `imin'    
			replace vh = r(Var) in `imin'     
		}

		bysort which (mh) : keep if _n == 1
		drop obs x y dist 
	}	
end

Comments on that: 

get all the Poissons at once; less fishing time 

what if n is 0? I think it goes OK 

get your randoms on unit square at once 

do things -in- rather than -if- 
wherever possible: by Blasnik's First Law, that's
a lot faster

as Michael points out in his email which 
I've just seen, the -nearest- algorithm 
is very crude.

Nick
n.j.cox@durham.ac.uk 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index