[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
st: R: AW: R: AW: R: RE: odd results after insample

From	"Carlo Lazzaro" <[email protected]>
To	<[email protected]>
Subject	st: R: AW: R: AW: R: RE: odd results after insample
Date	Sun, 27 Sep 2009 09:02:05 +0200
Dear Martin,

thanks a lot for your kind reply.

Sorry for the typo: the meaning was -invaluable-. The previous version of my
reply included -unexpected-, since your efforts were truly appreciated. In
turning -unexpected- into -invaluable- my keyboard did some jokes.

I have replicated the analysis concerning Tables 1 and 2 (both panels)
during this week.
I share your opinion that there's nothing to do with Tables 3 and 4 of the
paper, unless the data sets are available. However, set aside the examples
reported in this particular paper, I am more interested in the available
Stata procedures for drawing random sampling. It is quite usual for me to
mimic a data set for testing the features of different estimators of the
population mean.

Thanks a lot again for your helpful support and enjoy your Sunday.

Kind Regards,
Carlo 

-----Messaggio originale-----
Da: [email protected]
[mailto:[email protected]] Per conto di Martin Weiss
Inviato: sabato 26 settembre 2009 21.47
A: [email protected]
Oggetto: st: AW: R: AW: R: RE: odd results after insample


<> 

I am not sure that you can do anything about tables 3 and 4, as they are out
of reach as long as you do not hold the specific data used in these tables
in hand. 

What you can do, though, is replicate the (left panel of) table 1, which is
an interesting exercise in Stata programming and the use of -postfile- and
-tabdisp-. I am not sure whether I have done this efficiently, but it sure
works, and under a -version- other than mine at that. 

Note that I am using 100 replications for the -simulate- command, instead of
10,000 as in the original article, so that this thing finishes in finite
time. You want to check this _very_ carefully:



*************

clear*
vers 9.2

//lognormal part
capt prog drop lognorm

prog def lognorm, rclass
vers 9.2
	syntax [,obs(integer 100) cov(real 2)]
	drop _all
	set obs `obs'
	loc sd = sqrt(log(`cov'^2+1))
	loc mean = log(1000)-.5*`sd'^2
	tempvar z
	gen `z'=exp(invnormal(uniform())*`sd'+`mean')
	su `z'
	ret sca rmse=sqrt((`r(mean)'-1000)^2)
end


//gamma part
capt prog drop gam

prog def gam, rclass
vers 9.2
syntax [,obs(integer 100) cov(real 2)]
	drop _all
	set obs `obs'
	loc shape = (`cov')^(-2)
	loc scale = 1000/`shape'
	tempvar z
	gengamma `z', alpha(`shape') beta(`scale')
	su `z', mean
	ret sca rmse=sqrt((`r(mean)'-1000)^2)
end

//postfile

tempname hdle
tempfile info
postfile `hdle' str15 distr cov  /* 
*/ size meanrmse using `info'

//gamma
foreach cv in .25 .5 1 1.5 2{
		foreach size in 20 50 200 500 2000{
			simulate rmse=r(rmse), reps(100): /* 
			*/ gam, obs(`size') cov(`cv') 
			sum rmse, mean
			post `hdle' ("Gamma") (`cv') (`size') (r(mean))
		}
 }

//lognormal
foreach cv in .25 .5 1 1.5 2{
		foreach size in 20 50 200 500 2000{
			simulate rmse=r(rmse), reps(100): /* 
			*/ lognorm, obs(`size') cov(`cv') 
			sum rmse, mean
			post `hdle' ("Lognormal") (`cv') (`size') (r(mean))
		}
 }

postclose `hdle'

preserve
	use `info', clear
	replace meanrmse=round(meanrmse)
	tabdisp cov size, cellvar(meanrmse) /* 
	*/ by(distr) 
restore
*************

Do you have any idea how to replicate the right panel? Somehow these results
are out of reach for me :-(

HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Carlo Lazzaro
Gesendet: Samstag, 26. September 2009 20:56
An: [email protected]
Cc: 'Martin Weiss'
Betreff: st: R: AW: R: RE: odd results after insample

Dear Martin,
many thanks for your unvaluable efforts.
However, I am current interested in data reported in Table 3 and 4 of the
article. I have not access to the related data sets and I am figuring out a
way to mimick them and random sampling from the obtained (far fetched)
distributions.

Thanks a lot again, especially for downloading the paper and devoting your
time to think it over?

Out of curiosity: is the economic evaluation of health care programmes one
of your research fields?

Kind Regards,
Carlo

-----Messaggio originale-----
Da: [email protected]
[mailto:[email protected]] Per conto di Martin Weiss
Inviato: sabato 26 settembre 2009 19.51
A: [email protected]
Oggetto: st: AW: R: RE: odd results after insample


<> 

So if you wanted the entire figure 1 under Stata -version- 9.2, you would
probably want to install Bobby`s -findit gendist- and then:


*************
clear*
vers 9.2

//lognormal part
capt prog drop myprog

prog def myprog, rclass
vers 9.2
	syntax newvarname(numeric max=1), [obs(integer 100) cov(real 2)]
	set obs `obs'
	loc sd = sqrt(log(`cov'^2+1))
	loc mean = log(1000)-.5*`sd'^2
	gen `varlist'=exp(invnormal(uniform())*`sd'+`mean')
	qui su `varlist'
	ret sca mean=r(mean)
	ret sca cv=r(sd)/r(mean)
end


loc gra 
loc j 1

//for sample size 2000
foreach cv in 0.25 0.5 1 1.5 2{
	myprog lognorm`j', obs(2000) cov(`cv')
	loc gra `gra' (kdensity lognorm`j' if lognorm`j'<3000)
	loc ++j
} 

//see the mean and coeff of variation
tabstat _all, stat(mean cv sd)

tw `gra', legend(off) nodraw /* 
*/ name(lognormal, replace)

//gamma part
capt prog drop mynewprog

prog def mynewprog, rclass
vers 9.2
syntax newvarname(numeric max=1) [,obs(integer 100) cov(real 2)]
set obs `obs'
	loc shape = (`cov')^(-2)
	loc scale = 1000/`shape'
	gengamma `varlist', alpha(`shape') beta(`scale')
	qui su `varlist'
	ret sca mean=r(mean)
	ret sca cv=r(sd)/r(mean)
end

loc gra 
loc j 1

//for sample size 2000
foreach cv in 0.25 0.5 1 1.5 2{
	mynewprog gamma`j', obs(2000) cov(`cv')
	loc gra `gra' (kdensity gamma`j' if gamma`j'<3000)
	loc ++j
} 

//see the mean and coeff of variation
tabstat _all, stat(mean cv sd)

tw `gra', legend(off) /* nodraw 
*/ name(gamma, replace)

//combine 'em
gr combine lognormal gamma, /* 
*/ cols(1)
*************



HTH
Martin


-----Ursprüngliche Nachricht-----
Von: [email protected]
[mailto:[email protected]] Im Auftrag von Carlo Lazzaro
Gesendet: Samstag, 26. September 2009 16:20
An: [email protected]
Cc: 'Martin Weiss'
Betreff: st: R: RE: odd results after insample

Dear Martin,
thanks a lot for your kind reply.

The approach sketched in my previous message follows the one suggested by:
Briggs, A. and Nixon, R. and Dixon, S. and Thompson, S. (2005) Parametric
modelling of cost data: some simulation evidence. Health Economics 14(4):pp.
421-428.

So far, I have been quite successful with other Stata procedures for drawing
random samples from a given distribution (for instance, -simulate-),
including the approach you kindly advice me about.

Unfortunately, I cannot figure out what went wrong with this last do_file.

Thanks a lot again and enjoy your W_E.

Kind Regards,
Carlo
-----Messaggio originale-----
Da: [email protected]
[mailto:[email protected]] Per conto di Martin Weiss
Inviato: sabato 26 settembre 2009 15.21
A: [email protected]
Oggetto: st: RE: odd results after insample


<>

Just out of curiosity: If you want 20 obs per sample, and 2,000 samples,
should that not lead to 40,000 observations overall?  


HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Carlo Lazzaro
Sent: Samstag, 26. September 2009 15:00
To: [email protected]
Subject: st: odd results after insample

Dear Statalisters,
as an alternative to - simulate - , I have written the following do file
(for Stata 9.2/SE) to draw 2000 random samples, 20 observations each, from a
normal distribution: 

drop _all
set more off
set obs 2000
obs was 0, now 2000
g double ln_g_20=.
g double ln_sd_g_20=.
set seed 999
qui gen A=5.37 + 1.19*invnorm(uniform()) in 1/972
qui forvalues i = 1(1)2000 {
qui gen ln_20`i'=A
qui generate random`i' = uniform() 
qui sort random`i'
qui generate insample`i' = _n <= 20
qui sum ln_20`i' if insample`i' == 1
replace ln_g_20=r(mean)  in `i'
replace ln_sd_g_20=r(sd) in `i'
drop ln_20`i'
drop random`i' 
drop insample`i'
}
drop A

However, as a result I have obtained 1721 observations instead of the
expected 2000. 

sum ln_g_20 ln_sd_g_20

Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
     ln_g_20 |      1271    5.314033    .3800687    3.79247   6.587941
  ln_sd_g_20 |      1271    1.101084    .2835007   .0260279   2.161299


Besides, results are even more puzzling when I increase the number of
samples (again 20 observations each), in that I get a different number of
observation for ln_g and ln_sd_g.

Comments are gratefully acknowledged.

Thanks a lot for your kindness and for your time.

Kind Regards,
Carlo


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
References:
- st: AW: R: AW: R: RE: odd results after insample
  - From: "Martin Weiss" <[email protected]>
Prev by Date: st: Non-nested clusters
Next by Date: st: Different p-values in outreg2 and regression output
Previous by thread: st: AW: R: AW: R: RE: odd results after insample
Next by thread: st: multiple imputation in Stata 11
Index(es):
- Date
- Thread