
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Controlling precision for multiple runs of same code


From   "[email protected]" <[email protected]>
To   <[email protected]>
Subject   RE: st: Controlling precision for multiple runs of same code
Date   Mon, 3 Jun 2013 17:03:28 -0700 (PDT)

Without going into much detail, be aware that a many-to-many merge can yield
non-deterministic (and often meaningless) pairings of observations, leading to irreproducible or inconsistent results.
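A minimal toy illustration, using hypothetical variables k, x and y that are not from the thread: merge m:m pairs observations by their position within each key group (first with first, second with second), so which rows end up together depends on the order of rows within the group. joinby instead forms every pairwise combination within the key, which does not depend on order.

```stata
* Hypothetical toy data: key k is shared by two using rows...
clear
input k y
1 100
1 200
end
tempfile lookup
save `lookup'

* ...and by two master rows
clear
input k x
1 10
1 20
end
tempfile master
save `master'

* m:m matches rows by position within k, so the x-y pairing
* depends on the order of the rows inside each k group
use `master', clear
merge m:m k using `lookup'
list k x y _merge, noobs

* joinby forms all 2 x 2 = 4 combinations within k, whatever the order
use `master', clear
joinby k using `lookup'
sort k x y
list k x y, noobs
```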



---Original Message---
From: [email protected]
Sent: 6/3/2013 7:56 pm
To: "statalist" <[email protected]>
Subject: st: Controlling precision for multiple runs of same code

Hello,

I'm having trouble with a section of my code that yields different
results each time I run it.

I start out with a dataset, baseline_4.dta, which has 47,267,047
observations and 16 variables, and run this:

merge m:m statefips agecat_census using "ABCD.dta"
assert _merge==3
drop _merge
egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat ///
    racecat iprcat_mpact iprcat coverage groupsize)
checkpop
rename pop oldpop
gen pop=tot_pop*prob_agecat_mpact
checkpop
collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat ///
    racecat iprcat_mpact iprcat coverage groupsize)
checkpop
sum
sort _all
save "baseline_5.dta", replace
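Whether an m:m merge is needed at all can be checked before merging, by testing whether the keys uniquely identify rows. A minimal sketch on hypothetical toy data that borrows the thread's variable names; the layout assumed here for ABCD.dta (one row per agecat_mpact carrying a probability) is a guess, since the file's contents are not shown.

```stata
* Hypothetical lookup table in the assumed layout of ABCD.dta
clear
input statefips agecat_census agecat_mpact prob_agecat_mpact
1 1 1 .6
1 1 2 .4
1 2 3 1
end

* How many rows share each key combination?
duplicates report statefips agecat_census

* isid exits with an error (r(459)) when the keys are not unique;
* capture keeps the do-file running so the result can be reported
capture isid statefips agecat_census
display "keys unique in lookup file? " cond(_rc == 0, "yes", "no")

* If this is a split table, probabilities should sum to 1 within each key
bysort statefips agecat_census: egen double psum = total(prob_agecat_mpact)
assert abs(psum - 1) < 1e-6
```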

checkpop is a program that reports my total population, the number of
observations, and the number of missing values each time I run it. My
total population is the same before and after the collapse command (see
results below).
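The checkpop program itself was not posted. A hypothetical reconstruction that prints the same three lines seen in the logs might look like this; the program body is an assumption, not the poster's code.

```stata
* Hypothetical reconstruction of checkpop; the real program was not posted
capture program drop checkpop
program define checkpop
    * summarize accumulates the sum in double precision
    quietly summarize pop, meanonly
    display as text "Total pop:    " as result %15.0fc r(sum)
    display as text "Observations: " as result %15.0fc _N
    quietly count if missing(pop)
    display as text "Missing:      " as result %15.0fc r(N)
end

* Usage on toy data with one missing value
clear
input double pop
100
250
.
end
checkpop
```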

At the end, my total population and my number of observations in
baseline_5.dta are different every time I run this. I suspect the
difference comes from rounding when Stata executes the gen pop line,
but I've tried replacing it with

gen double pop=tot_pop*prob_agecat_mpact

and

gen float pop=tot_pop*prob_agecat_mpact

And I still get differences.
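Storage type does matter for totals this large: a float has a 24-bit mantissa (roughly 7 significant digits), so above 16,777,216 it cannot hold every integer. A small sketch using the Run A total from the logs below:

```stata
clear
set obs 1
* Between 2^28 and 2^29 adjacent floats are 32 apart, so f is rounded
* to 332,455,968; the double d holds 332,455,972 exactly
generate float  f = 332455972
generate double d = 332455972
format f d %15.0fc
list f d
```

Using double removes this rounding, but since the runs still differed with double, rounding alone cannot be the whole explanation; the observation counts after collapse also differ between runs, which storage precision cannot cause.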

I tried using

gen long pop=tot_pop*prob_agecat_mpact

But I lost too much precision by doing this.
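Integer types such as long hold only whole numbers, so the fractional shares produced by multiplying a count by a probability are lost. A toy sketch with hypothetical values; round() is written out explicitly so the conversion rule is visible rather than left implicit.

```stata
clear
set obs 1
generate double tot_pop = 7
generate double prob_agecat_mpact = 0.25
* double keeps the fractional share (1.75)
generate double pop_d = tot_pop * prob_agecat_mpact
* long holds whole numbers only; round() makes the conversion explicit (2)
generate long pop_l = round(tot_pop * prob_agecat_mpact)
list
```

One option, under these assumptions, is to keep pop as double through the collapse and round, if at all, only at the very end, so the loss is not compounded across millions of rows.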

Could you please recommend a solution to obtain the exact same numbers
in each run, without sacrificing precision?
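One alternative is joinby, which Stata's merge documentation suggests considering instead of m:m: it pairs each master row with every matching row of the using file, so the result does not depend on sort order. A minimal sketch on hypothetical toy data, assuming ABCD.dta has one row per statefips × agecat_census × agecat_mpact with prob_agecat_mpact summing to 1 within each statefips × agecat_census (the real layout of ABCD.dta is not shown in the thread). The variable pop_mpact is a hypothetical name.

```stata
* Hypothetical lookup: how census age category 1 splits into mpact categories
clear
input statefips agecat_census agecat_mpact double prob_agecat_mpact
1 1 1 .25
1 1 2 .75
end
tempfile lookup
save `lookup'

* Hypothetical master: population cells for two counties
clear
input statefips countyfips agecat_census double pop
1 1 1 100
1 2 1 40
end

* Each master row is paired with every lookup row sharing its keys
joinby statefips agecat_census using `lookup'
generate double pop_mpact = pop * prob_agecat_mpact
collapse (sum) pop_mpact, by(statefips countyfips agecat_mpact)

* When probabilities sum to 1 within each key, the total is preserved (140)
quietly summarize pop_mpact
display %15.0fc r(sum)
```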

Thanks!

Melanie

Here are the logs from two of my runs:

************* RUN A ***********************

. merge m:m statefips agecat_census using "ABCD.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------

. assert _merge==3

. drop _merge

. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat
> iprcat_mpact iprcat coverage groupsize)

. checkpop

Total pop:       347,095,179
Observations:     47,267,047
Missing:                   0

. rename pop oldpop

. gen pop=tot_pop*prob_agecat_mpact

. checkpop

Total pop:       332,455,972
Observations:     47,267,047
Missing:                   0

. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat
> iprcat_mpact iprcat coverage groupsize)

. checkpop

Total pop:       332,455,972
Observations:     36,351,520
Missing:                   0


************** RUN B *************
. merge m:m statefips agecat_census using "ABCD.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------

. assert _merge==3

. drop _merge

. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcat racecat
> iprcat_mpact iprcat coverage groupsize)

. checkpop

Total pop:       347,095,179
Observations:     47,267,047
Missing:                   0

. rename pop oldpop

. gen pop=tot_pop*prob_agecat_mpact

. checkpop

Total pop:       332,455,928
Observations:     47,267,047
Missing:                   0

. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcat racecat
> iprcat_mpact iprcat coverage groupsize)

. checkpop

Total pop:       332,455,928
Observations:     36,351,515
Missing:                   0
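The gaps in the two logs can be computed directly. Both runs lose about 14.6 million people between tot_pop and pop, which is consistent with (though not proof of) each master row being paired with only one prob_agecat_mpact value rather than with all of them; the runs then differ from each other by 44 people and 5 observations, and a difference in observation counts cannot come from rounding.

```stata
* Drop in total population from tot_pop to pop, Run A then Run B
display 347095179 - 332455972
display 347095179 - 332455928
* Run A minus Run B: total population, then observations after collapse
display 332455972 - 332455928
display 36351520 - 36351515
```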
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC