RE: st: Controlling precision for multiple runs of same code (out of office until 12th June)


From   "Seyi Soremekun" <Seyi.Soremekun@lshtm.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Controlling precision for multiple runs of same code (out of office until 12th June)
Date   Tue, 04 Jun 2013 01:05:03 +0100

I am currently out of the office until 12th June, with limited email contact.
Please contact Angela Vega (angela.vega@lshtm.ac.uk) for any enquiries.

>>> "kantor.d@att.net" <kantor.d@att.net> 06/04/13 01:03 >>>

Without going into much detail, be aware that a many-to-many merge can yield
non-deterministic (and often meaningless) pairings of observations, leading to irreproducible or inconsistent results.
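To make the point concrete, here is a minimal Python sketch (not Stata code) of the pairing rule the Stata manual documents for merge m:m: within each key group, the first master observation is paired with the first using observation, the second with the second, and so on, and the last observation of the shorter group is reused for the rest. Stata's sort breaks ties in random order unless the stable option is used, so the order within a key group, and therefore the pairing, can change from run to run. The helpers mm_merge, joinby and summed_pop and the data are hypothetical illustrations; joinby here only imitates the idea behind Stata's joinby command (all pairwise combinations within a group), and unmatched rows are simply dropped in this sketch.

```python
from collections import defaultdict


def group_by(rows, key):
    """Group rows by key value, preserving the order rows arrive in."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return groups


def mm_merge(master, using, key):
    """Sketch of the documented merge m:m rule (matched groups only):
    the i-th master row meets the i-th using row, and the shorter side's
    last row is reused once it runs out."""
    using_groups = group_by(using, key)
    out = []
    for k, ms in group_by(master, key).items():
        us = using_groups.get(k)
        if not us:
            continue  # unmatched master rows are dropped in this sketch
        for i in range(max(len(ms), len(us))):
            m = ms[min(i, len(ms) - 1)]
            u = us[min(i, len(us) - 1)]
            out.append({**m, **u})
    return out


def joinby(master, using, key):
    """Every master row paired with every using row sharing its key value."""
    using_groups = group_by(using, key)
    return [{**m, **u} for m in master for u in using_groups.get(m[key], [])]


def summed_pop(rows):
    """Like `gen pop=tot_pop*prob_agecat_mpact` followed by a total."""
    return sum(r["tot_pop"] * r["prob_agecat_mpact"] for r in rows)


# Hypothetical data: two population rows and two probability rows
# that share the same agecat_census value.
master = [
    {"agecat_census": 1, "countyfips": "A", "tot_pop": 100},
    {"agecat_census": 1, "countyfips": "B", "tot_pop": 300},
]
using = [
    {"agecat_census": 1, "agecat_mpact": "young", "prob_agecat_mpact": 0.25},
    {"agecat_census": 1, "agecat_mpact": "old", "prob_agecat_mpact": 0.75},
]

# Same rows, two within-group orders (as after an unstable sort).
run_a = mm_merge(master, using, "agecat_census")
run_b = mm_merge(master[::-1], using, "agecat_census")
print(summed_pop(run_a), summed_pop(run_b))  # 250.0 150.0

# The all-pairs join gives the same result whatever the row order.
print(summed_pop(joinby(master, using, "agecat_census")),
      summed_pop(joinby(master[::-1], using, "agecat_census")))  # 400.0 400.0
```

Both m:m runs keep the same number of rows but pair them differently, so the totals differ, which is the same symptom as Runs A and B in the original message. Whether an all-pairs join is the right model for ABCD.dta depends on what prob_agecat_mpact means; if the probabilities within a key cell sum to 1, as in this toy example, pairing every population row with every probability row preserves the total.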


---Original Message---
From: statalist@hsphsun2.harvard.edu
Sent: 6/3/2013 7:56 pm
To: "statalist" <statalist@hsphsun2.harvard.edu>
Subject: st: Controlling precision for multiple runs of same code

Hello,

I'm having trouble with a section of my code that yields different results each time I run it.

I start out with a dataset, baseline_4.dta, which has 47,267,047 observations and 16 variables, and run this:

    merge m:m statefips agecat_census using "ABCD.dta"
    assert _merge==3
    drop _merge
    egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcatracecat iprcat_mpact iprcat coverage groupsize)
    checkpop
    rename pop oldpop
    gen pop=tot_pop*prob_agecat_mpact
    checkpop
    collapse (sum) pop, by(statefips countyfips agecat_mpact sexcatracecat iprcat_mpact iprcat coverage groupsize)
    checkpop
    sum
    sort _all
    save "baseline_5.dta", replace

checkpop is a program that tells me what my total population is each time I run it. My total population is the same before and after the collapse command (see results below).

At the end, my total population and my number of observations in baseline_5.dta are different every time I run this. I suspect the difference comes from rounding when it executes the gen pop line, but I've tried replacing that line with

    gen double pop=tot_pop*prob_agecat_mpact

and

    gen float pop=tot_pop*prob_agecat_mpact

and I still get differences. I tried using

    gen long pop=tot_pop*prob_agecat_mpact

but I lost too much precision by doing this.

Could you please recommend a solution that gives exactly the same numbers in each run, without sacrificing precision?

Thanks!
Melanie

The logs for two of the runs I've done:

************* RUN A ***********************

. merge m:m statefips agecat_census using "ABCD.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------

. assert _merge==3

. drop _merge

. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcatracecat iprcat_mpac
> t iprcat coverage groupsize)

. checkpop
Total pop:       347,095,179
Observations:     47,267,047
Missing:                   0

. rename pop oldpop

. gen pop=tot_pop*prob_agecat_mpact

. checkpop
Total pop:       332,455,972
Observations:     47,267,047
Missing:                   0

. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcatracecat iprcat_mpact ip
> rcat coverage groupsize)

. checkpop
Total pop:       332,455,972
Observations:     36,351,520
Missing:                   0

************** RUN B *************

. merge m:m statefips agecat_census using "ABCD.dta"

    Result                           # of obs.
    -----------------------------------------
    not matched                             0
    matched                        47,267,047  (_merge==3)
    -----------------------------------------

. assert _merge==3

. drop _merge

. egen tot_pop=sum(pop), by(statefips countyfips agecat_census sexcatracecat iprcat_mpac
> t iprcat coverage groupsize)

. checkpop
Total pop:       347,095,179
Observations:     47,267,047
Missing:                   0

. rename pop oldpop

. gen pop=tot_pop*prob_agecat_mpact

. checkpop
Total pop:       332,455,928
Observations:     47,267,047
Missing:                   0

. collapse (sum) pop, by(statefips countyfips agecat_mpact sexcatracecat iprcat_mpact ip
> rcat coverage groupsize)

. checkpop
Total pop:       332,455,928
Observations:     36,351,515
Missing:                   0

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
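On the precision question: rounding a value to a storage type is deterministic, so switching between float and double changes the stored numbers but changes them identically on every run. The minimal Python sketch below illustrates this, assuming (as Stata documents) that a Stata float is an IEEE single-precision value and a double is an IEEE double, which is what Python's float already is. The cells and the helper names to_stata_float, as_double and new_pop_total are made up for illustration.

```python
import struct


def to_stata_float(x):
    """Round an IEEE double to the nearest IEEE single-precision value,
    the 4-byte storage assumed here for a Stata `float`."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_double(x):
    """Python floats are already IEEE doubles, so nothing to round."""
    return x


def new_pop_total(cells, storage):
    """Hypothetical stand-in for `gen <type> pop=tot_pop*prob_agecat_mpact`
    followed by a total; `storage` applies the storage-type rounding."""
    return sum(storage(tot_pop * prob) for tot_pop, prob in cells)


# Made-up (tot_pop, prob_agecat_mpact) pairs.
cells = [(12345.0, 0.3), (678.0, 0.55), (90123.0, 0.125)]

# Same inputs give the same total on every call, whatever the storage type.
assert new_pop_total(cells, to_stata_float) == new_pop_total(cells, to_stata_float)
assert new_pop_total(cells, as_double) == new_pop_total(cells, as_double)

# A float keeps roughly 7 significant digits: between 2**28 and 2**29
# consecutive single-precision values are 32 apart.
print(to_stata_float(332_455_972.0))  # 332455968.0
```

Because the rounding cannot differ between two runs on identical inputs, and because collapse groups on categorical variables rather than on pop, the different totals (332,455,972 vs 332,455,928) and observation counts (36,351,520 vs 36,351,515) in Runs A and B are more plausibly explained by the gen line receiving different inputs, that is, by the m:m merge pairing observations differently, than by storage precision.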

© Copyright 1996–2014 StataCorp LP