
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Replicability and -imputw-


From   Richard Williams <[email protected]>
To   [email protected], Stata Help <[email protected]>
Subject   Re: st: Replicability and -imputw-
Date   Sun, 25 Aug 2013 19:26:48 -0500

At 06:07 PM 8/25/2013, Roberto Ferrer wrote:
Richard,

Thank you for your reply. I just posted my solution. I remember
reading that adding -stable- could in some cases obscure other
problems. I think in this case it was safe, but I thought it would
require more computation time (not exactly sure about this, though).

Now that you mention it, I also find it interesting that the seed that
was set just before the -sort- doesn't affect it. Maybe someone can
comment on that.

I think Bill Gould already has:

http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/

"Did you know sort has its own, private random-number generator built into it? It does, and sort uses its random-number generator to determine the order of tied observations. In the manuals we at StataCorp are fond of writing, "the ties will be ordered randomly" and a few sophisticated users probably took that to mean, "the ties will be ordered in a way that we at StataCorp do not know and even though they might be ordered in a way that will cause a bias in the subsequent analysis, because we don't know, we'll ignore the possibility." But we meant it when we wrote that the ties will be ordered randomly; we know that because we put a random number generator into sort to ensure the result. And that is why I can now write that repeated values of the runiform() function cause a reproducibility issue, but not a statistical issue."

Further, in the comments section, the question is asked "Can you explain why sort does not use the same seed as the other random number generators? That would make sort also foolproof with respect to reproducibility." Gould has a detailed response. At the end he says "Setting the random-number seed is a way of reproducing results from routines that are intended to produce different results in different runs. -sort- is not such a function; if it produces different results in different runs, and that matters, that is a bug" - where the bug is in the code the user wrote (not in Stata).
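In practice that means removing the ties from the sort rather than trying to seed it. A minimal sketch of one way to do that with the code from the original post follows. It assumes the data arrive in the same order at the start of every run (which the -cf- check suggests), and obs_id is a hypothetical variable name introduced only for illustration:

```stata
* Sketch only: make the sort key unique so no ties are left to randomize.
* Assumption: observation order is identical at the start of every run.
* obs_id is a hypothetical name, not part of the original data.
gen long obs_id = _n
sort yearobs size_b obs_id
isid yearobs size_b obs_id      // stops with an error if the key is not unique

* Alternative that keeps tied observations in their current order:
*     sort yearobs size_b, stable

set seed 391829                 // now governs only the uniform() draws in -imputw-
by yearobs size_b: imputw lwage frau gebjahr bild esector, ///
    cens(censored) grenze(uplimit)
```

With the order of observations fixed, each uniform() draw should land on the same observation in every run, so -lnw_i- should replicate.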




Thank you.

Bests,
Roberto

On Mon, Aug 26, 2013 at 12:54 AM, Richard Williams
<[email protected]> wrote:
> I would suggest adding the -stable- option to sort. Or (possibly better)
> have the data sorted before you start calling the program. The latter would
> be a little more efficient in terms of computing time, plus there was some
> sort of thread way back when saying sorting was better if you didn't use the
> stable option (although I don't remember why).
>
> According to the help for sort, "Without the stable option, the ordering of
> observations with equal values of varlist is randomized." I just ran a quick
> check, and as far as I can tell setting the seed does not cause the same
> random order to occur across multiple calls (which strikes me as odd, but
> maybe there is a reason for it). So, I think sorting the data first or using
> the stable option will give you what you want. Please let us know one way or
> the other.
>
>
> At 02:02 PM 8/25/2013, Roberto Ferrer wrote:
>>
>> Hello,
>>
>> I've been using a user-written command -imputw- downloaded from
>>
>> http://fdz.iab.de/187/section.aspx/Publikation/k050719a04
>> Based on Gartner, Herman. "The Imputation of Wages Above the Contribution
>> Limit with the German IAB Employment Sample." FDZ, 2005.
>>
>> My problem is with replicability. I use -set seed- to control for the
>> randomness introduced by the command but I can't manage to obtain the
>> same results for the output variable -lnw_i-. Can anyone please point
>> to the source of "uncontrolled randomness" that is affecting the results
>> by inspecting the code?
>>
>> I've double checked, using -cf-, that the data going in is the same
>> for the replication runs. The results for the regressions are the same
>> for all runs (I've checked the log files in a bash terminal (linux)
>> using the program "diff" and they are identical except for log times).
>> But the final resulting variable is not the same for any two runs.
>>
>> Below I copy the source, since it's not very long, along with the code
>> snippet I'm running.
>>
>> Thank you.
>>
>> * --------------------- User-written command -------------------------------------
>> program define imputw, byable(recall)
>>
>>     version 8
>>     syntax varlist [if] , Cens(varlist) Grenze(varlist) [Outvar(string asis)]
>>
>>     marksample touse
>>     * If no name given to the output, call it by default "lnw_i".
>>     if "`outvar'" == "" {
>>         local outvar "lnw_i"
>>     }
>>     * Estimate Tobit model
>>     cnreg `varlist' if `touse', censored(`cens')
>>     quietly {
>>         * Make predictions
>>         predict xb00 if `touse', xb
>>         * Generate standardized limit for each value
>>         gen alpha00 = (ln(`grenze') - xb00)/_b[_se] if `touse'
>>     }
>>
>>     cap gen `outvar' = .
>>     replace `outvar' = `1' if `touse'
>>     * Imputation
>>     replace `outvar' = xb00 + _b[_se] * ///
>>         invnorm(uniform()*(1 - norm(alpha00)) + norm(alpha00)) ///
>>         if `touse' & `cens'
>>
>>     drop xb00 alpha00
>> end
>>
>> * ------------------- Code I'm using -----------------------------------
>> set seed 391829 // -imputw- uses random number generator
>> sort yearobs size_b
>> by yearobs size_b: imputw lwage frau gebjahr bild esector, cens(censored) ///
>>     grenze(uplimit)
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME:   (574)289-5227
> EMAIL:  [email protected]
> WWW:    http://www.nd.edu/~rwilliam
>
>

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam


