Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Replicability and -imputw-


From   Richard Williams <[email protected]>
To   [email protected], Stata Help <[email protected]>
Subject   Re: st: Replicability and -imputw-
Date   Sun, 25 Aug 2013 19:53:36 -0500

At 06:42 PM 8/25/2013, Roberto Ferrer wrote:
Richard, nice finding. Thank you for taking the time.

Obviously you and I should both read the Stata Blog much more carefully. ;-) However, I think many/most people would assume that setting the seed would also guarantee the same sort order. A few words in the documentation noting that this is not the case might be helpful.
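The point that the seed does not pin down the order of tied observations can be checked with a short do-file. This is only a sketch under stated assumptions: it uses Stata's shipped auto dataset as a stand-in for the IAB data, reuses the seed from the thread, and defines a hypothetical helper program, draw_after_sort, that reloads the data, sets the seed, sorts on rep78 (which has many tied values), and draws one uniform per observation. Comparing two runs by a fixed id shows whether each observation received the same draw.

```stata
* Sketch: is a seeded sort-then-draw sequence reproducible?
* Assumption: the shipped auto data stand in for the real data;
* draw_after_sort is a hypothetical helper written only for this check.
capture program drop draw_after_sort
program define draw_after_sort
    // Reload the data, set the seed, sort on a tied key, draw one uniform.
    // Pass "stable" as the second argument to use -sort, stable-.
    args newvar stable
    sysuse auto, clear
    generate long id = _n              // same incoming order on every load
    set seed 391829
    if "`stable'" == "stable" {
        sort rep78, stable             // ties keep their incoming order
    }
    else {
        sort rep78                     // ties ordered by sort's own RNG
    }
    generate double `newvar' = runiform()
    keep id `newvar'
    sort id
end

* Default sort: the same draws can land on different observations.
tempfile run1
draw_after_sort u1
save `run1'
draw_after_sort u2
merge 1:1 id using `run1', nogenerate
count if u1 != u2                      // may well be nonzero

* Stable sort: every observation gets the same draw in both runs.
draw_after_sort s1 stable
save `run1', replace
draw_after_sort s2 stable
merge 1:1 id using `run1', nogenerate
assert s1 == s2
```

In the -imputw- setting, the same reasoning suggests either -sort yearobs size_b, stable- on data that arrive in the same order on every run, or sorting on yearobs size_b plus a variable that uniquely identifies observations, so that no ties remain before the by-group call draws its random numbers.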


Bests,
Roberto

On Mon, Aug 26, 2013 at 1:26 AM, Richard Williams
<[email protected]> wrote:
> At 06:07 PM 8/25/2013, Roberto Ferrer wrote:
>>
>> Richard,
>>
>> Thank you for your reply. I just posted my solution. I remember
>> reading that adding -stable- could in some cases obscure other
>> problems. I think it is safe in this case, but I thought it would
>> require more computation time (I'm not exactly sure about this, though).
>>
>> Now that you mention it, I also find it interesting that the seed
>> that was set just before the -sort- doesn't affect it. Maybe someone
>> can comment on that.
>
>
> I think Bill Gould already has:
>
> http://blog.stata.com/2012/08/03/using-statas-random-number-generators-part-2-drawing-without-replacement/
>
> "Did you know sort has its own, private random-number generator built into
> it? It does, and sort uses its random-number generator to determine the
> order of tied observations. In the manuals we at StataCorp are fond of
> writing, "the ties will be ordered randomly" and a few sophisticated users
> probably took that to mean, "the ties will be ordered in a way that we at
> StataCorp do not know and even though they might be ordered in a way that
> will cause a bias in the subsequent analysis, because we don't know, we'll
> ignore the possibility." But we meant it when we wrote that the ties will be
> ordered randomly; we know that because we put a random number generator into
> sort to ensure the result. And that is why I can now write that repeated
> values of the runiform() function cause a reproducibility issue, but not a
> statistical issue."
>
> Further, in the comments section, the question is asked "Can you explain why
> sort does not use the same seed as the other random number generators? That
> would make sort also foolproof with respect to reproducibility." Gould has a
> detailed response. At the end he says "Setting the random-number seed is a
> way of reproducing results from routines that are intended to produce
> different results in different runs. -sort- is not such a function; if it
> produces different results in different runs, and that matters, that is a
> bug" - where the bug is in the code the user wrote (not in Stata).
>
>
>
>
>
>> Thank you.
>>
>> Bests,
>> Roberto
>>
>> On Mon, Aug 26, 2013 at 12:54 AM, Richard Williams
>> <[email protected]> wrote:
>> > I would suggest adding the -stable- option to sort. Or (possibly better)
>> > have the data sorted before you start calling the program. The latter
>> > would be a little more efficient in terms of computing time, plus there
>> > was some sort of thread way back when saying sorting was better if you
>> > didn't use the stable option (although I don't remember why).
>> >
>> > According to the help for sort, "Without the stable option, the ordering
>> > of observations with equal values of varlist is randomized." I just ran
>> > a quick check, and as far as I can tell setting the seed does not cause
>> > the same random order to occur across multiple calls (which strikes me
>> > as odd, but maybe there is a reason for it). So, I think sorting the
>> > data first or using the stable option will give you what you want.
>> > Please let us know one way or the other.
>> >
>> >
>> > At 02:02 PM 8/25/2013, Roberto Ferrer wrote:
>> >>
>> >> Hello,
>> >>
>> >> I've been using a user-written command, -imputw-, downloaded from
>> >>
>> >> http://fdz.iab.de/187/section.aspx/Publikation/k050719a04
>> >>
>> >> It is based on Gartner, Herman. "The Imputation of Wages Above the
>> >> Contribution Limit with the German IAB Employment Sample." FDZ, 2005.
>> >>
>> >> My problem is with replicability. I use -set seed- to control the
>> >> randomness introduced by the command, but I can't manage to obtain the
>> >> same results for the output variable -lnw_i-. Can anyone inspect the
>> >> code and point me to the source of the "uncontrolled randomness" that
>> >> is affecting the results?
>> >>
>> >> I've double checked, using -cf-, that the data going in is the same
>> >> for the replication runs. The results for the regressions are the same
>> >> for all runs (I've checked the log files in a bash terminal (linux)
>> >> using the program "diff" and they are identical except for log times).
>> >> But the final resulting variable is not the same for any two runs.
>> >>
>> >> Below I copy the source, since it's not very long, along with the code
>> >> snippet I'm running.
>> >>
>> >> Thank you.
>> >>
>> >> * --------------------- User-written command ---------------------------
>> >> program define imputw, byable(recall)
>> >>
>> >>     version 8
>> >>     syntax varlist [if] , Cens(varlist) Grenze(varlist) [Outvar(string asis)]
>> >>
>> >>     marksample touse
>> >>     * If no name given to the output, call it by default "lnw_i".
>> >>     if "`outvar'" == "" {
>> >>         local outvar "lnw_i"
>> >>     }
>> >>     * Estimate Tobit model
>> >>     cnreg `varlist' if `touse', censored(`cens')
>> >>     quietly {
>> >>         * Make predictions
>> >>         predict xb00 if `touse', xb
>> >>         * Generate standardized limit for each value
>> >>         gen alpha00=(ln(`grenze')-xb00)/_b[_se] if `touse'
>> >>     }
>> >>
>> >>     cap gen `outvar'=.
>> >>     replace `outvar'=`1' if `touse'
>> >>     * Imputation: uniform() is drawn observation by observation in the
>> >>     * current sort order, so ties left in a different order get
>> >>     * different draws even with the same seed.
>> >>     replace `outvar'=xb00+_b[_se] * ///
>> >>         invnorm(uniform()*(1-norm(alpha00))+norm(alpha00)) if `touse' & `cens'
>> >>
>> >>     drop xb00 alpha00
>> >> end
>> >>
>> >> * ------------------- Code I'm using ---------------------------------
>> >> set seed 391829 // -imputw- uses random number generator
>> >> sort yearobs size_b
>> >> by yearobs size_b: imputw lwage frau gebjahr bild esector, cens(censored) ///
>> >>     grenze(uplimit)
>> >> *
>> >> *   For searches and help try:
>> >> *   http://www.stata.com/help.cgi?search
>> >> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> >> *   http://www.ats.ucla.edu/stat/stata/
>> >
>> >
>> > -------------------------------------------
>> > Richard Williams, Notre Dame Dept of Sociology
>> > OFFICE: (574)631-6668, (574)631-6463
>> > HOME:   (574)289-5227
>> > EMAIL:  [email protected]
>> > WWW:    http://www.nd.edu/~rwilliam



