
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: using multinomials as weights in bootstrapping


From   Charley Greenwood <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: using multinomials as weights in bootstrapping
Date   Fri, 6 Sep 2013 07:06:19 +1000

Hello,

I was recently running some bootstrap procedures with complex processes and calculations inside each replication, so I needed to do the resampling myself rather than use a Stata bootstrap command.

It was not published work, and I had a large sample with time-consuming actions within each iteration. So rather than select n cases with replacement and save the result as a bootstrap dataset multiple times, I applied "multinomial weights" to the live dataset within each repetition: each case was assigned a weight of 0, 1, 2, etc. with the probability appropriate to a multinomial distribution. I used a uniform random number and the relevant threshold values for the multinomial distribution. I only needed to bother with these up to about 20, as the probabilities became infinitesimal after that; I think nothing ever got a weight of more than 17.
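For illustration only, here is a minimal sketch of the threshold approach in Python rather than Stata. It assumes the thresholds are the cumulative probabilities of a Binomial(_N, 1/_N) count, which is the marginal distribution of how often one case appears when _N cases are drawn with replacement. The names `binomial_thresholds` and `draw_weights` are hypothetical, not anything from Stata.

```python
import bisect
import math
import random


def binomial_thresholds(n, max_weight=20):
    """Cumulative probabilities P(W <= k) for W ~ Binomial(n, 1/n), k = 0..max_weight."""
    p = 1.0 / n
    cumulative = []
    total = 0.0
    # A weight can never exceed n, so stop there when n is tiny.
    for k in range(min(max_weight, n) + 1):
        total += math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        cumulative.append(total)
    return cumulative


def draw_weights(n, rng, max_weight=20):
    """Give each of n cases a weight by comparing a uniform draw with the thresholds."""
    thresholds = binomial_thresholds(n, max_weight)
    top = len(thresholds) - 1
    weights = []
    for _ in range(n):
        u = rng.random()
        # Smallest k with u < P(W <= k); draws in the truncated tail get the top weight.
        weights.append(min(bisect.bisect_right(thresholds, u), top))
    return weights


if __name__ == "__main__":
    rng = random.Random(2013)  # arbitrary seed, for reproducibility only
    w = draw_weights(100_000, rng)
    print("weighted N:", sum(w), "largest weight:", max(w))
```

Because each case is drawn independently here, the weights only sum to roughly _N, which matches the behaviour described in this post.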

This gave a sample whose weighted _N was fairly close to the unweighted _N. I then made the weighted total converge to the unweighted _N through a few iterations of adjusting the weights of cases chosen by simple random sampling (SRS): where the weighted total needed to go up I added to the weight on each iteration, and where it needed to go down I subtracted (unless the weight was already zero). Doing this seemed a little imperfect, but it was quicker than a full one-by-one sampling with replacement (_N was over 100,000).
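One possible reading of that adjustment step, sketched in Python for illustration. The details are assumptions: on each pass a simple random sample of cases has its weight moved by 1, with zero-weight cases excluded when subtracting, until the total matches the target. The function name `adjust_weights_to_total` is hypothetical, and this is not necessarily the exact procedure I ran.

```python
import random


def adjust_weights_to_total(weights, target, rng):
    """Return a copy of weights moved in +/-1 steps until it sums to target."""
    if not weights or target < 0:
        raise ValueError("need at least one case and a non-negative target")
    adjusted = list(weights)
    diff = target - sum(adjusted)
    while diff != 0:
        if diff > 0:
            # Total too low: add 1 to a simple random sample of cases.
            chosen = rng.sample(range(len(adjusted)), min(diff, len(adjusted)))
            step = 1
        else:
            # Total too high: subtract 1, but never from a case already at zero.
            eligible = [i for i, w in enumerate(adjusted) if w > 0]
            chosen = rng.sample(eligible, min(-diff, len(eligible)))
            step = -1
        for i in chosen:
            adjusted[i] += step
        diff = target - sum(adjusted)
    return adjusted


if __name__ == "__main__":
    rng = random.Random(1)  # arbitrary seed
    print(adjust_weights_to_total([0, 3, 1, 0, 2], 5, rng))
```

The asymmetry is visible in the code: additions can land on any case, while subtractions can only land on cases with positive weight.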

A tricky bit was getting the thresholds right for a specific value of _N. I managed to figure out some formulas that recovered the relevant thresholds for a given value of _N, starting with the easy one (zero) and then working along 1, 2, etc. up to 20. I ran these formulas in Stata to set up the thresholds. I only needed to do this once before bootstrapping, as _N obviously doesn't change.
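A sketch of one such recursion, written in Python for illustration and not necessarily the formulas I used. It assumes each weight is Binomial(_N, 1/_N), so P(0) = (1 - 1/_N)^_N and P(k+1) = P(k) * (_N - k) / ((k + 1) * (_N - 1)); the thresholds are the running sums. The name `thresholds_by_recursion` is hypothetical.

```python
def thresholds_by_recursion(n, max_weight=20):
    """Cumulative Binomial(n, 1/n) probabilities, built up from weight 0."""
    if n < 2:
        raise ValueError("the recursion divides by n - 1, so n must be at least 2")
    prob = (1.0 - 1.0 / n) ** n  # the easy one: P(weight = 0)
    cumulative = [prob]
    for k in range(min(max_weight, n)):
        # Step from P(weight = k) to P(weight = k + 1).
        prob *= (n - k) / ((k + 1) * (n - 1))
        cumulative.append(cumulative[-1] + prob)
    return cumulative


if __name__ == "__main__":
    for k, t in enumerate(thresholds_by_recursion(100_000)[:5]):
        print(k, t)
```

For large _N the first threshold is very close to exp(-1), since Binomial(_N, 1/_N) is then nearly Poisson with mean 1.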

I have three quick questions - all variations on "was I wasting my time?":
* 1, is there a Stata command which returns the multinomial thresholds for a given value of N?
* 2, is there a command which randomly returns an integer directly from the multinomial distribution with parameter N?
* 3, is there a command that creates this sort of weights directly in the data (ideally such that weighted _N = unweighted _N)?
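To make question 3 concrete, here is a minimal Python sketch (not a Stata command) of weights that are multinomial by construction and sum to exactly _N: draw _N case indices with replacement and count how often each case comes up. The function name `multinomial_bootstrap_weights` is hypothetical.

```python
import random
from collections import Counter


def multinomial_bootstrap_weights(n, rng):
    """Weights for n cases from n equal-probability draws with replacement."""
    counts = Counter(rng.randrange(n) for _ in range(n))
    # Cases that were never drawn get weight 0.
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)  # arbitrary seed
    w = multinomial_bootstrap_weights(10, rng)
    print(w, "sum:", sum(w))
```

This is the one-by-one sampling I was trying to avoid, but it only stores a weight per case rather than a new dataset.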

And a more interesting question:
* 4, I think my topping up / topping down process corrupts the multinomial weights slightly, because they start off as a pure multinomial distribution and then I drive the distribution up or down uniformly across its range (and not uniformly once a weight reaches zero). I doubt the damage is great, given how small the correction is. But theoretically I can't decide whether doing this is fine or whether it corrupts the distribution. If it's wrong, how wrong is it? Would it be possible simply to reject multinomial draws where the difference is too large, or would this itself be biased? I would welcome any feedback / comments.

Many thanks,

Charley Greenwood

