Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Drawing from a known, non-regular, discrete distribution

| Field | Value |
| --- | --- |
| From | Nick Cox |
| To | "statalist@hsphsun2.harvard.edu" |
| Subject | Re: st: Drawing from a known, non-regular, discrete distribution |
| Date | Wed, 19 Feb 2014 12:48:55 +0000 |

```
Something like this?

gen indices = .
mata
share = st_data(., "share")
share = share :/ sum(share)
y = rdiscrete(1000, 1, share)
st_store((1..1000)', "indices", y)
end
gen odo2 = odo[indices]

Nick
njcoxstata@gmail.com

On 19 February 2014 09:20, Lulu Zeng <luluzengnz@gmail.com> wrote:
> Dear Nick and others,
>
> I have 1200 observations in my dataset.
>
> 1200 observations (of variable "share") define the probabilities (add
> up to 1) & 1200 pre-defined corresponding values to be drawn from
> (saved in variable "odo").
>
> I am thinking of having 1000 draws in my sample.
>
> My data look like the example below (but with more points). Each draw
> value is pre-defined and has a probability attached.
>
>  Draw value     Probability
>
>      0.5                0.15
>
>      0.6                0.30
>
>      0.2                0.25
>
>      0.9                0.30
>
> Thank you for your consideration :)
>
>
> Best Regards,
> Lulu
>
> On Wed, Feb 19, 2014 at 7:59 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> My own thoughts on "Thanks in advance" are codified in the FAQ.
>> Seemingly no-one agrees with me.
>>
>> I will pose some questions here, but given other commitments I won't
>> be able to respond to any answers until _much_ later today, local
>> time. If someone else picks this up before then, fine by me,
>> naturally!
>>
>> How many observations are in your dataset?
>> How many observations define the probabilities?
>> How many values do you want in your sample?
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>>
>> On 19 February 2014 08:51, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>> Dear Nick,
>>>
>>> Sorry, the (1..10)' in my example was a typo; I in fact used 1200
>>> instead of 10 in my real experiment. It still didn't work. I also
>>> scaled "share" before calling Mata, and the same error occurs.
>>>
>>> Also, by using -rdiscrete()-, I can see it draws a random number
>>> according to a distribution specified by "p" (and write the random
>>> draws into "odo2" using -st_store()- in my case), but I don't
>>> understand how -rdiscrete()- could draw from a given set of values
>>> (e.g., a pre-specified "odo2" -- this is really what I'm trying to do)
>>>
>>> My apologies if the answer to my question is straightforward; I am
>>> quite new to Mata.
>>>
>>>
>>> Best Regards,
>>> Lulu
>>>
>>>
>>>
>>> On Wed, Feb 19, 2014 at 11:54 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> In my example, I have 10 probabilities in observations 1 to 10 of the
>>>> data, so use
>>>> (1..10)' as an argument. That will make sense for you if and only if
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 19 February 2014 00:09, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>> Dear Nick,
>>>>>
>>>>> Thank you for your suggestion. I must have done something incorrectly,
>>>>> because Mata still gives me the error below even though I did use
>>>>> -p :/ sum(p)- for rescaling as you suggested (I also tried to rescale
>>>>> the original probability variable, but neither worked):
>>>>>
>>>>> sum of the probabilities must be 1
>>>>>              rdiscrete():  3300  argument out of range
>>>>>                  <istmt>:     -  function returned error
>>>>> r(3300);
>>>>>
>>>>>
>>>>> My probability variable is "share", and "odo2" is my equivalent of
>>>>> your "y". All I did was:
>>>>>
>>>>> mata
>>>>>
>>>>> p = st_data((1..10)', "share")
>>>>>
>>>>> p :/ sum(p)
>>>>>
>>>>> st_store(., "odo2", rdiscrete(st_nobs(), 1, p))       [this is where the error occurs]
>>>>>
>>>>>
>>>>> My apologies for coming back with the same question again.
>>>>>
>>>>>
>>>>> Best Regards,
>>>>> Lulu
>>>>>
>>>>> On Tue, Feb 18, 2014 at 11:37 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> Here is an example of using -rdiscrete()- in Mata. In your case, the
>>>>>> probabilities are already in a variable. If -rdiscrete()- chokes on
>>>>>> small differences in total from 1, then check the probabilities and if
>>>>>> need be scale by -p :/ sum(p)-.
>>>>>>
>>>>>> . clear
>>>>>>
>>>>>> . set obs 1000
>>>>>> obs was 0, now 1000
>>>>>>
>>>>>> . mat p = [0.2,0.2,0.1,0.1,0.1,0.1,0.05,0.05,0.05,0.05]
>>>>>>
>>>>>> . gen double p = p[1,_n]
>>>>>> (990 missing values generated)
>>>>>>
>>>>>> . list in 1/10, sep(0)
>>>>>>
>>>>>>      +-----+
>>>>>>      |   p |
>>>>>>      |-----|
>>>>>>   1. |  .2 |
>>>>>>   2. |  .2 |
>>>>>>   3. |  .1 |
>>>>>>   4. |  .1 |
>>>>>>   5. |  .1 |
>>>>>>   6. |  .1 |
>>>>>>   7. | .05 |
>>>>>>   8. | .05 |
>>>>>>   9. | .05 |
>>>>>>  10. | .05 |
>>>>>>      +-----+
>>>>>>
>>>>>> . gen y = .
>>>>>> (1000 missing values generated)
>>>>>>
>>>>>> . mata
>>>>>> ------------------------------------------------- mata (type end to exit) ------------------
>>>>>> : p = st_data((1..10)', "p")
>>>>>>
>>>>>> : st_store(., "y", rdiscrete(st_nobs(), 1, p))
>>>>>>
>>>>>> : end
>>>>>> --------------------------------------------------------------------------------------------
>>>>>>
>>>>>> . tab y
>>>>>>
>>>>>>           y |      Freq.     Percent        Cum.
>>>>>> ------------+-----------------------------------
>>>>>>           1 |        202       20.20       20.20
>>>>>>           2 |        200       20.00       40.20
>>>>>>           3 |         98        9.80       50.00
>>>>>>           4 |        102       10.20       60.20
>>>>>>           5 |         87        8.70       68.90
>>>>>>           6 |         99        9.90       78.80
>>>>>>           7 |         49        4.90       83.70
>>>>>>           8 |         54        5.40       89.10
>>>>>>           9 |         53        5.30       94.40
>>>>>>          10 |         56        5.60      100.00
>>>>>> ------------+-----------------------------------
>>>>>>       Total |      1,000      100.00
>>>>>> Nick
>>>>>> njcoxstata@gmail.com
>>>>>>
>>>>>>
>>>>>> On 18 February 2014 09:35, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>> The "mapping" (if I am guessing correctly) is in fact trivial as in
>>>>>>> effect your sample would just be the observation numbers.
>>>>>>> Nick
>>>>>>> njcoxstata@gmail.com
>>>>>>>
>>>>>>>
>>>>>>> On 18 February 2014 09:32, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>>> Thanks for the details.
>>>>>>>>
>>>>>>>> The Mata function -rdiscrete()- should do most of what you want. You
>>>>>>>> will need to map your values to integers 1 up and then read in the
>>>>>>>> probabilities so that they are copied from a variable to a vector in
>>>>>>>> Mata. Then select integers and reverse the mapping.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>> njcoxstata@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 February 2014 09:17, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>>>> Dear Nick,
>>>>>>>>>
>>>>>>>>> My apologies for the unclear description.
>>>>>>>>>
>>>>>>>>> 1. I have 2 variables in Stata, one variable holds the 1200 known,
>>>>>>>>> discrete values I want to draw; the other holds the corresponding
>>>>>>>>> probabilities.
>>>>>>>>>
>>>>>>>>> 2. The 2 variables are associated with a parameter (attribute) of a
>>>>>>>>> random utility model. I am trying to draw from the distribution of
>>>>>>>>> this parameter of interest, and then divide it by the price parameter
>>>>>>>>> (which similarly has 2 associated variables too) to obtain a
>>>>>>>>> distribution of willingness to pay.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Lulu
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Feb 18, 2014 at 7:47 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>>>>> You have not, so far as I can see, specified
>>>>>>>>>>
>>>>>>>>>> 1. How you are holding information on your distribution. Is it 1200
>>>>>>>>>> known values with associated probabilities (so as two variables in
>>>>>>>>>> Stata), or is the information still outside Stata in some form?
>>>>>>>>>>
>>>>>>>>>> 2. What you expect to draw as a sample.
>>>>>>>>>> Nick
>>>>>>>>>> njcoxstata@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 18 February 2014 03:58, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>>>>>> Dear Scott,
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your response. My apologies, but I am still a little
>>>>>>>>>>> confused about how to do this in my case, where I have 1,200
>>>>>>>>>>> observations. Can I still use cond() without typing in each
>>>>>>>>>>> point of the draw?
>>>>>>>>>>>
>>>>>>>>>>> Best Regards,
>>>>>>>>>>> Lulu
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Feb 18, 2014 at 1:50 PM, Scott Merryman
>>>>>>>>>>> <scott.merryman@gmail.com> wrote:
>>>>>>>>>>>> http://www.stata.com/statalist/archive/2012-08/msg00256.html
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Scott
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Feb 16, 2014 at 9:15 PM, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>>>>>>>> Dear Statalist,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am seeking help with taking draws from a known, non-regular (not
>>>>>>>>>>>>> normal or lognormal etc), discrete distribution.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, taking draws from a distribution like the one below.
>>>>>>>>>>>>> However, in my case I have 1,200 points instead of the 4 points given
>>>>>>>>>>>>> in the example.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Draw value     Probability
>>>>>>>>>>>>>
>>>>>>>>>>>>>     0.5                0.15
>>>>>>>>>>>>>
>>>>>>>>>>>>>     0.6                0.30
>>>>>>>>>>>>>
>>>>>>>>>>>>>     0.2                0.25
>>>>>>>>>>>>>
>>>>>>>>>>>>>     0.9                0.30
>>>>>>>>>>>>>
>>>>>>>>>>>>> The "draw value" is the value to be drawn; "probability" is the chance
>>>>>>>>>>>>> that each value is drawn, so the probabilities add up to 1.
>>>>>>>>>>>> *
>>>>>>>>>>>> *   For searches and help try:
>>>>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
```
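
For readers who want to try the approach end to end, here is a minimal sketch using the four-point distribution from the original question. It is an illustration under stated assumptions, not code posted in the thread: the seed value, the use of `input` to create the data, and the name `k` for the number of support points are all made up for the example. Two details from the thread matter here. In Mata, a bare expression such as `p :/ sum(p)` is only displayed, so the rescaled vector has to be assigned back to `p`. And the vector passed to `rdiscrete()` must hold all of the probabilities, since a subset will not sum to 1.

```stata
* Sketch only: run from a do-file. The data, the seed and the local k are
* assumptions made for this illustration.
clear
set seed 12345

* Four support points and their probabilities, as in the original question
input double odo double share
0.5 0.15
0.6 0.30
0.2 0.25
0.9 0.30
end

* k = number of support points; then make room for 1000 draws
local k = _N
set obs 1000
gen long indices = .

mata:
// read the probabilities from the first k observations only
p = st_data((1..`k')', "share")
// assign the rescaled vector back to p; a bare p :/ sum(p) is only displayed
p = p :/ sum(p)
// draw 1000 integers in 1..k with probabilities p and store them
st_store(., "indices", rdiscrete(st_nobs(), 1, p))
end

* reverse the mapping: each drawn integer is an observation number of odo
gen double odo2 = odo[indices]
tabulate odo2
```

With many draws, the relative frequencies shown by `tabulate` should be close to the `share` values. In the thread's own data the 1200 support points already occupy observations 1 to 1200, so no `set obs` step is needed there and the draws fill only the first 1000 observations.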