Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Drawing from a known, non-regular, discrete distribution

From: Nick Cox
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: Drawing from a known, non-regular, discrete distribution
Date: Wed, 19 Feb 2014 08:59:19 +0000

```
My own thoughts on "Thanks in advance" are codified in the FAQ.
Seemingly no-one agrees with me.

I will pose some questions here, but given other commitments I won't
be able to respond to any answers until _much_ later today, local
time. If someone else picks this up before then, fine by me,
naturally!

How many observations are in your dataset?
How many observations define the probabilities?
How many values do you want in your sample?

Nick
njcoxstata@gmail.com

On 19 February 2014 08:51, Lulu Zeng <luluzengnz@gmail.com> wrote:
> Dear Nick,
>
> Sorry, the (1..10)' in my example was a typo; I in fact used 1200
> instead of 10 in my real experiment. It still didn't work. I also
> scaled "share" before calling Mata, and the same error occurs.
>
> Also, by using -rdiscrete()-, I can see it draws a random number
> according to a distribution specified by "p" (and write the random
> draws into "odo2" using -st_store()- in my case), but I don't
> understand how -rdiscrete()- could draw from a given set of values
> (e.g., a pre-specified "odo2" -- this is really what I'm trying to do)
>
> My apologies if the answer to my question is straightforward; I am
> quite new to Mata.
>
>
> Best Regards,
> Lulu
>
>
>
> On Wed, Feb 19, 2014 at 11:54 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> In my example, I have 10 probabilities in observations 1 to 10 of the
>> data, so use
>> (1..10)' as an argument. That will make sense for you if and only if
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 19 February 2014 00:09, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>> Dear Nick,
>>>
>>> Thank you for your suggestion. I must have done something incorrectly,
>>> as Mata still gives me the error below even though I did use -p :/ sum(p)-
>>> for rescaling as you suggested (I also tried to rescale the original
>>> probability variable, but neither worked):
>>>
>>> sum of the probabilities must be 1
>>>              rdiscrete():  3300  argument out of range
>>>                  <istmt>:     -  function returned error
>>> r(3300);
>>>
>>>
>>> My probability variable is "share", and "odo2" is my equivalent of
>>> your "y". All I did was:
>>>
>>> mata
>>>
>>> p = st_data((1..10)', "share")
>>>
>>> p :/ sum(p)
>>>
>>> st_store(., "odo2", rdiscrete(st_nobs(), 1, p))       [this is where
>>> the error occurs]
>>>
>>>
>>> My apologies for coming back with the same question again.
>>>
>>>
>>> Best Regards,
>>> Lulu
>>>
>>> On Tue, Feb 18, 2014 at 11:37 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Here is an example of using -rdiscrete()- in Mata. In your case, the
>>>> probabilities are already in a variable. If -rdiscrete()- chokes on
>>>> small differences in total from 1, then check the probabilities and if
>>>> need be scale by -p :/ sum(p)-.
>>>>
>>>> . clear
>>>>
>>>> . set obs 1000
>>>> obs was 0, now 1000
>>>>
>>>> . mat p = [0.2,0.2,0.1,0.1,0.1,0.1,0.05,0.05,0.05,0.05]
>>>>
>>>> . gen double p = p[1,_n]
>>>> (990 missing values generated)
>>>>
>>>> . list in 1/10, sep(0)
>>>>
>>>>      +-----+
>>>>      |   p |
>>>>      |-----|
>>>>   1. |  .2 |
>>>>   2. |  .2 |
>>>>   3. |  .1 |
>>>>   4. |  .1 |
>>>>   5. |  .1 |
>>>>   6. |  .1 |
>>>>   7. | .05 |
>>>>   8. | .05 |
>>>>   9. | .05 |
>>>>  10. | .05 |
>>>>      +-----+
>>>>
>>>> . gen y = .
>>>> (1000 missing values generated)
>>>>
>>>> . mata
>>>> ------------------------------------------------- mata (type end to exit) ------------------
>>>> : p = st_data((1..10)', "p")
>>>>
>>>> : st_store(., "y", rdiscrete(st_nobs(), 1, p))
>>>>
>>>> : end
>>>> --------------------------------------------------------------------------------------------
>>>>
>>>> . tab y
>>>>
>>>>           y |      Freq.     Percent        Cum.
>>>> ------------+-----------------------------------
>>>>           1 |        202       20.20       20.20
>>>>           2 |        200       20.00       40.20
>>>>           3 |         98        9.80       50.00
>>>>           4 |        102       10.20       60.20
>>>>           5 |         87        8.70       68.90
>>>>           6 |         99        9.90       78.80
>>>>           7 |         49        4.90       83.70
>>>>           8 |         54        5.40       89.10
>>>>           9 |         53        5.30       94.40
>>>>          10 |         56        5.60      100.00
>>>> ------------+-----------------------------------
>>>>       Total |      1,000      100.00
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 18 February 2014 09:35, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> The "mapping" (if I am guessing correctly) is in fact trivial as in
>>>>> effect your sample would just be the observation numbers.
>>>>> Nick
>>>>> njcoxstata@gmail.com
>>>>>
>>>>>
>>>>> On 18 February 2014 09:32, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>> Thanks for the details.
>>>>>>
>>>>>> The Mata function -rdiscrete()- should do most of what you want. You
>>>>>> will need to map your values to integers 1 up and then read in the
>>>>>> probabilities so that they are copied from a variable to a vector in
>>>>>> Mata. Then select integers and reverse the mapping.
>>>>>>
>>>>>> Nick
>>>>>> njcoxstata@gmail.com
>>>>>>
>>>>>>
>>>>>> On 18 February 2014 09:17, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>> Dear Nick,
>>>>>>>
>>>>>>> My apologies for the unclear description.
>>>>>>>
>>>>>>> 1. I have 2 variables in Stata, one variable holds the 1200 known,
>>>>>>> discrete values I want to draw; the other holds the corresponding
>>>>>>> probabilities.
>>>>>>>
>>>>>>> 2. The 2 variables are associated with a parameter (attribute) of a
>>>>>>> random utility model. I am trying to draw from the distribution of
>>>>>>> this parameter of interest, and then divide it by the price parameter
>>>>>>> (which similarly has 2 associated variables too) to obtain a
>>>>>>> distribution of willingness to pay.
>>>>>>>
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Lulu
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Feb 18, 2014 at 7:47 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>>>>> You have not, so far as I can see, specified
>>>>>>>>
>>>>>>>> 1. How you are holding information on your distribution. Is it 1200
>>>>>>>> known values with associated probabilities (so as two variables in
>>>>>>>> Stata), or is the information still outside Stata in some form?
>>>>>>>>
>>>>>>>> 2. What you expect to draw as a sample.
>>>>>>>> Nick
>>>>>>>> njcoxstata@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>> On 18 February 2014 03:58, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>>>> Dear Scott,
>>>>>>>>>
>>>>>>>>> Thank you for your response. My apologies that I am still a little
>>>>>>>>> confused about how to do this in my case where I have 1,200
>>>>>>>>> observations. Can I still use the cond() function without typing in each
>>>>>>>>> point of the draw?
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Lulu
>>>>>>>>>
>>>>>>>>> On Tue, Feb 18, 2014 at 1:50 PM, Scott Merryman
>>>>>>>>> <scott.merryman@gmail.com> wrote:
>>>>>>>>>> http://www.stata.com/statalist/archive/2012-08/msg00256.html
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Scott
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Sun, Feb 16, 2014 at 9:15 PM, Lulu Zeng <luluzengnz@gmail.com> wrote:
>>>>>>>>>>> Dear Statalist,
>>>>>>>>>>>
>>>>>>>>>>> I am seeking help with taking draws from a known, non-regular (not
>>>>>>>>>>> normal or lognormal etc), discrete distribution.
>>>>>>>>>>>
>>>>>>>>>>> For example, taking draws from a distribution like the one below.
>>>>>>>>>>> However, in my case I have 1,200 points instead of the 4 points given
>>>>>>>>>>> in the example.
>>>>>>>>>>>
>>>>>>>>>>> Draw value     Probability
>>>>>>>>>>>
>>>>>>>>>>>     0.5                0.15
>>>>>>>>>>>
>>>>>>>>>>>     0.6                0.30
>>>>>>>>>>>
>>>>>>>>>>>     0.2                0.25
>>>>>>>>>>>
>>>>>>>>>>>     0.9                0.30
>>>>>>>>>>>
>>>>>>>>>>> The "draw value" is the value to be drawn, "probability" is the chance
>>>>>>>>>>> that each value is drawn, so the probabilities add up to 1.
>>>>>>>>>> *
>>>>>>>>>> *   For searches and help try:
>>>>>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>>>>>> *   http://www.ats.ucla.edu/stat/stata/
```
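
The approach suggested in the thread (map the values to integers 1 up, draw integers with -rdiscrete()-, then reverse the mapping) can be sketched as below. This is a minimal sketch under stated assumptions, not the posters' own code: the variable names `value`, `share`, and `draw`, the seed, and the sample size of 1000 are hypothetical, and the four-point distribution is the toy example from the original question. One point worth checking against the error reported above: in Mata, a bare expression such as `p :/ sum(p)` only displays the rescaled vector, so the result has to be assigned back to `p` before it is passed to -rdiscrete()-. That may be why the error persisted after rescaling.

```stata
* Sketch only: variable names, seed, and sample size are assumptions.
clear
set seed 12345

* Toy distribution from the original question: values and probabilities
* held in observations 1 to 4 of two variables.
input double(value share)
0.5 0.15
0.6 0.30
0.2 0.25
0.9 0.30
end

* Room for 1000 draws; rows 5 to 1000 of value and share stay missing.
set obs 1000
gen double draw = .

mata:
// Number of support points (1200 in the poster's real data).
k = 4

// Read only the rows that define the distribution.
v = st_data((1::k), "value")
p = st_data((1::k), "share")

// Rescale AND assign; a bare "p :/ sum(p)" would only display the result.
p = p :/ quadsum(p)

// rdiscrete() returns integers 1..k, i.e. row indices into v.
idx = rdiscrete(st_nobs(), 1, p)

// Reverse the mapping: look up the value for each drawn index.
st_store(., "draw", v[idx, 1])
end

tabulate draw
```

With the real data, `k` would be set to 1200 and `value` and `share` replaced by the existing variables. Storing the draws in a separate variable such as `draw` avoids overwriting the variable that holds the values being drawn from.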