
Re: st: Trying to simulate sampling distribution of mean

From: krishanu karmakar
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Trying to simulate sampling distribution of mean
Date: Wed, 30 Jan 2013 00:15:31 -0500

```
Thank you for all the help.

Krishanu

On Tue, Jan 29, 2013 at 9:03 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> The -simulate- call would need to be revised to pick up r(mean).
>
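> * A minimal sketch of the revised call, assuming the simplified -ybar-
> * quoted below has been defined. The file name sdistmean and the
> * variable name rmy are taken from the earlier post; reps(5) is the
> * number of samples asked for in the original question.
> simulate rmy=r(mean), reps(5) nodots saving(sdistmean, replace): ybar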
>
> On Wed, Jan 30, 2013 at 1:30 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> If you want to do it this way, you can simplify your program
>>
>> program ybar
>>          qui use big.dta, clear
>>          sample 60, count
>>          su age, meanonly
>> end
>>
>> I think that should still work. -syntax- does nothing for you.
>> -summarize- leaves r(mean) in its wake anyway. Taking a variable and
>> putting it in another and taking a saved result and putting it in
>> another can both be excised.
>>
>> Nick
>>
>>
>> On Wed, Jan 30, 2013 at 12:04 AM, krishanu karmakar
>> <krishkarmakar@gmail.com> wrote:
>>> Thank you Dr. Cox,
>>>
>>> I did a little bit more searching and with the help of your answer I
>>> modified my -ybar- program as follows
>>>
>>> -----------------------------
>>> program define ybar, rclass
>>>         syntax [,]
>>>         qui use big.dta, clear
>>>         sample 60, count
>>>         gen y1 = age
>>>         summ y1
>>>         return scalar my = r(mean)
>>> end
>>>
>>> local reps 5
>>> simulate rmy=r(my), saving(sdistmean`i', replace) nodots reps(`reps'): ybar
>>> -----------------------------------
>>> Yes, I should probably make the -use- command an option of the
>>> -ybar- program to make it more generally usable. But otherwise it is
>>> now working as I wanted it to.
>>>
>>> Thank you again.
>>> Krishanu
>>>
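>>> * A rough sketch of that generalisation, under assumptions: the program
>>> * name ybar2 and the option n() are hypothetical and do not appear in
>>> * the thread. The dataset is passed with -using-, the sample size
>>> * defaults to 60, and the variable age is still hard-coded.
>>> program define ybar2, rclass
>>>         syntax using/ [, n(integer 60)]
>>>         quietly use `"`using'"', clear
>>>         sample `n', count
>>>         summarize age, meanonly
>>>         return scalar my = r(mean)
>>> end
>>>
>>> * Example call, reusing the names from the code above:
>>> simulate rmy=r(my), reps(5) nodots saving(sdistmean, replace): ybar2 using big.dta, n(60)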
>>>
>>> On Tue, Jan 29, 2013 at 6:51 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Your program -ybar- does exactly the same thing every time, so
>>>> inevitably the results are the same. If you look again at the help for
>>>> -simulate- you will see that the example program -lnsim- includes its
>>>> own random variate generation. Conversely, you do use -sample 0.1- but
>>>> you use it outside your program.
>>>>
>>>> Otherwise put, -simulate- does not actually do stochastic simulation;
>>>> it is just a framework that runs and collates the results of a program
>>>> you write -- and that program must do the simulation.
>>>>
>>>> In your case, there is an easy way of getting random samples from your
>>>> dataset. Just chop the dataset into blocks randomly and summarize each
>>>> block.
>>>>
>>>> If you shuffle your data
>>>>
>>>> set seed 2803
>>>> gen random = runiform()
>>>> sort random
>>>>
>>>> and create blocks of size 100
>>>>
>>>> gen block = ceil(_n/100)
>>>>
>>>> then
>>>>
>>>> egen mean = mean(age), by(block)
>>>> egen tag = tag(block)
>>>> l mean if tag
>>>>
>>>> that will give you 1000 means, one for each block of size 100. For some
>>>> reason, it seems that you only want 5, and that means you can throw
>>>> 995 away.
>>>>
>>>> Nick
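>>>>
>>>> * A minimal sketch of the last step, assuming the variables block, mean
>>>> * and tag created above: keep one row for each of the first 5 blocks
>>>> * and save the means as m_age in meandata, the names used in the
>>>> * original question.
>>>> keep if tag & block <= 5
>>>> keep block mean
>>>> rename mean m_age
>>>> save meandata, replace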
>>>>
>>>> On Tue, Jan 29, 2013 at 11:15 PM, krishanu karmakar
>>>> <krishkarmakar@gmail.com> wrote:
>>>>
>>>>> The following is my code
>>>>>
>>>>> ==== code start =====
>>>>>
>>>>> program define ybar, rclass
>>>>>         syntax [,]
>>>>>         replace y1 = y2
>>>>>         summarize y1
>>>>>         return scalar m_y = r(mean)
>>>>> end
>>>>>
>>>>>
>>>>> local reps 5
>>>>>
>>>>>         quietly use big.dta, clear
>>>>>         generate y2 = age
>>>>>         sample 0.1
>>>>>
>>>>>         quietly{
>>>>>         gen y1=.
>>>>>         simulate m_age=r(m_y), saving(meandata, replace) nodots reps(`reps'): ybar
>>>>> }
>>>>>
>>>>> ==== code ends =====
>>>>>
>>>>> What I am trying to do.
>>>>> I have a dataset named "big.dta" with 100,000 observations. The only
>>>>> variable in this dataset is "age".
>>>>>
>>>>> I want to first draw a sample of size 100 from this dataset and
>>>>> calculate the mean for the variable "age". I want to draw 5 such
>>>>> samples and store the mean of "age" from each sample as the variable
>>>>> "m_age" in a new dataset called "meandata". So this dataset will have
>>>>> 5 observations.
>>>>>
>>>>> My code runs, but gives the wrong result. Stata saves
>>>>> "meandata", but all five observations (the mean of age from 5
>>>>> different samples) are stored with the same value. That means Stata is
>>>>> not drawing 5 different samples, but only one sample. Could anyone
>>>>> help by showing which line of my code I should change?

--