



Re: st: Trying to simulate sampling distribution of mean


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Trying to simulate sampling distribution of mean
Date   Wed, 30 Jan 2013 02:03:01 +0000

The -simulate- call would need to be revised to pick up r(mean).
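
A minimal sketch of what the revised call might look like, put together with the simplified program quoted below. It assumes big.dta (with the variable age) is in the working directory; the result name mean_age and the seed value are illustrative choices, and the output file name meandata is taken from the original question.

```stata
* Sketch only: assumes big.dta, containing the variable age, is in the
* current working directory.
capture program drop ybar
program ybar
        quietly use big.dta, clear
        sample 60, count            // keep 60 randomly chosen observations
        summarize age, meanonly     // leaves r(mean) behind
end

set seed 2803                       // illustrative seed, for reproducibility
* mean_age is a name chosen here for illustration
simulate mean_age = r(mean), reps(5) nodots saving(meandata, replace): ybar
list
```

Because -ybar- is not declared rclass here, the r(mean) that -simulate- collects is the one left behind by -summarize-, so no -return scalar- line is needed.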


On Wed, Jan 30, 2013 at 1:30 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> If you want to do it this way, you can simplify your program
>
> program ybar
>          qui use big.dta, clear
>          sample 60, count
>          su age, meanonly
> end
>
> I think that should still work. -syntax- does nothing for you.
> -summarize- leaves r(mean) in its wake anyway. Taking a variable and
> putting it in another and taking a saved result and putting it in
> another can both be excised.
>
> Nick
>
>
> On Wed, Jan 30, 2013 at 12:04 AM, krishanu karmakar
> <krishkarmakar@gmail.com> wrote:
>> Thank you Dr. Cox,
>>
>> I did a little bit more searching and with the help of your answer I
>> modified my -ybar- program as follows
>>
>> -----------------------------
>> program define ybar, rclass
>>         syntax [,]
>>         qui use big.dta, clear
>>         sample 60, count
>>         gen y1 = age
>>         summ y1
>>         return scalar my = r(mean)
>> end
>>
>> local reps 5
>> simulate rmy=r(my), saving(sdistmean`i', replace) nodots reps(`reps'): ybar
>> -----------------------------------
>> yes, I should probably put the -use- command as an option to the
>> -ybar- program to make it more generally usable. But, otherwise, it is
>> now working as I wanted it to.
>>
>> Thank you again.
>> Krishanu
>>
>>
>> On Tue, Jan 29, 2013 at 6:51 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Your program -ybar- does exactly the same thing every time, so
>>> inevitably the results are the same. If you look again at the help for
>>> -simulate- you will see that the example program -lnsim- includes its
>>> own random variate generation. Conversely, you do use -sample 0.1- but
>>> you use it outside your program.
>>>
>>> Otherwise put, -simulate- does not actually do stochastic simulation;
>>> it is just a framework that runs and collates the results of a program
>>> you write -- and that program must do the simulation.
>>>
>>> In your case, there is an easy way of getting random samples from your
>>> dataset. Just chop the dataset into blocks randomly and summarize each
>>> block.
>>>
>>> If you shuffle your data
>>>
>>> set seed 2803
>>> gen random = runiform()
>>> sort random
>>>
>>> and create blocks of size 100
>>>
>>> gen block = ceil(_n/100)
>>>
>>> then
>>>
>>> egen mean = mean(age), by(block)
>>> egen tag = tag(block)
>>> l mean if tag
>>>
>>> that will give you 1000 means, one for each block of size 100. For some
>>> reason, it seems that you only want 5, and that means you can throw
>>> 995 away.
>>>
>>> Nick
>>>
>>> On Tue, Jan 29, 2013 at 11:15 PM, krishanu karmakar
>>> <krishkarmakar@gmail.com> wrote:
>>>
>>>> The following is my code
>>>>
>>>> ==== code start =====
>>>>
>>>> program define ybar, rclass
>>>>         syntax [,]
>>>>         replace y1 = y2
>>>>         summarize y1
>>>>         return scalar m_y = r(mean)
>>>> end
>>>>
>>>>
>>>> local reps 5
>>>>
>>>>         quietly use big.dta, clear
>>>>         generate y2 = age
>>>>         sample 0.1
>>>>
>>>>         quietly{
>>>>         gen y1=.
>>>>         simulate m_age=r(m_y), saving(meandata, replace) nodots reps(`reps'): ybar
>>>> }
>>>>
>>>> ==== code ends =====
>>>>
>>>> What I am trying to do.
>>>> I have a dataset named "big.dta" with 100,000 observations. The only
>>>> variable in this dataset is "age".
>>>>
>>>> I want to first draw a sample of size 100 from this dataset and
>>>> calculate the mean for the variable "age". I want to draw 5 such
>>>> samples and store the mean of "age" from each sample as the variable
>>>> "m_age" in a new dataset called "meandata". So this dataset will have
>>>> 5 observations.
>>>>
>>>> My code is running, but wrongly. I am getting Stata to save the
>>>> "meandata", but all the five observations (mean of age from 5
>>>> different samples) are stored as equal in value. That means Stata is
>>>> not drawing 5 different samples, but only one sample. Could anyone
>>>> help by showing which line of my code I should change?

