Statalist



RE: st: RE: Output summary stats before doing other iteration


From   Larraine Becker <lbecker@unimelb.edu.au>
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: RE: Output summary stats before doing other iteration
Date   Tue, 16 Sep 2008 15:49:46 +1000

Eva and Martin,

Thanks for all the help.  I've got something working!  

Larraine

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eva Poen
Sent: Monday, 15 September 2008 9:59 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Output summary stats before doing other iteration

Larraine,

ok, so you want to see if people who previously have had a c-section
are more likely to have one again. And your strategy to do that is to

- randomly replace some of your data with ones where there have been
zeros in the variable for "previous c-section";
- estimate a logit model, with the dependent variable c-section;
- predict probability;
- repeat.
Did I get that right? I couldn't quite see from your very first
postings how the variables you replace affect the regression; you need
to tell us the full -logit- command you are using.

I'm not quite sure I understand what you need the randomised element
for, though. If, in a regression model, you have c-section as
dependent variable, and previous c-section as regressor, then the
marginal effect for your previous c-section dummy is what you are
after. Note that this marginal effect is non-linear and will depend on
the values of all other covariates. Also, are there any women in your
sample who gave birth for the first time? In your random replacements,
you'd assign some of them the value of one when they cannot possibly
have had a previous c-section. I'm not sure that will make sense for
your interpretations. Someone with expertise in medical statistics
will be able to help you out here.
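
As a rough illustration of the marginal effect described above, the following sketch assumes a Stata version with -margins- (Stata 11 or later; at the time of this thread the equivalent command was -mfx-). Only anyprevcs comes from the thread: the outcome name cs and the covariates x1 and x2 are hypothetical placeholders, since the actual -logit- command was never posted.

```stata
* Placeholder names: cs (outcome) and x1 x2 (other covariates) are
* hypothetical; only anyprevcs appears in the thread.
logit cs i.anyprevcs x1 x2

* Average marginal effect of the previous c-section dummy, averaged
* over the observed values of the other covariates
margins, dydx(anyprevcs)

* The same effect evaluated at the covariate means instead
margins, dydx(anyprevcs) atmeans
```

The two -margins- calls will generally give different numbers, which is the non-linearity mentioned above: the effect depends on where in the covariate distribution it is evaluated.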

If you really want to go down the simulation route, then -simulate- is
the way to go, as Martin said. The manual entry should get you a long
way. However, I don't quite see what you can learn from this
exercise. You'd get 1000 estimation results, and what these are will
highly depend on the fraction of data you replace, among other things.
How do you then interpret your 1000 estimations of the coefficient on
previous c-section?
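
For readers who do want the simulation route, here is a minimal sketch of how -simulate- could wrap one iteration of the loop posted earlier in the thread. It is not the poster's final solution, which was not shared. The program name onerep, the outcome cs, and the covariates x1 and x2 are hypothetical; the dataset path, anyprevcs, and the cutoff 3575 are taken from the posted loop.

```stata
* One replication: flag a random subset of women with a previous
* c-section, fit the model, set the flagged women's anyprevcs to 0,
* and return the mean predicted probability.
capture program drop onerep
program define onerep, rclass
    use "U:\CS\combined dataset_2006.dta", clear
    * runiform() is the current name for the older uniform()
    generate double u = runiform()
    sort anyprevcs u
    generate byte dropouts = anyprevcs == 1 & (_N - _n) < 3575
    * cs, x1 and x2 are placeholders for the undisclosed model
    logit cs anyprevcs x1 x2
    replace anyprevcs = 0 if dropouts == 1
    predict double phat
    summarize phat, meanonly
    return scalar meanp = r(mean)
end

* Run 1000 replications; the data in memory are replaced by one
* observation per replication holding the mean predicted value
simulate meanp = r(meanp), reps(1000) seed(123): onerep
summarize meanp
```

Because the model is fitted before anyprevcs is altered, the -logit- results are identical in every replication; only the predictions change. Fitting the model once outside the program would save time, at the cost of a slightly less self-contained sketch.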

You will really need to show us the -logit- command before we can help
any further, I think.

Eva

2008/9/15 Larraine Becker <lbecker@unimelb.edu.au>:
> Thanks Eva.
>
> Sorry for HTML posting - I've hopefully changed this.  Please let me
> know if it didn't.
>
> Thanks for the advice - I'll try to answer and explain a bit more to
> clarify what I am trying to do.
>
> First, there is no -if- statement in my logit regression.
>
> Basically I'm trying to see if people who had caesareans are more likely
> to have another caesarean.  So, I'm randomly replacing the number of
> people who had previous caesareans and then predicting future
> caesareans.  I want to do this many times and just use the average.
>
> Not sure if this makes sense?  I'm basically the "programmer", so it's
> not my project - I need to just clarify a few things with the project
> leader as well.
>
> But, my main question is just trying to repeat this procedure more than
> once, and providing me with the predicted values and the mean of all
> observations for each iteration.  I can then later on calculate the
> overall mean.
>
> I will try what you've suggested - if you have any further ideas, please
> let me know.
>
> Thanks,
> Larraine
>
>
>
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Eva Poen
> Sent: Friday, 12 September 2008 5:31 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: Output summary stats before doing other iteration
>
> Larraine,
>
> please don't use html for your postings to statalist (see the advice
> in the faq).
>
> It would be interesting to see the -if- statement in your logit
> regression. Is it -if dropouts==1-? If I understand correctly, your
> procedure is
> - generate random variable for sorting
> - run logistic regression on a (random) subsample of your data, which
> excludes all observations that have been used in previous iterations
> - generate predicted values for _all_ observations in the dataset.
>
> First, whenever you work with random numbers, you should set the
> random number seed in order to make your results reproducible:
>
> set seed 123
>
> for example.
> Next, it seems perfectly sufficient to save the estimation result
> during the loop. You can use
>
> estimates store iteration`i'
>
> within your loop, and then generate predicted values as and if you need
> them:
>
> estimates for iteration723 : predict onehat723
>
> See -help estimates-. Do you really want to generate 1000 variables
> with predicted values? If you tell us what you ultimately want to
> achieve, we might be able to suggest something more suitable. It looks
> as if you are doing a simulation exercise of some sort; there might be
> a more direct way.
>
> Hope this helps,
> Eva
>
>
> 2008/9/12 Larraine Becker <lbecker@unimelb.edu.au>:
>> Just a correction, the program below is wrong...it should be:
>>
>> forvalues i=1(1)10 {
>> use "U:\CS\combined dataset_2006.dta", clear
>> generate random`i' = uniform()
>> sort anyprevcs random`i'
>> generate dropouts = 0
>> replace dropouts =1 if anyprevcs==1 & (_N - _n) < 3575
>>
>> logit .......(I've deleted the variables, as there are too many to put here!)
>>
>> replace anyprevcs=0 if dropouts==1
>> predict onehat`i'
>> summarize onehat`i'
>> drop dropouts
>> }
>>
>> ________________________________
>>
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Larraine Becker
>> Sent: Friday, 12 September 2008 4:35 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: st: Output summary stats before doing other iteration
>>
>> Hi all,
>>
>> I'm doing 1000 iterations of a logistic regression.  I have to output the
>> predicted value each time before it carries on with the next iteration,
>> otherwise I lose the first 999 predicted values!  I'm sure there is a way
>> to go about this, but how can I save the predicted value each time so I
>> end up with a table with 1000 predicted values?
>>
>> My program is as follows:
>>
>> forvalues i=1(1)10 {
>> use "U:\CS\combined dataset_2006.dta", clear
>> generate random`i' = uniform()
>> sort anyprevcs random`i'
>> generate dropouts = 0
>> replace dropouts =1 if anyprevcs==1 & (_N - _n) < 3575
>>
>> logit .......(I've deleted the variables, as there are too many to put here!)
>>
>> replace anyprevcs=0 if dropouts==1
>> predict onehat`i'
>> summarize onehat`i'
>> gen n=_n
>> egen predicted`i'=mean(onehat`i')*n
>> drop dropouts n
>> }
>>
>> Thanks,
>> Larraine
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP