
# RE: st: Bootstrapping and predicted probabilities

From: "Mohan, Deepika"
To: "statalist@hsphsun2.harvard.edu"
Subject: RE: st: Bootstrapping and predicted probabilities
Date: Thu, 11 Apr 2013 18:10:00 +0000

```
Thank you for the advice. I really appreciate it.
Deepika
________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Jeph Herrin [stata@spandrel.net]
Sent: Thursday, April 11, 2013 1:32 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Bootstrapping and predicted probabilities

On further thought, while you can do the bootstrapping described below or in the reply by Steve Samuels, this doesn't
really make sense. Bootstrapping is used for calculating group-level statistics - estimating the variance of
parameters of the population. If you had multiple observations per patient, then it would make sense to bootstrap those
observations to get an estimate of the variance of the predicted probability for that patient, but what you propose to
do is to get the predicted probability for a single observation multiple times, which is a strange thing to do.

Since it seems you want hospital-level estimates, you should instead, for each bootstrap sample, calculate the ratio
predicted/expected for each hospital, and collate those ratios across samples for each hospital. I think you'd want to
stratify on hospital when you draw the samples, as well.

hope this helps,
Jeph

On 4/11/2013 9:38 AM, Jeph Herrin wrote:
> My first thought is that you should calculate the predicted and expected from the same model, using -xtmelogit-; this is
> done by calculating the fitted values with and without the random effects. This is, for example, how Medicare itself
> does it when calculating predicted and expected rates for hospitals.
>
> My second thought is that if you do have a reason to use -logit- to get the predicted values, then why not use the
> predicted SEs to construct the CI? I usually do this by simulating p ~ N(xb,SE[xb]) for each observation, calculating
> the inverse logit, and then using the order statistics (this is called parametric bootstrapping, I think).
>
> But to answer your specific question - the only thing you are collecting from your bootstrap is -e(p)-, which is the
> P-value for the chi2 test for the overall model. I think to do what you want (or what you think you want) you can't use
> -bootstrap-, but will need to write your own bootstrap code - the crudest version being one which saves the predictions
> from each sample and which you piece together later.
>
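> forvalues b = 1/100 {
>   * reload the full dataset, then draw a bootstrap sample of patients
>   use data, clear
>   bsample
>   * refit the model and save this replicate's predictions
>   logit transfer x1 x2 x3
>   predict p_pred_`b'
>   keep patientid p_pred_`b'
>   save sample`b', replace
> }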
>
> hope this helps,
> Jeph
>
>
> On 4/11/2013 7:48 AM, Mohan, Deepika wrote:
>> Hello, I am trying to figure out how to generate confidence intervals around predicted probabilities at the patient
>> level, using bootstrapping. I am using Stata 12.0 on Windows.
>>
>> I have a Medicare dataset which includes patient-level data, as well as hospital identifiers. The objective is to
>> assess hospital-level variation in the management of trauma patients.
>>
>> I have calculated the expected probability of the outcome (transfer) for each patient:
>>
>> . logit transfer x1 x2 x3 (where x1-x3 are patient-level injury characteristics)
>> . predict p_exp
>>
>> As well as the predicted probability of the outcome (transfer) for each patient:
>>
>> . xtmelogit transfer x1 x2 x3 || hospital_id:
>> . predict p_pred, mu
>
>>
>> I am now trying to develop confidence intervals around those probabilities, and thought to use the bootstrap command.
>> For example,
>>
>> bootstrap e(p), reps(10) saving(mydata): logit transfer x1 x2 x3
>>
>> However, when I examine the saved data, what I see is a single predicted probability for each repetition and not 10
>> predicted probabilities for each individual. In other words, this command seems to be giving me confidence intervals
>> around the mean predicted probability rather than the predicted probability for the individual patient. Is there some
>> way to do this? I should also add that I don't have the ability to upload user programs like prvalue, since my
>> version of stata is run on a secure desktop (no web-access).
>>
>> Any help would be greatly appreciated,
>>
>> Thanks,
>> Deepika Mohan MD MPH
>> University of Pittsburgh
>> Pittsburgh, PA 15261
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
```
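
The hospital-level approach suggested in the thread (resample patients within hospital, refit the model, compute predicted/expected for each hospital, and collate across replicates) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the posters' actual code: the data are simulated, the number of hospitals, the coefficients, the seed, and the 50 replicates are arbitrary, and it assumes that `predict, mu fixedonly` after `xtmelogit` returns the fixed-effects-only probability, as in Stata 12. In later releases the command was renamed and the prediction options differ, so check the postestimation help for your version.

```stata
* Sketch only: simulated data stand in for the real patient-level file
clear
set seed 12345
set obs 20                                   // 20 hypothetical hospitals
generate hospital_id = _n
generate u = rnormal(0, 0.5)                 // assumed hospital random effect
expand 100                                   // 100 patients per hospital
generate x1 = rnormal()
generate x2 = rnormal()
generate x3 = runiform() < 0.3
generate transfer = runiform() < invlogit(-1 + 0.5*x1 - 0.3*x2 + 0.8*x3 + u)
drop u
tempfile analysis results
save `analysis'

* One posted row per hospital per bootstrap replicate
tempname handle
postfile `handle' int rep int hospital_id double ratio using `results'

forvalues b = 1/50 {
    use `analysis', clear
    bsample, strata(hospital_id)             // resample patients within hospital
    capture xtmelogit transfer x1 x2 x3 || hospital_id:
    if _rc continue                          // skip replicates where the fit errors out
    predict double p_pred, mu                // fixed + random effects
    predict double p_exp, mu fixedonly       // fixed effects only
    collapse (sum) p_pred p_exp, by(hospital_id)
    generate double ratio = p_pred / p_exp   // predicted/expected per hospital
    local n = _N
    forvalues i = 1/`n' {
        post `handle' (`b') (hospital_id[`i']) (ratio[`i'])
    }
}
postclose `handle'

* Percentile interval for each hospital from the collated ratios
use `results', clear
bysort hospital_id: egen double lower = pctile(ratio), p(2.5)
bysort hospital_id: egen double upper = pctile(ratio), p(97.5)
by hospital_id: keep if _n == 1
list hospital_id lower upper, clean
```

With only 50 replicates the 2.5th and 97.5th percentiles are very rough; a real analysis would use many more replicates, and fitting the mixed model that many times on a large file can take a long time.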