Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: listing groups that differ from predicted results in logit

From	Michael Norman Mitchell <[email protected]>
To	[email protected]
Subject	Re: st: listing groups that differ from predicted results in logit
Date	Thu, 25 Feb 2010 11:03:01 -0800

Dear David

I am short on time, so here is one thought for the significance tests of each hospital against the grand mean... you could use "deviation coding" for the hospitals. This is described at


http://www.ats.ucla.edu/stat/stata/webbooks/reg/chapter5/statareg5.htm#DEVIATION

The admitted challenge is creating some kind of loop to create all of the deviation codes. You could also use "xi3" (see findit xi3) for automating that process. "xi3" can create the codes, but it may not play well as a prefix command to an "xt" command.
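
A minimal sketch of such a loop, assuming the variable names from David's example data (hosp, dv, indiv_var1, indiv_var2). The dev_ prefix and the choice of the last hospital as the reference are assumptions made for illustration. Each code is 1 for its own hospital, -1 for the reference hospital, and 0 otherwise, so each coefficient compares one hospital with the unweighted mean of the hospital effects on the log-odds scale.

```stata
* sketch only: deviation (effect) coding built in a loop
* assumes hosp, dv, indiv_var1 and indiv_var2 are already in memory

* collect the distinct hospital ids and pick the last one as the reference
levelsof hosp, local(hs)
local k : word count `hs'
local ref : word `k' of `hs'

* one code per non-reference hospital: 1 = this hospital, -1 = reference, 0 = other
foreach h of local hs {
    if `h' != `ref' {
        generate byte dev_`h' = (hosp == `h') - (hosp == `ref')
    }
}

* hosp_var1 and hosp_var2 are left out here because they are constant
* within hospital and therefore collinear with the hospital codes
logit dv indiv_var1 indiv_var2 dev_*
```

The reference hospital gets no coefficient of its own; its deviation is minus the sum of the others. With thousands of hospitals this model may be slow or may hit perfect prediction for hospitals whose dv never varies, so treat it as a starting point.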


I hope that helps,

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell

On 2010-02-25 7.02 AM, David Souther wrote:

Thanks Michael.  I'm still not sure how this would let me compare
among hospitals, or idcodes, because I don't know how to tell which
residuals are significantly different from the rest.

I found another reference that I think is similar to your approach. It
describes LSDV (least squares dummy variable) regression for comparing
the other dummies to the mean group effect.

http://www.masil.org/documents/dummy.pdf

However, it mentions that there is no straightforward way to run this
model in Stata (only in SAS, limdep, R, or others).  It suggests that
you can use something like

xtreg dv iv, fe i(groupvar)

but I'm not sure if this is appropriate for my model.  If I run this
on the data example you gave me, it gives me slightly larger residuals
but once again, I'm not sure how to translate that into a comparison
of the higher vs. lower than predicted hospitals?


On Wed, Feb 24, 2010 at 9:44 PM, Michael Norman Mitchell
<[email protected]>  wrote:

Dear David

  I wonder if you might start by running this as a random intercept model.
You could then look at the level two residuals to get a sense of the nature
of the distribution of performance, after adjusting for the level 1
predictors. This could also give you a sense of whether there are outliers.
However, I am not sure how you could translate this strategy into an actual
statistical test.

  Here is some mock code using the "union" data file from Stata...

* use the data
use http://www.stata-press.com/data/r11/union.dta, clear

* idcode is like your hospital id
xtset idcode

* union is the outcome, age and south are level 1 predictors
xtreg union age south

* generate the level 2 residual, naming it r2
predict r2, u

* examine the residuals, for example using a histogram
hist r2
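
A sketch of one way to list the groups with the largest and smallest level 2 residuals, repeating the steps above so it runs on its own. The cutoff of ten groups is an arbitrary choice for illustration, and the ranking is descriptive rather than a significance test.

```stata
* sketch only: rank groups by their level 2 residual
use http://www.stata-press.com/data/r11/union.dta, clear
xtset idcode
xtreg union age south
predict r2, u

* reduce to one row per idcode; r2 is constant within idcode
collapse (mean) r2, by(idcode)

* ten largest residuals (higher than predicted)
gsort -r2
list idcode r2 in 1/10

* ten smallest residuals (lower than predicted)
sort r2
list idcode r2 in 1/10
```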

  I know this is not a final solution, but I hope it is a useful starting
place.
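
Since the outcome is binary, a random intercept logit may be a closer match than xtreg. The sketch below assumes Stata 14 or later, where melogit and the reffects and reses() options of predict are available; the names re, re_se, re_lo, re_hi, flag and first are made up for illustration. Flagging groups whose rough 95% interval excludes zero is a screening device, not a formal test, and with thousands of hospitals some flags are expected by chance.

```stata
* sketch only: assumes Stata 14 or later (melogit; predict, reffects reses())
use http://www.stata-press.com/data/r11/union.dta, clear

* random intercept logit, with idcode standing in for the hospital id
melogit union age south || idcode:

* predicted random intercept for each group and its standard error
predict re, reffects reses(re_se)

* rough 95% interval around each predicted intercept
generate re_lo = re - 1.96*re_se
generate re_hi = re + 1.96*re_se

* -1 = lower than expected, 1 = higher than expected, 0 = neither
generate byte flag = cond(re_hi < 0, -1, cond(re_lo > 0, 1, 0)) if !missing(re, re_se)

* show one row per flagged group
egen byte first = tag(idcode)
list idcode re re_se flag if first & flag != 0 & !missing(flag)
```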

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell

On 2010-02-24 7.25 PM, David Souther wrote:

Hello Statalist:

I've got a dataset of individuals (id) in hospitals (hospitals) with
some individual level data (indiv_var1 and indiv_var2) as well as
hospital level data (hosp_var1 hosp_var2) similar to the data example
below.  I'd like to use the indiv* and hosp* IVs to predict the binary
DV (dv).  In the real dataset there are thousands of hospitals, and
hundreds of individuals per hospital.

What I am hoping to discover is those hospitals that have
significantly higher or significantly lower than expected
probabilities of the DV.  So, I would like to somehow list those
hospitals that are the highest/lowest.   I tried running a logit with
all these as IVs plus dummies for all the hospitals so that I could
use predict to find the difference between the predicted and the
actual values, but it drops all the dummy variables -->
  logit dv indiv* hosp_var1 hosp_var2 hosp_dummies*
  Also, I tried clogit but it said there was no variation in the
groups.  As an alternative, could I just run regression and get the
components that go into the rvfplot (residual versus the fitted
points; if that makes any sense)?

Any other ideas on how to get the hospitals that are highest/lowest ??
  Thanks.

**data**
input   hosp    id      dv      indiv_var1      indiv_var2      hosp_var1      hosp_var2
        1       1       1       3       34      88      9
        1       2       1       7       24      88      9
        1       3       0       6       12      88      9
        1       4       0       6       12      88      9
        1       5       0       9       12      88      9
        1       6       0       9       13      88      9
        2       1       0       4       66      77      8
        2       2       0       .       67      77      8
        2       3       1       9       68      77      8
        2       4       0       3       67      77      8
        2       5       1       2       6       77      8
        2       6       0       9       56      77      8
        3       1       0       1       34      11      1
        3       2       0       1       3       11      1
        3       3       1       2       2       11      1
        3       4       0       4       1       11      1
        3       5       0       1       2       11      1
        3       6       0       .       1       11      1
end
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
