
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: listing groups that differ from predicted results in logit


From   Michael Norman Mitchell <[email protected]>
To   [email protected]
Subject   Re: st: listing groups that differ from predicted results in logit
Date   Wed, 24 Feb 2010 19:44:29 -0800

Dear David

I wonder if you might start by running this as a random intercept model. You could then look at the level 2 residuals (the predicted hospital-level intercepts) to get a sense of how performance is distributed across hospitals after adjusting for the level 1 predictors. This could also give you a sense of whether there are outliers. However, I am not sure how you could translate this strategy into an actual statistical test.

  Here is some mock code using the "union" data file from Stata...

* use the data
use http://www.stata-press.com/data/r11/union.dta, clear

* idcode is like your hospital id
xtset idcode

* union is the outcome, age and south are level 1 predictors
* (re, the default for xtreg, fits the random intercept model)
xtreg union age south, re

* generate the level 2 residual, naming it r2
predict r2, u

* examine the residuals, for example using a histogram
hist r2
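
To go from the histogram to an actual list of the highest and lowest groups, one option is to sort on the residual and list one row per group. The following is only a sketch under the same assumptions as the mock code above (the "union" data, with idcode standing in for the hospital id); the variable name onepergroup and the cutoff of 10 groups at each end are arbitrary choices made for illustration.

```stata
* load the data and refit the random intercept model from above
use http://www.stata-press.com/data/r11/union.dta, clear
xtset idcode
xtreg union age south, re
predict r2, u

* tag one observation per group (among those with a residual)
* so that each idcode is listed only once
egen byte onepergroup = tag(idcode) if !missing(r2)

* keep one row per group, sort by the residual, and list both tails
preserve
keep if onepergroup
sort r2 idcode
list idcode r2 in 1/10
list idcode r2 in -10/l
restore
```

The preserve/restore pair leaves the full dataset in memory afterwards. Note that this ranks groups but says nothing about whether any of them differ significantly from what the model expects.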

I know this is not a final solution, but I hope it is a useful starting place.
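
Since the outcome here is binary, the same idea can be sketched with a random intercept logit instead of xtreg. This assumes Stata 14 or later, where melogit and the reses() option of predict are available (neither existed when this message was posted); the names re_id, se_id, re_lo, re_hi and first are made up for the example. The intervals are a rough screen built from the predicted intercepts and their standard errors, not a formal test, and with thousands of groups some intervals will exclude zero by chance alone.

```stata
* assumes Stata 14 or later
use http://www.stata-press.com/data/r11/union.dta, clear

* random intercept logit: union is binary, idcode is like the hospital id
melogit union age south || idcode:

* empirical Bayes predictions of the random intercepts and their standard errors
predict re_id, reffects reses(se_id)

* approximate 95% interval around each group's predicted intercept
generate re_lo = re_id - 1.96*se_id
generate re_hi = re_id + 1.96*se_id

* list one row per group whose interval excludes zero, lowest first
egen byte first = tag(idcode) if !missing(re_id, se_id)
sort re_id idcode
list idcode re_id re_lo re_hi if first & (re_lo > 0 | re_hi < 0)
```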

Michael N. Mitchell
See the Stata tidbit of the week at...
http://www.MichaelNormanMitchell.com
Visit me on Facebook at...
http://www.facebook.com/MichaelNormanMitchell

On 2010-02-24 7.25 PM, David Souther wrote:
Hello Statalist:

I've got a dataset of individuals (id) in hospitals (hospitals) with
some individual level data (indiv_var1 and indiv_var2) as well as
hospital level data (hosp_var1 hosp_var2) similar to the data example
below.  I'd like to use the indiv* and hosp* IVs to predict the binary
DV (dv).  In the real dataset there are thousands of hospitals, and
hundreds of individuals per hospital.

What I am hoping to discover is those hospitals that have
significantly higher or significantly lower than expected
probabilities of the DV.  So, I would like to somehow list those
hospitals that are the highest/lowest.   I tried running a logit with
all these as IVs plus dummies for all the hospitals so that I could
use predict to find the difference between the predicted and the
actual values, but it drops all the dummy variables -->
  logit dv indiv* hosp_var1 hosp_var2 hosp_dummies*
  Also, I tried clogit but it said there was no variation in the
groups.  As an alternative, could I just run regression and get the
components that go into the rvfplot (residual versus the fitted
points; if that makes any sense)?

Any other ideas on how to get the hospitals that are highest/lowest? Thanks.

**data**
input 	hosp 	id	dv	indiv_var1	indiv_var2	hosp_var1	hosp_var2
	1	1	1	3	34	88	9
	1	2	1	7	24	88	9
	1	3	0	6	12	88	9
	1	4	0	6	12	88	9
	1	5	0	9	12	88	9
	1	6	0	9	13	88	9
	2	1	0	4	66	77	8
	2	2	0	.	67	77	8
	2	3	1	9	68	77	8
	2	4	0	3	67	77	8
	2	5	1	2	6	77	8
	2	6	0	9	56	77	8
	3	1	0	1	34	11	1
	3	2	0	1	3	11	1
	3	3	1	2	2	11	1
	3	4	0	4	1	11	1
	3	5	0	1	2	11	1
	3	6	0	.	1	11	1
end
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*

