Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Kim Peeters <kimpeeters84@yahoo.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Binary panel data questions
Date: Thu, 9 Feb 2012 09:27:52 -0800 (PST)
Dear Maarten,

Thank you for your reply. Concerning your remark about data preparation and quality, it turns out that the data are correct. The ailment is not very common, and once you suffer from it, it is very unlikely ever to be cured.

In the meantime I have fitted two different models.

First model: a standard logistic regression including a time factor variable, with clustered standard errors to allow for intra-patient correlation:

Logistic regression                               Number of obs   =       4526
                                                  Wald chi2(21)   =      62.63
                                                  Prob > chi2     =     0.0000
Log pseudolikelihood = -2889.4078                 Pseudo R2       =     0.0690

                                                    (Std. Err. adjusted for 588 clusters in ID)
-----------------------------------------------------------------------------------------------
                              |               Robust
                 Profitstatus |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                         Year |
                        1995  |          0  (empty)
                        1996  |   .3095819   .6289854     0.49   0.623    -.9232068    1.542371
                        1997  |   .2287845   .3932461     0.58   0.561    -.5419637    .9995326
                        1998  |   .2779752   .2760959     1.01   0.314    -.2631629    .8191133
                        1999  |   .2198992   .2423209     0.91   0.364    -.2550409    .6948394
                        2000  |   .2776958   .1964845     1.41   0.158    -.1074067    .6627984
                        2001  |    .173692   .1671147     1.04   0.299    -.1538467    .5012308
                        2002  |  -.0233154   .1418964    -0.16   0.869    -.3014272    .2547964
                        2003  |   .0028641   .1155645     0.02   0.980    -.2236381    .2293663
                        2004  |   .0281098   .0998883     0.28   0.778    -.1676678    .2238873
                        2005  |   .0220868   .0823186     0.27   0.788    -.1392547    .1834282
                        2006  |   .0470962   .0740874     0.64   0.525    -.0981124    .1923049
                        2007  |    .008058   .0702793     0.11   0.909    -.1296869    .1458029
                        2008  |   .0484251   .0671299     0.72   0.471     -.083147    .1799971
                        2009  |   .0380139   .0655851     0.58   0.562    -.0905306    .1665584
                        2010  |          0  (omitted)
                              |
                            X |
                           2  |    1.20977   .3477654     3.48   0.001     .5281622    1.891377
                           3  |   .7152767    .287351     2.49   0.013      .152079    1.278474
                           4  |   .1813765   .2763467     0.66   0.512    -.3602532    .7230061
                           5  |          0  (empty)
                           6  |    .750882   .3379602     2.22   0.026     .0884923    1.413272
                              |
                            Y |   .2927133   .0971447     3.01   0.003     .1023131    .4831135
                            Z |  -.9795072   .3057005    -3.20   0.001    -1.578669   -.3803452
                            A |  -1.525683   .3984367    -3.83   0.000    -2.306604   -.7447611
                        _cons |   .5056934   .3791922     1.33   0.182    -.2375097    1.248896
-----------------------------------------------------------------------------------------------
Note: 1 failure and 3 successes completely determined.
note: 1995.Year != 0 predicts success perfectly
      1995.Year dropped and 2 obs not used
note: 5.X != 0 predicts failure perfectly
      5.X dropped and 322 obs not used
note: 2010.Year omitted because of collinearity

Second model: -xtlogit- with random effects:

Random-effects logistic regression              Number of obs      =      4850
Group variable: ID                              Number of groups   =       624

Random effects u_i ~ Gaussian                   Obs per group: min =         2
                                                               avg =       7.8
                                                               max =        16

                                                Wald chi2(8)       =     37.88
Log likelihood  = -379.48407                    Prob > chi2        =    0.0000
-----------------------------------------------------------------------------------------------
                 Profitstatus |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
------------------------------+----------------------------------------------------------------
                            X |
                           2  |   1.952878   1.480678     1.32   0.187    -.9491981    4.854953
                           3  |   2.070424   1.322714     1.57   0.118    -.5220471    4.662896
                           4  |   .2901392   1.324056     0.22   0.827    -2.304962    2.885241
                           5  |   -55.3464   4613.941    -0.01   0.990    -9098.504    8987.812
                           6  |   3.294871   2.974562     1.11   0.268    -2.535163    9.124905
                              |
                            Y |   .4409673   .1010158     4.37   0.000     .2429799    .6389547
                            Z |  -1.308088   1.035164    -1.26   0.206    -3.336972    .7207964
                            A |  -3.993116   2.400293    -1.66   0.096    -8.697604    .7113715
                        _cons |  -.0651275   2.014896    -0.03   0.974     -4.01425    3.883995
------------------------------+----------------------------------------------------------------
                     /lnsig2u |   4.505505    .116975                      4.276238    4.734772
------------------------------+----------------------------------------------------------------
                      sigma_u |   9.513887   .5564434                      8.483466    10.66946
                          rho |   .9649282   .0039586                      .9562861     .971912
-----------------------------------------------------------------------------------------------
Likelihood-ratio test of rho=0: chibar2(01) =  5026.14 Prob >= chibar2 = 0.000
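The two ancillary rows at the bottom of the random-effects table are both functions of sigma_u: /lnsig2u is ln(sigma_u^2), and rho is sigma_u^2 / (sigma_u^2 + pi^2/3), the share of the latent-scale variance attributable to the patient-level intercept (pi^2/3 is the variance of the standard logistic error). The short Python check below is a sketch that uses only the printed values; the function names are illustrative. A rho of about 0.965 says that almost all of the latent variation lies between patients rather than within them.

```python
import math

# Values copied from the random-effects -xtlogit- output above.
SIGMA_U = 9.513887
LNSIG2U_REPORTED = 4.505505
RHO_REPORTED = 0.9649282


def lnsig2u(sigma_u: float) -> float:
    """Log of the random-intercept variance (the /lnsig2u row)."""
    return math.log(sigma_u ** 2)


def rho_logit(sigma_u: float) -> float:
    """Share of latent-scale variance due to the patient-level intercept.

    In a logit model the observation-level error has a standard logistic
    distribution, whose variance is pi^2 / 3.
    """
    var_u = sigma_u ** 2
    return var_u / (var_u + math.pi ** 2 / 3)


if __name__ == "__main__":
    print(f"lnsig2u: {lnsig2u(SIGMA_U):.6f} (reported {LNSIG2U_REPORTED})")
    print(f"rho:     {rho_logit(SIGMA_U):.7f} (reported {RHO_REPORTED})")
```

Both computed values agree with the reported ones up to the rounding of sigma_u in the output.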
In the standard logistic regression, Y, Z and A are significant at the 5% level. In the random-effects panel regression, however, only Y is. The results for X also differ: levels 2, 3 and 6 are significant in the pooled model but none is in the random-effects model, and level 5, which was dropped from the pooled model, now gets an estimate of -55.35 with a standard error of 4613.9. I did not expect the models to differ that much. Why are they so different, or am I doing something wrong?

Thank you!

Kind regards,
Kim

----- Original Message -----
From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Cc:
Sent: Wednesday, February 8, 2012 10:27 AM
Subject: Re: st: Binary panel data questions

On Wed, Feb 8, 2012 at 1:19 AM, Kim Peeters wrote:
> Somewhat remarkably, it turns out that none of the participants in the study experienced a transition from one state to the other (i.e. from no ailment to ailment or vice versa). In other words, all patients who did not suffer from the illness at the onset of the study remained disease-free, and all patients who did suffer from it at the onset continued to be ill.
>
> Originally, I planned to use -xtlogit- with fixed effects to control for unobserved influences that differ between patients but remain constant within a given patient. However, since none of the patients experienced a transition, Stata correctly returns error code 2000: outcome does not vary in any group.
>
> At the moment, I do not know which statistical technique would be the most appropriate. Recall that I am trying to test for a relationship between the outcome (no illness vs. illness) and a group of independent variables. I thought about running a logistic regression with clustered standard errors (i.e. vce(cluster ID)). However, I do not want to discard the time dimension in the panel data, and I would like to correct for potential omitted-variable bias.

In essence you do not have panel data; you could just as well use the first observation for each person and run a regular -logit-.
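Both the fixed-effects failure and the "first observation only" suggestion come down to simple data operations. The Python sketch below assumes a hypothetical long-format list of (patient_id, year, outcome) tuples; the function names are illustrative and not part of Stata. groups_with_variation() lists the patients whose outcome ever changes, which are the only ones a fixed-effects (conditional) logit can use, so an empty list corresponds to Stata's "outcome does not vary in any group" error. first_observation_per_id() keeps one row per patient, the earliest year, as input for an ordinary cross-sectional logit; in Stata the equivalent step would be something like -bysort ID (Year): keep if _n == 1-.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical long-format record: (patient_id, year, outcome), outcome is 0/1.
Record = Tuple[int, int, int]


def groups_with_variation(records: List[Record]) -> List[int]:
    """IDs whose outcome takes more than one value over time.

    Only these patients contribute to a fixed-effects (conditional) logit;
    patients with all-0 or all-1 outcomes are dropped from that likelihood.
    """
    outcomes: Dict[int, Set[int]] = defaultdict(set)
    for pid, _year, y in records:
        outcomes[pid].add(y)
    return sorted(pid for pid, ys in outcomes.items() if len(ys) > 1)


def first_observation_per_id(records: List[Record]) -> List[Record]:
    """Keep the earliest-year record for each patient (one row per ID)."""
    first: Dict[int, Record] = {}
    for rec in records:
        pid, year, _y = rec
        if pid not in first or year < first[pid][1]:
            first[pid] = rec
    return [first[pid] for pid in sorted(first)]


if __name__ == "__main__":
    data = [(1, 2001, 0), (1, 2002, 0), (2, 1999, 1), (2, 2000, 1), (3, 2005, 0)]
    print(groups_with_variation(data))     # []
    print(first_observation_per_id(data))  # [(1, 2001, 0), (2, 1999, 1), (3, 2005, 0)]
```

With toy data in which no patient changes status, the first list is empty, mirroring the situation described in the quoted message.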
I just don't think there is any more information present in your data, and no amount of fancy modeling can invent information that isn't present in the data. I would really check again whether that constant disease status isn't some error during data preparation or some artifact of the way the data was collected, as that a) sounds really suspicious and b) is causing you this problem.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/