Re: st: Comparing overlapping groups


From   Fred Wolfe <fwolfe@arthritis-research.org>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Comparing overlapping groups
Date   Wed, 3 Oct 2012 04:31:53 -0500

Thanks very much, David.

Fred

On Tue, Oct 2, 2012 at 10:05 AM, David Hoaglin <dchoaglin@gmail.com> wrote:
> Dear Fred,
>
> If the 4 definitions were mutually exclusive subsets, you could use a
> regression that has indicator variables for FM2, FM3, and FM4 (the
> constant term would handle FM1, or you could include an indicator for
> FM1 and turn off the constant).  The result would be equivalent to a
> one-way analysis of variance with 4 groups.
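
For the mutually exclusive case described above, a minimal Stata sketch might look like the following. It assumes a hypothetical variable fmgroup, coded 1 to 4 for FM1 to FM4, which does not appear in the thread; only phq_sss comes from the original post. The `//` comments assume the lines are run from a do-file.

```stata
* Assumption: fmgroup is a hypothetical single variable coded 1-4
* (FM1-FM4); it is only meaningful if no one meets two definitions.
regress phq_sss i.fmgroup                  // _cons = FM1 mean; 2., 3., 4. = differences from FM1
testparm i.fmgroup                         // joint test that the 4 group means are equal
regress phq_sss ibn.fmgroup, noconstant    // alternative: one indicator per group, no constant
oneway phq_sss fmgroup                     // one-way ANOVA on the same 4 groups
```

With the default standard errors, the F statistic from testparm should agree with the one reported by oneway, which is the equivalence the paragraph above refers to.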
>
> Since the definitions overlap (though you have not said how many of
> the overlaps are present in your data or the numbers of observations
> in the overlaps --- if all 2442 observations meet at least one of the
> 4 definitions, you could have as many as 15 subgroups), you could
> start with a regression model that has indicators for FM2, FM3, and
> FM4.  The constant will give you an average for FM1, and the
> coefficients of the three indicators will give incremental effects,
> relative to FM1.  The results may not be satisfactory, and they may be
> difficult to interpret.  A better approach, along the lines of main
> effects and interactions, would also include indicators for each of
> the subsets that involve 2 or more of the definitions.  Then, for
> example, you could get an estimate of the level of phq_sss among
> people who meet only FM1, an increment for people who meet both FM1
> and FM2, and further increments for people who meet FM1, FM2, and FM3
> and people who meet all 4 definitions.
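
The two models described above could be sketched in Stata roughly as follows. The names fm1 to fm4 are hypothetical 0/1 indicators, one per definition (in the original post the first two appear to be wsp and mwsp; the other two are not shown). This is one possible reading of the suggestion, not necessarily the exact specification David had in mind.

```stata
* Assumption: fm1-fm4 are hypothetical 0/1 indicators, one per definition.

* Starting model: indicators for FM2, FM3, and FM4 only
regress phq_sss i.fm2 i.fm3 i.fm4

* Main effects plus all interactions among the four definitions;
* Stata omits terms for combinations that never occur in the data
regress phq_sss i.fm1##i.fm2##i.fm3##i.fm4

* Cell-means view: one category per observed membership pattern
egen pattern = group(fm1 fm2 fm3 fm4), label
tabulate pattern
regress phq_sss i.pattern
```

The tabulate step shows which of the possible membership patterns actually occur and how many observations fall in each, which is the information about overlaps that the paragraph notes was not reported.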
>
> I hope this discussion is helpful.
>
> David Hoaglin
>
> On Tue, Oct 2, 2012 at 10:06 AM, Fred Wolfe
> <fwolfe@arthritis-research.org> wrote:
>> Dear Statalisters,
>>
>> I am analyzing a medical condition (FM) that has 4 different
>> definitions. A person can be in 1 or more of the four groups defined
>> by those definitions (FM1, FM2, FM3, FM4). There are 2442
>> observations.
>>
>> I am interested in the value of a dependent variable, phq_sss, according
>> to each group definition.
>>
>> For the first two definitions, I get these results:
>>
>> . regress phq_sss i.wsp
>>
>>       Source |       SS       df       MS              Number of obs =    2442
>> -------------+------------------------------           F(  1,  2440) =  605.51
>>        Model |  7621.27967     1  7621.27967           Prob > F      =  0.0000
>>     Residual |  30711.1417  2440  12.5865335           R-squared     =  0.1988
>> -------------+------------------------------           Adj R-squared =  0.1985
>>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.5478
>>
>> ------------------------------------------------------------------------------
>>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>        1.wsp |   6.247731   .2538992    24.61   0.000      5.74985    6.745611
>>        _cons |   2.728905   .0751615    36.31   0.000     2.581518    2.876292
>> ------------------------------------------------------------------------------
>>
>> . regress phq_sss i.mwsp
>>
>>       Source |       SS       df       MS              Number of obs =    2442
>> -------------+------------------------------           F(  1,  2440) =  229.25
>>        Model |  3292.19831     1  3292.19831           Prob > F      =  0.0000
>>     Residual |  35040.2231  2440  14.3607472           R-squared     =  0.0859
>> -------------+------------------------------           Adj R-squared =  0.0855
>>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.7896
>>
>> ------------------------------------------------------------------------------
>>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>       1.mwsp |   10.37138   .6849863    15.14   0.000     9.028161    11.71459
>>        _cons |   3.144753   .0771774    40.75   0.000     2.993413    3.296093
>> ------------------------------------------------------------------------------
>>
>> There are two additional definitions that are not shown.
>>
>> So the difference between group members and non-members
>> in the two analyses above is:
>> wsp  6.2
>> mwsp 10.4
>> (there will be 2 other groups).
>>
>> My question is: how do I tell whether the results are statistically
>> different between the 4 groups, given the overlapping membership in
>> the groups? I have a feeling that some sort of permutation test is the
>> way to get such an answer. I'd appreciate suggestions.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/