

Re: st: Comparing overlapping groups

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparing overlapping groups
Date: Tue, 2 Oct 2012 11:05:54 -0400

Dear Fred,

If the 4 definitions were mutually exclusive subsets, you could use a
regression that has indicator variables for FM2, FM3, and FM4 (the
constant term would handle FM1, or you could include an indicator for
FM1 and turn off the constant).  The result would be equivalent to a
one-way analysis of variance with 4 groups.
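
As a minimal sketch of the mutually exclusive case, assuming a
hypothetical variable fmgroup (not in the thread) that takes the values
1-4, one for each definition, the two regression forms and the
equivalent one-way analysis of variance might look like this:

```stata
* Sketch only: fmgroup is a hypothetical variable coded 1-4, one value
* per mutually exclusive definition (FM1-FM4). It is not in the thread.

* Indicators for groups 2-4; _cons estimates the FM1 mean and each
* coefficient is that group's difference from FM1.
regress phq_sss i.fmgroup

* Alternative: an indicator for every group and no constant, so each
* coefficient is a group mean.
regress phq_sss ibn.fmgroup, noconstant

* One-way ANOVA on the same 4 groups; its F test matches the overall
* F test of the first regression.
oneway phq_sss fmgroup, tabulate
```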

Since the definitions overlap (you have not said which overlaps occur in
your data or how many observations fall in each; if all 2442
observations meet at least one of the 4 definitions, there could be as
many as 2^4 - 1 = 15 subgroups), you could
start with a regression model that has indicators for FM2, FM3, and
FM4.  The constant will give you an average for FM1, and the
coefficients of the three indicators will give incremental effects,
relative to FM1.  The results may not be satisfactory, and they may be
difficult to interpret.  A better approach, along the lines of main
effects and interactions, would also include indicators for each of
the subsets that involve 2 or more of the definitions.  Then, for
example, you could get an estimate of the level of phq_sss among
people who meet only FM1, an increment for people who meet both FM1
and FM2, and further increments for people who meet FM1, FM2, and FM3
and people who meet all 4 definitions.
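
A rough sketch of the main-effects-and-interactions idea, restricted to
two definitions to keep it short. It assumes that wsp and mwsp, the
variables in the regressions quoted below, are the 0/1 indicators for
the first two definitions; the indicators for the other two definitions
are not named in the thread, so they are left out here:

```stata
* Sketch only: assumes wsp and mwsp are the 0/1 indicators for the
* first two definitions, as the regressions quoted below suggest.

* Count how many observations fall in each overlap pattern.
tabulate wsp mwsp

* Main effects plus interaction. _cons is the mean for people who meet
* neither definition, 1.wsp and 1.mwsp are the increments for meeting
* only that definition, and 1.wsp#1.mwsp is the extra increment for
* meeting both.
regress phq_sss i.wsp##i.mwsp

* Difference between people who meet both definitions and people who
* meet neither.
lincom 1.wsp + 1.mwsp + 1.wsp#1.mwsp
```

If an overlap pattern has no observations, Stata omits the corresponding
term, so the tabulation is worth checking before interpreting the
coefficients. Further definitions could be added to the factorial term
in the same way.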

I hope this discussion is helpful.

David Hoaglin

On Tue, Oct 2, 2012 at 10:06 AM, Fred Wolfe
<fwolfe@arthritis-research.org> wrote:
> Dear Statalisters,
>
> I am analyzing a medical condition (FM) that has 4 different
> definitions for the same condition. A person can be in 1 or more of
> the four groups these definitions create (FM1, FM2, FM3, FM4). There are 2442
> observations.
>
> I am interested in the value of a dependent variable, phq_sss, according
> to each group definition.
>
> For the first two definitions, I get these results
>
> . regress phq_sss i.wsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  605.51
>        Model |  7621.27967     1  7621.27967           Prob > F      =  0.0000
>     Residual |  30711.1417  2440  12.5865335           R-squared     =  0.1988
> -------------+------------------------------           Adj R-squared =  0.1985
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.5478
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        1.wsp |   6.247731   .2538992    24.61   0.000      5.74985    6.745611
>        _cons |   2.728905   .0751615    36.31   0.000     2.581518    2.876292
> ------------------------------------------------------------------------------
>
> . regress phq_sss i.mwsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  229.25
>        Model |  3292.19831     1  3292.19831           Prob > F      =  0.0000
>     Residual |  35040.2231  2440  14.3607472           R-squared     =  0.0859
> -------------+------------------------------           Adj R-squared =  0.0855
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.7896
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.mwsp |   10.37138   .6849863    15.14   0.000     9.028161    11.71459
>        _cons |   3.144753   .0771774    40.75   0.000     2.993413    3.296093
> ------------------------------------------------------------------------------
>
> There are two additional definitions that are not shown.
>
> So the difference for group members as opposed to non-members
> in the two analyses above is:
> wsp  6.2
> mwsp 10.4
> (there will be 2 other groups).
>
> My question is, how do I tell if the results are statistically
> different between the 4 groups, given the overlapping membership in
> the groups. I have a feeling that some sort of permutation test is the
> way to get such an answer. I'd appreciate suggestions.