

From: David Hoaglin <dchoaglin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparing overlapping groups
Date: Tue, 2 Oct 2012 11:05:54 -0400

Dear Fred,

If the 4 definitions were mutually exclusive subsets, you could use a regression that has indicator variables for FM2, FM3, and FM4 (the constant term would handle FM1, or you could include an indicator for FM1 and turn off the constant). The result would be equivalent to a one-way analysis of variance with 4 groups.

Since the definitions overlap (though you have not said how many of the overlaps are present in your data or the numbers of observations in the overlaps; if all 2442 observations meet at least one of the 4 definitions, you could have as many as 15 subgroups), you could start with a regression model that has indicators for FM2, FM3, and FM4. The constant will give you an average for FM1, and the coefficients of the three indicators will give incremental effects, relative to FM1. The results may not be satisfactory, and they may be difficult to interpret.

A better approach, along the lines of main effects and interactions, would also include indicators for each of the subsets that involve 2 or more of the definitions. Then, for example, you could get an estimate of the level of phq_sss among people who meet only FM1, an increment for people who meet both FM1 and FM2, and further increments for people who meet FM1, FM2, and FM3 and people who meet all 4 definitions.

I hope this discussion is helpful.

David Hoaglin

On Tue, Oct 2, 2012 at 10:06 AM, Fred Wolfe <fwolfe@arthritis-research.org> wrote:
> Dear Statalisters,
>
> I am analyzing a medical condition (FM) that has 4 different
> definitions for the same condition. A person can be in 1 or more of
> four definition-defined groups (FM1, FM2, FM3, FM4). There are 2442
> observations.
>
> I am interested in the value of a dependent variable, phq_sss, according
> to each group definition.
>
> For the first two definitions, I get these results
>
> . regress phq_sss i.wsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  605.51
>        Model |  7621.27967     1  7621.27967           Prob > F      =  0.0000
>     Residual |  30711.1417  2440  12.5865335           R-squared     =  0.1988
> -------------+------------------------------           Adj R-squared =  0.1985
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.5478
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>        1.wsp |   6.247731   .2538992    24.61   0.000      5.74985    6.745611
>        _cons |   2.728905   .0751615    36.31   0.000     2.581518    2.876292
> ------------------------------------------------------------------------------
>
> . regress phq_sss i.mwsp
>
>       Source |       SS       df       MS              Number of obs =    2442
> -------------+------------------------------           F(  1,  2440) =  229.25
>        Model |  3292.19831     1  3292.19831           Prob > F      =  0.0000
>     Residual |  35040.2231  2440  14.3607472           R-squared     =  0.0859
> -------------+------------------------------           Adj R-squared =  0.0855
>        Total |  38332.4214  2441  15.7035729           Root MSE      =  3.7896
>
> ------------------------------------------------------------------------------
>      phq_sss |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       1.mwsp |   10.37138   .6849863    15.14   0.000     9.028161    11.71459
>        _cons |   3.144753   .0771774    40.75   0.000     2.993413    3.296093
> ------------------------------------------------------------------------------
>
> There are two additional definitions that are not shown.
>
> So the difference for group members as opposed to non-group members
> in the two analyses above is:
> wsp 6.2
> mwsp 10.4
> (there will be 2 other groups).
>
> My question is, how do I tell if the results are statistically
> different between the 4 groups, given the overlapping membership in
> the groups? I have a feeling that some sort of permutation test is the
> way to get such an answer. I'd appreciate suggestions.
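
The two regression setups described in the reply (indicators for FM2, FM3, and FM4 only, and then a model with main effects and interactions) might be sketched in Stata roughly as follows. This is only an illustration on simulated data: the indicator names fm1 to fm4, the membership rates, and the effect sizes are all assumptions, since the original post shows only wsp and mwsp and does not report the overlap counts.

```stata
* Sketch only: simulated data, NOT the data from the original post.
* fm1-fm4 are hypothetical 0/1 indicators for the four definitions.
clear
set seed 12345
set obs 2442
generate byte fm1 = runiform() < 0.40
generate byte fm2 = runiform() < 0.30
generate byte fm3 = runiform() < 0.30
generate byte fm4 = runiform() < 0.20
* Assumed (made-up) effects, used only to give the outcome some structure
generate phq_sss = 3 + 2*fm1 + 3*fm2 + 1.5*fm3 + 1*fm4 + rnormal(0, 3.5)

* Count the observations in each overlap pattern
* (up to 16 patterns here; 15 if every observation met at least one definition)
egen pattern = group(fm1 fm2 fm3 fm4), label
tabulate pattern

* Starting model: constant plus indicators for FM2, FM3, and FM4
regress phq_sss i.fm2 i.fm3 i.fm4

* Main effects and interactions: a term for every combination of definitions.
* In this simulation some observations meet no definition, so _cons is
* the mean of phq_sss for that group.
regress phq_sss i.fm1##i.fm2##i.fm3##i.fm4

* Same fit written as one mean per overlap pattern
regress phq_sss ibn.pattern, noconstant
```

Tabulating the overlap patterns first shows whether any combination has too few observations to support its own term; empty or very small cells would argue for a simpler model than the full set of interactions.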

**Follow-Ups**:
- **Re: st: Comparing overlapping groups**, *From:* Fred Wolfe <fwolfe@arthritis-research.org>

**References**:
- **st: Comparing overlapping groups**, *From:* Fred Wolfe <fwolfe@arthritis-research.org>
