# Re: st: Criteria for stratification??

From: "Heather Gold"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Criteria for stratification??
Date: Wed, 16 Jun 2004 16:28:16 -0400

That method of hunting for arbitrary cut points sounds, well, arbitrary, and like data mining. I don't know of any "statistical" method for finding ideal cut points. Perhaps if the values had an obvious distribution (e.g., lots of zeroes and lots of ones), you could break the variable into two categories, greater than and less than 0.5, or something like that.

At 11:44 AM 6/16/2004, you wrote:

Thanks, Cox and Heather. Regarding Cox's comment: this is an EQ5D (European Quality of Life) measurement. The score lies on a continuum, but it can take at most 243 distinct values in the range -0.6 to 1.0 (writing -1.6 earlier was a mistake). We are using the EQ5D score to measure the health effect of two kinds of treatment, and we are using Markov models with multi-state life tables to capture movements between health states (defined from EQ5D scores) over a lifetime perspective.
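To make "movements between health states" concrete, here is a minimal sketch of tabulating pre/post transitions once cut points have been chosen. The scores, the cut points, and the function names (`transition_counts`, `row_proportions`) are all invented for illustration and are not taken from the data discussed in this thread; a score equal to a cut point is assigned to the higher state.

```python
from bisect import bisect_right

def transition_counts(pre, post, cutpoints):
    """Cross-tabulate pre-state (rows) against post-state (columns)."""
    n_states = len(cutpoints) + 1
    counts = [[0] * n_states for _ in range(n_states)]
    for before, after in zip(pre, post):
        # bisect_right maps a score to the index of the interval it falls in
        row = bisect_right(cutpoints, before)
        col = bisect_right(cutpoints, after)
        counts[row][col] += 1
    return counts

def row_proportions(counts):
    """Turn each row of counts into observed transition proportions."""
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs

# Hypothetical pre/post scores and cut points (assumptions, not real data)
pre = [0.2, 0.4, 0.7, 0.9, -0.1, 0.5]
post = [0.5, 0.4, 0.9, 0.6, 0.3, 0.8]
cuts = [0.33, 0.66]  # three states: low, middle, high

counts = transition_counts(pre, post, cuts)
probs = row_proportions(counts)
for row in probs:
    print(row)
```

Rerunning the same tabulation with a different set of cut points shows how sensitive the observed transition proportions are to the categorization.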
Regarding Heather's suggestion of the minimal important difference (MID), or clinically meaningful threshold: I have been thinking about it. It would be about 0.1, which would create too many states. But the idea is under consideration, and I am working on it.
The question remains: is there any statistical method? What I am thinking of is to start with 3 states, arbitrarily defining cut points with at least 0.10 between two cutoffs, and then looking at some outcome (variance? standard error?), then continuing to 4, 5, 6, 7 states. I am looking for some outcome that could be used to decide the number of states and the cutoffs.
Thank you, and I hope to get some response.

SAMIR
Heather Gold wrote:

If you have some knowledge or information about what increment poses a clinically meaningful threshold, that could be used to set the cut points (e.g., if every half point has some impact on a person's health, you could create categories such as -1.5 to -1, -1 to -0.5, -0.5 to 0, 0 to 0.5, and 0.5 to 1). Another option is to create "automatically determined" categories such as tertiles, but that is arbitrary, so you might want to create, say, both tertiles and quartiles and test whether the choice of categories makes a difference.
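As a rough illustration of the two options, the sketch below assigns categories from fixed half-point cut points and from data-driven tertile and quartile cut points. The scores are made up, the helper names (`assign_category`, `quantile_cutpoints`) are hypothetical, and the "inclusive" quantile rule is just one of several possible conventions.

```python
from bisect import bisect_right
from statistics import quantiles

def assign_category(value, cutpoints):
    """Return the 0-based category of value for sorted interior cut points."""
    return bisect_right(cutpoints, value)

def quantile_cutpoints(values, n_groups):
    """Interior cut points splitting values into n_groups similar-sized groups."""
    return quantiles(values, n=n_groups, method="inclusive")

# Hypothetical scores, invented purely for illustration
scores = [-0.4, -0.1, 0.0, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.0]

fixed_cuts = [-0.5, 0.0, 0.5]                 # threshold-based, half-point steps
tertile_cuts = quantile_cutpoints(scores, 3)  # two data-driven cut points
quartile_cuts = quantile_cutpoints(scores, 4) # three data-driven cut points

fixed_groups = [assign_category(s, fixed_cuts) for s in scores]
tertile_groups = [assign_category(s, tertile_cuts) for s in scores]
quartile_groups = [assign_category(s, quartile_cuts) for s in scores]

print(fixed_groups)
print(tertile_groups)
print(quartile_groups)
```

Running the downstream analysis once per grouping is one informal way to check whether the categorization makes a difference.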

At 10:42 AM 6/16/2004, you wrote:

I have a discrete variable (health status) which takes values from -1.6 to 1. The data consist of pre and post values of the variable. I want to group the variable into 3-7 categories in order to study the movement between categories. How can I do this? What criteria should I use to determine the best categorization, in terms of (1) the number of categories and (2) the cutoffs? Please respond.
Samir

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/