# Re: st: size of omitted/reference category

David Firth has a little program called QV calculator which allows you to overcome the reference category problem. It is designed to run with R or Xlisp-Stat, but there is also a web-based input form which you can use. See http://www.stats.ox.ac.uk/~firth/qvcalc/index.html

This is more of a stats question than a Stata question; nonetheless, I hope that you will allow me to pick your brains. I am working on a regression analysis where the key set of covariates is a series of dummy variables. The most theoretically logical category to omit for the hypotheses we are trying to test is also the smallest. To give a sense of the data, the group sizes are shown below:
group 1 = 69
group 2 = 3,636
group 3 = 553
group 4 = 894

Group 1 is the key group that we wish to use as the omitted category and include dummy variables for groups 2 through 4. Does anyone know if there are estimation problems when you use a small reference category? Is this approach legitimate? Does anyone know any citations that I could use as a guide for this?

The only "estimation problem" here is that, other things being equal, differences compared to a smaller reference category will have larger standard errors (and therefore wider confidence limits) than differences from a large reference category. This is why, if the experimenter has any say in the matter, then s/he will often design the experiment so that the reference category is the largest. There is nothing "incorrect" about using a small reference category.
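
To put rough numbers on this using the group sizes quoted above: in the simplest case (a one-way comparison of group means with a common residual SD and no other covariates, which is an assumption and not necessarily the model being fitted), the standard error of a difference between two groups is proportional to sqrt(1/n_a + 1/n_b). A quick sketch in Stata:

```stata
* SE of a difference in means, in units of the residual SD,
* assuming equal variances and no other covariates

* group 3 vs group 1 (n = 69) as the reference
display %5.3f sqrt(1/69 + 1/553)

* group 3 vs group 2 (n = 3,636) as the reference
display %5.3f sqrt(1/3636 + 1/553)
```

These come to about 0.128 and 0.046, so under those assumptions the same comparison is nearly three times less precise against the small reference group.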

The choice of a reference category is sometimes automated. John Hendrickx's -desmat- package is an alternative to -xi-, and gives the user the option of doing this. In Stata, type

ssc desc desmat

to find out more about -desmat-.
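
For comparison, with -xi- itself the omitted category can be set by hand through a variable characteristic. A minimal sketch, assuming an outcome called y and a categorical variable called group coded 1 to 4 (both names are made up for illustration):

```stata
* y and group are hypothetical variable names
* make group 2 (the largest) the omitted category
char group[omit] 2
xi: regress y i.group

* the group 3 vs group 1 contrast is still available after estimation
lincom _Igroup_3 - _Igroup_1
```

Whichever category is omitted, the fitted model is the same; only the contrasts reported directly in the output, and their standard errors, change.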

I hope this helps.

