Re: st: RE: Your opinion on income groups and inflation
Many thanks for your additional view on this! So, there are two things
to do: a) think about a plausible reason why dummies should or should
not be used (and likewise for the categorical case), and b) test the
validity of the assumption with tests such as those Richard Williams
has promoted.
Many thanks, this has helped me a lot in getting a feel for the
relevant questions and the right approach to these kinds of variables!
kind regards,
Andrea
On Jun 9, 2008, at 3:13 AM, Austin Nichols wrote:
Andrea--
I strongly disagree with Martin Weiss, SamL, and Branko Milanovic, who
claim that an ordered categorical explanatory variable can be included
as a sensible regressor without justification.  Creating dummies *is*
justifiable; you are merely computing conditional means.  Including
income (or "trust") as a single explanatory variable when income (or
"trust") is measured as an ordered categorical explanatory variable
requires a strong assumption that the effect is linear in the index of
categories.  The dummy variable approach requires no such assumption.
As Richard Williams quite rightly points out, you can -test- whether
the effect is linear in the index, or whether groups of individual
dummies all have the same effect.  One useful way is to create dummies
that correspond to more interpretable groups, like above the median,
more than twice the median, less than half the median, etc. so you can
see directly from the regression output where deviations from
linearity occur...  graphs are also helpful for this purpose.
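A minimal sketch of the grouped-dummy idea, using rep78 from the auto
data as a stand-in for an ordered income variable. The dummy names
below_med and above_med and the scalar s_med are made up for
illustration; with real income data the cutoffs would be multiples of
median income rather than the median category.

```stata
sysuse auto, clear

* Median category of the ordinal variable (a stand-in for median income)
quietly summarize rep78, detail
scalar s_med = r(p50)

* Interpretable group dummies; keep missing rep78 as missing, since
* Stata treats missing values as larger than any number
generate byte below_med = rep78 < s_med if !missing(rep78)
generate byte above_med = rep78 > s_med if !missing(rep78)

* Linear index plus group dummies: each dummy coefficient is that
* group's departure from the straight line
regress price rep78 below_med above_med

* Joint test that neither group departs from linearity
test below_med above_med
```

The coefficients on the group dummies show directly in the regression
output where, if anywhere, the linear-in-index assumption breaks down.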
On Sun, Jun 8, 2008 at 4:14 AM, Andrea Bennett <[email protected]> wrote:
Many thanks for this revealing illustration of tests! I will clearly
look into this...
Kind regards,
Andrea
On Jun 7, 2008, at 9:58 PM, Richard Williams wrote:
At 02:42 PM 6/7/2008, [email protected] wrote:
On income groups (intervals), I would not use dummies because you have
information about income _levels_ which would be otherwise lost. An
income interval of 300 to 400 is not the same thing as an income
interval of 1200 to 3600. Since you do not have information about
distribution of income within
Ch. 9 of Long & Freese's book (see especially pp. 421-422) shows how to
test whether treating an ordinal variable as interval loses
information. Basically, you run an unconstrained model where the
ordinal variable is broken up into dummies, and then run a constrained
model where you treat the ordinal variable as continuous.  If the
difference is not significant, then treating the var as continuous is
ok.  I imagine you can tweak this a bit, e.g. assigning midpoints or
whatever to the categories of the variable.
For info on the book, see
http://www.stata.com/bookstore/regmodcdvs.html
Here is an example:
sysuse auto
reg price rep78
est store constrained
xi: reg price i.rep78
est store unconstrained
lrtest constrained unconstrained
The output from the last part is
. lrtest constrained unconstrained
Likelihood-ratio test                                  LR chi2(3)  =      1.00
(Assumption: constrained nested in unconstrained)      Prob > chi2 =    0.8002
This is kind of a crummy example because the N is so small and the
relationship so weak; but in any event the test says it is ok to treat
rep78 as continuous.
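To see where the 3 degrees of freedom come from, the same statistic can
be computed by hand from the results that -regress- stores. This is
only a sketch: the scalar names ll_c, ll_u, df_c and df_u are made up
here, and -lrtest- remains the convenient route.

```stata
sysuse auto, clear

* Constrained model: rep78 enters as a single linear term
quietly regress price rep78
scalar ll_c = e(ll)
scalar df_c = e(df_m)

* Unconstrained model: one dummy per category (minus the base)
quietly xi: regress price i.rep78
scalar ll_u = e(ll)
scalar df_u = e(df_m)

* LR statistic is twice the gain in log likelihood; its degrees of
* freedom are the number of extra parameters in the dummy model
display "LR chi2(" df_u - df_c ") = " 2*(ll_u - ll_c)
display "Prob > chi2 = " chi2tail(df_u - df_c, 2*(ll_u - ll_c))
```

With 5 observed categories the dummy model has 4 slope parameters
against 1 in the linear model, hence the chi2(3).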
You can also set it up as a Wald test, which may be handy in situations
where a LR test is inappropriate.  If the X variable has k categories,
then include X and k-2 of the dummies computed from X, and then test
the dummies.
e.g.
tab1 rep78, gen(rep)
reg price rep78  rep3 rep4 rep5
test rep3 rep4 rep5
The last command gives
. test rep3 rep4 rep5
( 1)  rep3 = 0
( 2)  rep4 = 0
( 3)  rep5 = 0
    F(  3,    64) =    0.31
         Prob > F =    0.8160
This sort of thing is also useful if, say, your X variable is
continuous (e.g. education) but you suspect its effects are not
strictly linear (a year of college has a different effect than a year
of grade school).
Now, if the N is large, you may well find that the dummy variable
approach always comes out ahead.  At that point, you may wish to
consider substantive significance (just how much do the effects differ
from straight linearity?) or consider some other criteria for assessing
significance that are less affected by sample size, e.g. a BIC test.
There is a lot to be said for parsimony.
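One possible way to make the BIC comparison, sketched on the same auto
data and reusing the stored-estimate names from the LR example:
-estimates stats- lists AIC and BIC for each stored model side by side.

```stata
sysuse auto, clear

* Linear-in-index model
quietly regress price rep78
estimates store constrained

* Dummy-variable model
quietly xi: regress price i.rep78
estimates store unconstrained

* Table of log likelihood, df, AIC and BIC for both models
estimates stats constrained unconstrained
```

The model with the smaller BIC is the preferred one; because BIC
penalizes each extra parameter by ln(N), the dummy specification has to
improve the fit more as N grows before it comes out ahead.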
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/