The Stata listserver

st: Venn diagram


From   "James Xiao" <JXiao@americanlegacy.org>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Venn diagram
Date   Tue, 4 May 2004 17:59:13 -0400

Dear listers,

I'm trying to use the -venndiag- ado-file to draw a two-variable Venn
diagram. Below is the output I got, but no graph pops up.

. venndiag currsmk obesity
______________________________________________________________________

Venn diagram of variables: currsmk obesity
File: H:\My Documents\NHIS\2002 NHIS\dataset\samadult.dta ( 4 May 2004 )

     Outcome    Variable and label 
A:       1           currsmk    currsmk
B:       1           obesity    obesity

    31044   Records in file
      330   Records excluded by missing values
    _____
    30714   Records in Diagram: 
Counts for combined variables:
----------------------------------------------------------------------
A      |   5298   17 %  (currsmk == 1)& (obesity != 1)
B      |   6843   22 %  (currsmk != 1)& (obesity == 1)
AB     |   1621    5 %  (currsmk == 1)& (obesity == 1)
--     |  16952   55 %  (currsmk != 1)& (obesity != 1)
----------------------------------------------------------------------
______________________________________________________________________
file C:\DOCUME~1\jxiao\LOCALS~1\Temp\ST_03000016.tmp saved
obs was 0, now 1001
file C:\DOCUME~1\jxiao\LOCALS~1\Temp\ST_03000017.tmp saved
obs was 0, now 1001

Has anybody used this ado-file before? I just got it through a Stata update.
Any suggestions are highly appreciated!

James Xiao
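For readers puzzling over the table above, here is a minimal Python/NumPy sketch of how the four cells of a two-variable Venn diagram are counted. Records missing on either variable are excluded, and each remaining record falls into exactly one cell. This is a hypothetical reconstruction and not -venndiag-'s own code. The function name venn_counts, the use of NaN for missing values, and rounding (rather than truncating) the percentages are all assumptions.

```python
import numpy as np


def venn_counts(a, b):
    """Hypothetical helper: count the four cells of a two-variable Venn diagram.

    a, b: sequences where 1 means "outcome present" and NaN means missing
    (an assumption; Stata's own missing codes are not modelled here).
    Returns (cells, percentages, n_excluded).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = ~np.isnan(a) & ~np.isnan(b)      # drop records missing on either variable
    a, b = a[ok], b[ok]
    cells = {
        "A":  int(np.sum((a == 1) & (b != 1))),   # A only
        "B":  int(np.sum((a != 1) & (b == 1))),   # B only
        "AB": int(np.sum((a == 1) & (b == 1))),   # both
        "--": int(np.sum((a != 1) & (b != 1))),   # neither
    }
    total = sum(cells.values())           # equals the number of non-missing records
    pct = {k: round(100 * v / total) for k, v in cells.items()}
    return cells, pct, int((~ok).sum())
```

Applied to the counts reported above: 5298 + 6843 + 1621 + 16952 = 30714 = 31044 - 330. Also, 5298/30714 is about 17.2%, which matches the "17 %" shown. Rounding and truncating give the same four percentages for this table.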

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
khigbee@stata.com
Sent: Tuesday, May 04, 2004 5:49 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: About coliearity

Maoyong Fan <fan@berkeley.edu> asks:

> I am running some ordinary least squares regressions now.  There are more
> than 50 explanatory variables in my model.  Most of them are dummy
> variables.  Stata automatically drops variables, seemingly at random,
> if they cause collinearity.  However, I want to keep some
> important variables in my regression and drop only the dummy
> variables when an important variable and a dummy variable are
> collinear.  How can I program this?

I hope that someone else will think up a clever solution different from
what I am about to suggest, and I hesitate to show the following
undocumented feature ... but I think it will solve your problem.

I only ask that you (and anyone else venturing along the path I am about
to show) be careful when using the feature.

First, some background information.  When Stata detects and corrects
for collinearity, by default it picks which of the collinear variables
to drop according to which choice gives the most stable numerical
properties.  As Maoyong Fan points out, when this criterion produces
near ties (not uncommon in this setting), the choice may differ from
one run to another.

Underneath the hood (or should I say bonnet) Stata is using the sweep
operator on the X'X matrix.  By default with -regress- (etc.), Stata
will (behind the scenes) reorder the rows and columns of the matrix so
that the most stable variables are retained when it comes to a decision
of collinearity (i.e., when it encounters a near zero element on the
diagonal during the sweep).
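The following is a minimal Python/NumPy sketch of the sweep idea described above. It is an illustration only and not StataCorp's code. X'X is swept one pivot at a time in a fixed order. A column is skipped as collinear if its current diagonal entry has shrunk to (nearly) zero relative to its original value. After sweeping a set of columns, that diagonal entry is the column's residual sum of squares on the swept columns. The function name sweep_collinear and the relative tolerance rule (tol times the original diagonal) are assumptions. Stata's actual tolerance and its stability-based reordering are not reproduced.

```python
import numpy as np


def sweep_collinear(X, order=None, tol=1e-10):
    """Hypothetical helper: sweep X'X pivot by pivot in `order` and return
    the column indices skipped because their pivot was (near) zero.

    Uses the symmetric form of the sweep operator, so the diagonal entry of
    an unswept column is its residual sum of squares on the swept columns.
    """
    A = (X.T @ X).astype(float)          # astype returns a fresh copy
    p = A.shape[0]
    orig_diag = np.diag(A).copy()
    order = range(p) if order is None else order
    dropped = []
    for k in order:
        d = A[k, k]
        # Assumed rule: a pivot below tol * original diagonal means column k is
        # (almost) a linear combination of the columns already swept.
        if d <= tol * orig_diag[k]:
            dropped.append(k)
            continue
        row = A[k, :].copy()
        col = A[:, k].copy()
        A -= np.outer(col, row) / d      # a_ij <- a_ij - a_ik * a_kj / a_kk
        A[k, :] = row / d                # pivot row
        A[:, k] = col / d                # pivot column
        A[k, k] = -1.0 / d               # pivot element
    return dropped
```

The example uses columns constant, x, d1, and d2 = 1 - d1. Sweeping in the given order skips d2 (index 3). Sweeping in the order [3, 2, 1, 0] skips the constant instead. The order of sweeping therefore decides which collinear column is dropped.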

There is an internal flag that turns off this behavior.  It is used for
-anova- and -manova-, because in those cases we want higher order
interaction indicators (often called dummies) to be dropped before main
effects and lower-order interactions.  We set up X'X with the constant
first, the main effects next, and then the interactions, with the
highest-order interactions last.  We turn off the flag, and then we
sweep.  When done, we turn the flag back on for whatever command might
follow.

It sounds like this is similar to what you want to achieve.

If just before you call your estimation command you enter

    . set debug on
    . set tol r 0
    . set debug off

then the reorder flag will be turned off.  Then run your estimation
command, placing the variables in order of importance from left to
right.  (The rightmost collinear variables will be dropped before those
further to the left.)
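This is a hypothetical Python/NumPy sketch of the "rightmost collinear variable is dropped first" rule. It uses a greedy rank check with numpy.linalg.matrix_rank in place of the sweep operator. Columns are scanned left to right, and a column is kept only if it raises the rank of the columns kept so far. In exact arithmetic, this gives the same kept set as sweeping in that order. The function name, variable names, and data are made up for illustration.

```python
import numpy as np


def keep_in_order(X, names, tol=None):
    """Hypothetical helper: scan columns left to right and keep a column only
    if it raises the rank of the columns already kept.
    Returns (kept_names, dropped_names)."""
    kept_idx, dropped = [], []
    for j, name in enumerate(names):
        trial = X[:, kept_idx + [j]]
        if np.linalg.matrix_rank(trial, tol=tol) > len(kept_idx):
            kept_idx.append(j)
        else:
            dropped.append(name)          # later (less important) column loses
    return [names[j] for j in kept_idx], dropped


# Made-up data: the important variable `treat` equals the group-2 dummy `d2`.
const = np.ones(6)
d2 = np.array([0, 0, 1, 1, 0, 0], dtype=float)
d3 = np.array([0, 0, 0, 0, 1, 1], dtype=float)
treat = d2.copy()

# Important variable listed first: the dummy is the one dropped.
print(keep_in_order(np.column_stack([const, treat, d2, d3]),
                    ["_cons", "treat", "d2", "d3"]))
# -> (['_cons', 'treat', 'd3'], ['d2'])
```

If d2 is listed before treat instead (_cons, d2, d3, treat), then treat is dropped. That is the outcome the flag trick is meant to let you avoid.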

IMPORTANT: turn the flag back on after you are done.

    . set debug on
    . set tol r 1
    . set debug off

so that Stata will be back to normal behavior.  This is what I meant
when I asked that anyone doing this be careful.  Make sure you get Stata
back to its default behavior.

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/






© Copyright 1996–2014 StataCorp LP