Statalist



Re: st: Correlation coefficient between discrete and continuous variables


From   "Austin Nichols" <[email protected]>
To   [email protected]
Subject   Re: st: Correlation coefficient between discrete and continuous variables
Date   Thu, 20 Nov 2008 13:38:19 -0500

Sergiy--
Might the -oprobit- command do what you want?

Maybe someone with more ordered probit expertise can comment on this example:

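sysuse auto, clear
* standardize both variables on the common sample (c s: presumably casewise, standardize)
center rep78 price, c s
* with both variables standardized, the OLS slope equals the Pearson correlation
qui reg c_*
di _b[c_price]
qui corr rep78 price
di r(rho)
* ordered probit on the standardized and on the original rep78 codes
oprobit c_rep78 c_price
oprobit rep78 c_price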

(-center- is from SSC).
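
One way to read the -oprobit- output as a correlation is sketched below. It assumes the latent-variable form of the ordered probit, y* = b*x + e with e ~ N(0,1), under which corr(y*, x) = b*sd(x)/sqrt(1 + (b*sd(x))^2). This is only an illustration under that assumption, not necessarily the estimator in the Cox (1974) paper; the local macro names b and s are made up for the example.

```stata
sysuse auto, clear
* ordered probit of the ordinal variable on the continuous one
quietly oprobit rep78 price
local b = _b[price]
* standard deviation of price on the estimation sample
quietly summarize price if e(sample)
local s = r(sd)
* implied corr(y*, price), assuming y* = b*price + e with e ~ N(0,1)
display "latent correlation:  " `b'*`s'/sqrt(1 + (`b'*`s')^2)
* Pearson correlation on the observed codes, for comparison
quietly correlate rep78 price
display "Pearson correlation: " r(rho)
```

The latent figure relies on the normality of the error and says nothing about unequal interval widths or interviewer error beyond what the probit model absorbs.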

On 11/20/08, Sergiy Radyakin <[email protected]> wrote:
> Dear All,
>
> a colleague of mine has just hinted to me that it may not be
> straightforward to compute a correlation coefficient when one of the
> variables is discrete. Until now I never cared, and neither does the
> Stata manual. In particular it does not require anywhere the variables
> to be continuous, and the example shows the use of the -correlate- command
> to find a correlation between such discrete variables as -state- and
> -region- and such continuous variables as -marriage rate-, -divorce
> rate- (which is also strange since there is no logical ordering of
> -state- and -region-, but that is a different issue).
>
> After looking into the literature, the following paper seems to be
> most relevant:
>
>   N. R. Cox, "Estimation of the Correlation between a Continuous and a
> Discrete Variable", Biometrics, Vol. 30, No. 1 (Mar. 1974), pp. 171-178
>   www.jstor.org/stable/2529626
>
> In particular my case satisfies the assumptions made in the paper that
> the discrete value is derived from an underlying continuous variable
> (so there is ordering: low, medium, or high). The method recommended
> in the paper seems very far from what Stata computes according to the
> manual; in particular, it calls for iterative maximum likelihood
> estimation.
>
> Before I start writing any code myself, I would like to ask:
>
> Q1: does Stata do any adjustment to the way it computes the
> correlation coefficient based on the nature of the variable (discrete
> or continuous)?
>
> Q2: is the difference between (the correlation coefficient as
> estimated by Stata in this case) and (the one computed by the
> recommended way) practically important?
>
> Q3: is there any standard or user-written command to compute the
> correlation coefficient according to the method described in the paper
> above?
>
> Q4: I am ultimately interested in the correlation between my observed
> continuous variable and the unobserved continuous variable, which is
> represented in the discrete levels. Unfortunately the thresholds are
> not available to me, so I may not be sure about the size of the
> intervals. Furthermore, a significant measurement error may be
> involved, since many interviewers may have eyeballed the continuous
> variable into different groups differently. Should I instead focus on
> different measures of correlation? Could you please suggest any
> that better fit the context?
>
> Thank you,
>   Sergiy Radyakin