Statalist



Re: st: Dependent continuous variable with bounded range


From   "Pavlos C. Symeou" <[email protected]>
To   [email protected]
Subject   Re: st: Dependent continuous variable with bounded range
Date   Tue, 15 Apr 2008 19:19:41 +0100

Dear Nick,

thank you for this. I have tried your suggestion below (to confirm: I used "logit" for the link() option and "binomial" for the family() option). However, none of the coefficients was statistically significant, and after trying various permutations it seemed to me that the model could not fit the data adequately. I therefore returned to my original random-effects OLS regression, which you suggested for its simplicity. The results of the OLS model are also consistent with my theoretical arguments. Still, I need to check whether the predicted values lie in [0,10]. I used -predict, xb- to save the fitted values in a new variable. The fitted values range from 5.58 to 6.93, while the observed values range from 2.95 to 8.32. Does this suggest that my model does not suffer from the limitations you note below?
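
A note on this check: fitted values from a linear model are pulled toward the mean, so in-sample they typically span a narrower range than the observed response. The concern quoted below is mainly about predictions at other predictor values. A minimal sketch on simulated data follows; the variable names (firm, year, x1, u), the coefficients, and the seed are assumptions for illustration, not Pavlos's data or model.

```stata
* Simulated panel for illustration only; all names and numbers are assumptions
clear
set seed 12345
set obs 100
generate firm = _n
generate u = rnormal(0, 0.2)              // hypothetical firm-level effect
expand 5
bysort firm: generate year = _n
generate x1 = rnormal()
generate reputation = min(10, max(0, 6.3 + 0.5*x1 + u + rnormal(0, 0.5)))
xtset firm year

* Random-effects linear regression and its linear predictions
xtreg reputation x1, re
predict double fit_ols, xb
summarize reputation fit_ols

* In-sample check: number of fitted values outside [0,10]
count if !inrange(fit_ols, 0, 10)

* Out-of-sample check: linear prediction at an extreme predictor value
display "Linear prediction at x1 = 10: " _b[_cons] + _b[x1]*10
```

In this simulation every in-sample fitted value lies inside [0,10], yet the linear prediction at x1 = 10 exceeds 10. An in-range summary of fitted values therefore speaks only to the predictor values actually observed.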

Yours truly,

Pavlos

Nick Cox wrote:

The numeric result for skewness doesn't quite match the fact that the mean is nearer the maximum than the minimum, not that that need be the case.
You possibly have a bit of a tail of fairly lousy firms, but otherwise this distribution looks quite healthy to me. How about
gen repute = reputation / 10
xtgee repute ..., link(logit) family(<continuous>)
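
The suggestion above can be sketched end to end as follows. This is a hedged illustration on simulated data, not the model from the thread. It uses family(binomial), which is what Pavlos reports trying in his reply. The predictor x1, the firm effect u, the exchangeable working correlation, the robust standard errors, and every number are assumptions.

```stata
* Simulated panel for illustration only; all names and numbers are assumptions
clear
set seed 2008
set obs 100
generate firm = _n
generate u = rnormal(0, 0.2)              // hypothetical firm-level effect
expand 5
bysort firm: generate year = _n
generate x1 = rnormal()
generate reputation = 10 * invlogit(0.5 + 0.3*x1 + u + rnormal(0, 0.3))
xtset firm year

* Scale the response to (0,1), then model its mean through a logit link
generate repute = reputation / 10
xtgee repute x1, family(binomial) link(logit) corr(exchangeable) vce(robust)

* Predicted mean on the (0,1) scale, returned to the original 0-10 scale
predict double mu_hat, mu
generate double rep_hat = 10 * mu_hat
summarize rep_hat
```

Because the inverse logit maps any linear predictor into (0,1), rep_hat cannot leave the interval from 0 to 10, whatever the predictor values.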
P.S. for "likert" read "Likert" (Rensis Likert, fl.C20)

Pavlos C. Symeou wrote:

Dear Nick, Anders, and Jay,

thank you very much for your responses. First, I agree with you that this problem cannot be handled with interval regression. As Anders recommends, I should give further information about my variable in question, "reputation". The variable is a construct consisting of three other variables. These three variables take discrete values from 0 to 10 on an 11-point likert scale, namely 0,1,2,3,...,10. They are responses to a survey questionnaire. The three values are added together and divided by 3 to yield my focal variable "reputation" (unfortunately I only have access to the final variable and not its components). Values of "reputation" closer to 0 suggest lower firm reputation; values closer to 10 suggest higher firm reputation. A histogram shows a roughly normal distribution, as you can see from the skewness below. I also provide the minimum and maximum values in response to Anders' inquiry.

Nick advised that I transform my response so that it is unbounded, using a link function, e.g. logit on response/10 (I am not sure how I can do this). Yet Jay's suggestion is that this might not be that beneficial. Can you please advise on how to proceed?

Min 2.95 Max 8.32
Mean 6.267978
Variance .5091705
Skewness -.057921
Kurtosis 3.642905

Yours truly,

Pavlos




I find both Nick Cox's and Jay Verkuilen's comments very reasonable.
But Pavlos does not mention how the variable "reputation" is created.
How would reputation get a value of, say, "9.6"? Where does the
boundary 0-10 come from, e.g., from the sample's population or from a
questionnaire? Is reputation a scored variable or does it represent
original data?

Anders Alexandersson


Nick Cox outlined several strategies, to which I will add just a bit:

(1) How close to the boundaries are your observations? If the distribution looks reasonably symmetric, you probably won't gain much from using a specialized model that "knows" about the boundaries. If you have some skew due to the ceiling or floor but not a truly L- or J-shaped distribution, the logit transformation will probably normalize your errors enough to do ordinary panel regression models. If the distribution is truly L- or J-shaped, no transformation will fix things up.
(2) -betafit- (by Nick, Maarten, et al) with clustered robust SEs is a viable alternative that uses the linking strategy. This uses the beta distribution as an error model. It won't adjust for the autoregressive nature of panel data, but maybe that'll work well enough.
(3) If you want to use the GLM approach and are willing to move to different software (SAS or WinBUGS), I can give you examples to do random effects and AR-adjusted beta regression. (One of these days, I want to port a random effects and GEE betareg over to Stata; no time right now.)

Jay



This does not sound like an interval regression problem to me. Interval
regression is for when at least some individual values are known as
intervals, not points. Here values are known but must lie in a prescribed
interval. Divide by 10 and the problem has exactly the same form as that
often met for proportions.
There are correspondingly various options:
1. Go ahead regardless. The advantage is simplicity. The disadvantage is that there is no guarantee that predicted values will lie in [0,10]: they certainly will not for some values of the predictors, and possibly not for your observed data. Arguably that is quite wrong in principle and it may be a real nuisance for your project. There are implications all the way downstream in terms of assessing fit.
2. Transform your response so that it is unbounded. Model in terms of that and then back-transform.
3. A variant of 2, and in many ways preferable, is to use a link function, e.g. logit on response/10.
4. No doubt others.
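
Option 2 can be sketched as below. This is a minimal illustration on simulated data. The logit of response/10 is chosen as the transformation to match option 3, and the names firm, year, x1, and u, together with all numbers, are assumptions rather than anything from the thread.

```stata
* Simulated panel for illustration only; all names and numbers are assumptions
clear
set seed 2008
set obs 100
generate firm = _n
generate u = rnormal(0, 0.2)              // hypothetical firm-level effect
expand 5
bysort firm: generate year = _n
generate x1 = rnormal()
generate reputation = 10 * invlogit(0.5 + 0.3*x1 + u + rnormal(0, 0.3))
xtset firm year

* Transform: the logit of reputation/10 is unbounded
* (observed values must lie strictly between 0 and 10)
generate double logit_rep = logit(reputation / 10)

* Model on the transformed scale with an ordinary random-effects regression
xtreg logit_rep x1, re

* Back-transform the linear prediction to the 0-10 scale
predict double xb_hat, xb
generate double rep_back = 10 * invlogit(xb_hat)
summarize rep_back
```

The back-transformed values always fall strictly between 0 and 10. Since the inverse logit is nonlinear, they are not estimates of the conditional mean of reputation, which is one reason a link function (option 3) may be preferable.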
Nick
[email protected]
Anders Alexandersson

See -help xtintreg-. It does random-effects interval data regression models.

Pavlos C. Symeou <[email protected]> wrote:


I have panel data for 100 firms for five years and I want to examine the
effects of various variables on "reputation". My variable "reputation"
takes continuous values in the range of 0-10. Namely, it can take
values of 1,2,3 but also of 2.5, 9.6, etc. The values that the variable
"reputation" takes in my sample range between 2.6 - 8.3. Can you please
advise if I can still use panel OLS estimation for panel data or should
I use a different model? In essence, my main concern is the limitations
of the bounded range of my variable.

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/



