Re: st: Dependent continuous variable with bounded range

 From     "Anders Alexandersson" <[email protected]>
 To       [email protected]
 Subject  Re: st: Dependent continuous variable with bounded range
 Date     Wed, 16 Apr 2008 23:02:02 -0400

```I (Anders) wrote:
>  "Are we talking about a fractional response on unbalanced panel data?
>  In that case, how about using -glm ..., family(binomial)
>  link(logit) robust- but separately for each panel? That (or with the
>  probit link) seems to be what Jeffrey Wooldridge suggests at the end
>  of this paper:
>  http://www.msu.edu/~ec/faculty/wooldridge/current%20research/clus1aea.pdf
>  I'm sure he would be happy to clarify this at 2008 SNASUG, if needed :-)"

>  We are talking about balanced panel data...I am not sure what you mean by
> fractional responses...As I explained in correspondence, the variable in
> question, "reputation", results from adding the values of three
> other variables, which take discrete values ranging from 0 to 10, and
> then dividing their sum by 3. Therefore, "reputation", as expected,
> has upper and lower bounds of 10 and 0, respectively. In particular,
> my sample's values range from 2.95 to 8.45.
>
>  Reading through everyone's invaluable responses, which I must admit have
> given me substantial food for thought, I have concluded that the OLS model
> might not be that inappropriate for the data after all. That is because
> the "reputation" variable appears normally distributed; its values
> are quite distant from the bounds; the coefficients of the explanatory
> variables appear to support the theoretical arguments; the idiosyncratic
> errors from the -xtreg, fe- (or -re-) regression model are approximately
> normal; the logit transformation of the "reputation" variable does not
> seem to improve the model; and neither the -xtgee- nor the -glm- command
> yields satisfactory results.

Wooldridge has another paper on his website that deals with balanced panel
data (i.e., every unit is observed in every period) and that provides
more context:
http://www.msu.edu/~ec/faculty/wooldridge/fracresp2r4.pdf
In my limited experience, however, most panel datasets are unbalanced
(i.e., some units are not observed in every period).

By fractional response data, I simply mean data where the response
(outcome) variable is a fraction (proportion), as discussed in the
Stata FAQ that Nick mentioned:
http://www.stata.com/support/faqs/stat/logit.html. This FAQ, however,
does not deal with the complication of panel data. Nick suggested
first dividing your response variable reputation, which seemingly has
the theoretical range 0-10, by 10 to get a new fractional response
variable named repute. This new variable repute has the standardized
range 0-1, so each value can be read as a fraction of the maximum
possible score (e.g., a reputation of 8.45 becomes a repute of .845).
Standardized ranges other than 0-1 sometimes make more sense. For
example, Celsius uses the range 0-100 rather than 0-1; for what it's
worth, the Swede in me strongly believes that Celsius makes more sense
than, say, Fahrenheit. Does the range 0-10 make more sense than the
range 0-1 for "reputation"?
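
As a minimal sketch of that rescaling, followed by a pooled version of
the -glm- call quoted above: the covariates x1 and x2 are hypothetical
placeholders (not variables from this thread), and the fit ignores the
panel structure, so treat it as a starting point only.

    . * rescale the 0-10 score to the unit interval
    . generate repute = reputation/10
    . summarize repute
    . * fractional logit with robust standard errors (x1, x2 hypothetical)
    . glm repute x1 x2, family(binomial) link(logit) robust
    . * fitted means on the 0-1 scale, then back on the 0-10 scale
    . predict repute_hat, mu
    . generate reputation_hat = 10*repute_hat

Multiplying the fitted means by 10 returns them to the original scale,
and with the logit link they stay inside 0-10 by construction.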

Is "reputation" a one-dimensional concept that is best measured at the
continuous level as the mean score of the 3 original variables? Maybe,
but I would not take it for granted. If this were my analysis, I
would try really hard to get hold of the original raw data to see
empirically what is going on, or else be cautious about strong conclusions.