Statalist



Re: st: Dependent continuous variable with bounded range


From   "Anders Alexandersson" <andersalex@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Dependent continuous variable with bounded range
Date   Wed, 16 Apr 2008 23:02:02 -0400

I (Anders) wrote:
>  Are we talking about a fractional response on unbalanced panel data?
>  In that case, how about using -glm ..., family(binomial)
>  link(logit) robust- but separately for each panel? That (or with the
>  probit link) seems to be what Jeffrey Wooldridge suggests at the end
>  of this paper:
>  http://www.msu.edu/~ec/faculty/wooldridge/current%20research/clus1aea.pdf
>  I'm sure he would be happy to clarify this at 2008 SNASUG, if needed :-)

Pavlos answered:
>  we are talking about balanced panel data...I am not sure what you mean by
> fractional responses...As I explained in correspondence, the variable in
> question, "reputation", results from the addition of the values of three
> other variables which take discrete values ranging from 0 to 10, and
> thereafter their sum is divided by 3. Therefore, "reputation", as expected,
> has upper and lower bounds of 10 and 0, respectively. In particular,
> my sample's values range from 2.95 to 8.45.
>
>  Reading through everyone's invaluable responses, which I need to admit have
> given me substantial food for thought, I have construed that the OLS model
> might not be that inappropriate to fit the data after all. That is based on
> that the "reputation" variable manifests a normal distribution; its values
> are quite distant from the bounds; the coefficients of the explanatory
> variables appear to support the theoretical arguments; the idiosyncratic
> errors from the -xtreg, fe- (or -re-) regression model approach a normal
> distribution; the logit transformation of the "reputation" variable does not
> seem to improve the model; and neither the -xtgee- nor the -glm- command
> yields satisfactory results.

Wooldridge has another paper on his website that deals with balanced panel data
(i.e., there are no missing values) and that provides more context
about "fractional response" data; see
http://www.msu.edu/~ec/faculty/wooldridge/fracresp2r4.pdf
In my limited experience, however, most panel datasets are unbalanced
(i.e., there are missing values).

By fractional response data, I simply mean data where the response
(outcome) variable is a fraction (proportion), as discussed in the
Stata FAQ that Nick mentioned:
http://www.stata.com/support/faqs/stat/logit.html. This FAQ, however,
does not deal with the complication of panel data. Nick suggested
first dividing your response variable reputation, which seemingly has
the theoretical range 0-10, by 10 to get a new fractional response
variable named repute. This new variable repute has the standardized
range 0-1, which covers fractions between 0 and 1 (e.g., the
value .1 is the same as the fraction .1/1).
Standardized ranges other than 0-1 sometimes make more sense. For
example, Celsius uses the range 0-100 rather than 0-1; for what it's
worth, the Swede in me strongly believes that Celsius makes more sense
than, say, Fahrenheit. Does the range 0-10 make more sense than the
range 0-1 for "reputation"?
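
For what it's worth, here is a minimal sketch of that rescaling followed
by the -glm- fit mentioned at the top of this message. The covariate
names x1 and x2 are hypothetical placeholders (I do not know your actual
regressors), and the sketch ignores the panel structure, just as the FAQ
does, so treat it as a starting point rather than a recommended model.

```stata
* Assumption: x1 and x2 stand in for your own explanatory variables.
* Rescale reputation from its theoretical 0-10 range to 0-1.
generate double repute = reputation/10
summarize repute

* Fractional logit: binomial family, logit link, robust standard errors.
* Stata notes that repute has noninteger values; that is expected here.
glm repute x1 x2, family(binomial) link(logit) vce(robust)

* Fitted means are on the 0-1 scale; multiply by 10 for the 0-10 scale.
predict double repute_hat, mu
generate double reputation_hat = 10*repute_hat
```

Multiplying the fitted values by 10 puts them back on the original scale,
so the choice between 0-10 and 0-1 mainly affects how the results are
reported.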

Is "reputation" a one-dimensional concept that is best measured at the
continuous level as the mean score of the 3 original variables? Maybe,
but I would not take it for granted. If this were my analysis, I
would try really hard to get hold of the original raw data to see
empirically what is going on, or else be cautious about strong conclusions.

Anders Alexandersson
andersalex@gmail.com