# RE: st: regression with dependent variable ranging from 0 to 1

 From "Feiveson, Alan H. (JSC-SK311)"
Date Wed, 31 Dec 2008 12:14:45 -0600

```
Andrea -

Yours is indeed an interesting problem. I was hoping someone else on
the list might have a better idea, but as a starter, you could try
modeling n|N and N separately. For n given N (only for those cases
where N > 0), you could try GEE. In Stata this would be -xtgee- with a
binomial family and logit link, clustering on firm to account for the
dependence within firms. The command would look something like

xtgee n x1 x2 x3 if N>0, fam(bin N) link(logit) robust i(firm)

For N you could try a zero-inflated Poisson or negative binomial model
(-zip- or -zinb-), or a Poisson or negative binomial model with random
effects (-xtpoisson- or -xtnbreg-).

Al Feiveson

yes

On Tue, Dec 30, 2008 at 10:28 PM, Feiveson, Alan H. (JSC-SK311)
<Alan.H.Feiveson@nasa.gov> wrote:
> Sorry, I missed the part about N being zero also. So the model has to
> take into account that N can be zero. As an example, n|N ~
> binomial(N, p) if N > 0, otherwise n = 0; and N has a Poisson
> distribution. p and/or the Poisson mean may vary randomly from firm to
> firm (random effects), but each is fixed within a firm. This may
> require a customized -ml- program, or maybe even a Bayesian approach.
> I am assuming that both N and n vary within the observations
> pertaining to the same firm. Is this correct?
>
> Al F.
>
> Andrea -
>
> Are n and N counts? What range of values do they take on? Can n be
> thought of as a binomial sample from N? Then try -glm- as before, but
> using n as the dependent variable (with N as the binomial
> denominator), not G or H. Still, clustering on firms should probably
> be taken into account. Perhaps this can be done using -xtlogit- on a
> 0-1 variable that sums to n.
>
> Al F.
>
> I typed:
> glm depvar indepvar1 indepvar2... indepvarn, link(logit) robust nolog
>
> When I tried
>
> glm depvar indepvar1 indepvar2... indepvarn, link(logit) family(binomial) robust nolog
>
> I got
> "note: depvar has noninteger values"
>
> Actually, the great majority of my data are zeros: 204581 out of 213000
> observations. What exactly do you mean by "model separately"?
>
> Austin:
> Sorry, I gave wrong information: this is actually 1-H, which is why I
> have many 0s. Let me explain in more detail:
> Dep var = log(1+G)
> where G = 1-H and H = Σ(n/N)^2 (a Herfindahl index)
>
> I have 0 for all the cases in which H = 1 and for all the cases in
> which n = N = 0.
>
>
> Al:
> No clustering variable, but I do have 10 observations for each firm
> (I use dummies to deal with this). Some independent variables have
> the same value in all 10 observations, and sometimes the depvar is
> the same for some of the 10 observations.
>
> Thank you!
>
>
> On Tue, Dec 30, 2008 at 8:16 PM, Maarten Buis <maartenbuis@yahoo.co.uk>
> wrote:
>> --- Andrea Rispoli <andrea.rspl@gmail.com> wrote:
>>> It is a Herfindahl index of concentration; in principle it ranges
>>> from 0 to 1. In my specific case:
>>>
>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>> -------------+--------------------------------------------------------
>>>            H |    213620    .0190621    .0920916          0   .6477536
>>
>> How many zeros do you have? (Type in Stata: -count if H == float(0)-.)
>> Even though it is possible for a fractional logit to model a
>> dependent variable that includes zeros (and ones), if there are too
>> many of these, that might indicate that the zeros arise from a
>> separate process and need to be modeled separately.
>>
>> -- Maarten
```
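This Python sketch shows the dependent variable Andrea describes in the thread: H = Σ(n/N)^2 over category counts, G = 1 - H, and depvar = log(1+G), coded 0 when n = N = 0. It treats the input as a hypothetical list of category counts n_i with N = Σ n_i. That reading, and the function names `herfindahl` and `transformed_depvar`, are illustrative assumptions and not Andrea's actual code. The sketch makes the source of the zeros concrete. A firm with all of N in one category (H = 1) and a firm with N = 0 both get exactly 0, although they arise from different situations. This mixture is the kind Maarten suggests modeling separately.

```python
import math


def herfindahl(counts):
    """Herfindahl index H = sum_i (n_i / N)^2, where N = sum_i n_i.

    Returns None when N == 0, because the shares n_i / N are undefined.
    """
    total = sum(counts)
    if total == 0:
        return None
    return sum((c / total) ** 2 for c in counts)


def transformed_depvar(counts):
    """log(1 + G) with G = 1 - H, coded 0 when N == 0 (as described in the thread)."""
    h = herfindahl(counts)
    if h is None:
        return 0.0  # the n = N = 0 case is coded as 0, not left missing
    return math.log1p(1.0 - h)  # equals 0 exactly when H == 1


if __name__ == "__main__":
    # hypothetical count vectors: full concentration, even split, mixed, empty
    for counts in ([10], [5, 5], [7, 2, 1], [0, 0]):
        print(counts, herfindahl(counts), round(transformed_depvar(counts), 4))
```

With k categories, H is at least 1/k, so G stays below 1. The transformed variable therefore lies in [0, log 2), roughly [0, 0.693).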
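This Python simulation illustrates how Al's two-part model produces many zeros, and how the n|N part can be fit on the N > 0 observations only. It rests on several assumptions that do not come from the thread:

- one covariate x;
- hypothetical coefficients beta0 = -1 and beta1 = 0.5 in logit(p);
- a Poisson count N with mean mu * exp(u), where u is a normal firm-level random effect on the count part only;
- 2000 hypothetical firms with 10 observations each.

The fit is a Newton-Raphson maximum-likelihood grouped binomial logit. It targets the same point estimates as a pooled binomial GLM such as -glm n x if N>0, family(binomial N) link(logit)-. It is not GEE. It also omits the count model for N and the firm-clustered standard errors that the -robust- option in Al's -xtgee- command would provide.

```python
import math
import random


def draw_poisson(rng, lam):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1


def simulate(n_firms=2000, obs_per_firm=10, beta0=-1.0, beta1=0.5,
             mu=0.3, sigma=0.5, seed=1):
    """Rows of (firm, x, n, N) from the hypothetical two-part process."""
    rng = random.Random(seed)
    rows = []
    for firm in range(n_firms):
        u = rng.gauss(0.0, sigma)  # firm random effect, fixed within the firm
        for _ in range(obs_per_firm):
            x = rng.gauss(0.0, 1.0)
            N = draw_poisson(rng, mu * math.exp(u))
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))
            # n | N ~ binomial(N, p); when N == 0 the sum is empty, so n = 0
            n = sum(rng.random() < p for _ in range(N))
            rows.append((firm, x, n, N))
    return rows


def fit_binomial_logit(rows, iters=25):
    """Newton-Raphson MLE of logit(p) = b0 + b1*x using only rows with N > 0."""
    data = [(x, n, N) for _, x, n, N in rows if N > 0]
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, n, N in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = n - N * p          # contribution to the score
            w = N * p * (1.0 - p)      # binomial information weight
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # solve the 2x2 system (information matrix) * step = score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    rows = simulate()
    share_zero = sum(1 for r in rows if r[3] == 0) / len(rows)
    print("share of observations with N = 0:", round(share_zero, 3))
    print("estimated (b0, b1):", fit_binomial_logit(rows))
```

The share of N = 0 rows is driven entirely by the assumed mu and sigma. The estimated coefficients should land near the assumed (-1, 0.5), because here the firm effect enters only the count part. If p itself varied by firm, a pooled fit like this would no longer match a random-effects model, and Al's -xtgee-, -ml-, or Bayesian routes become relevant.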