The argument for logit is correct in principle, but over
the range from 0.14 to 0.15 the logit of a proportion is as near
linear as is needed for almost all practical purposes.
In fact, forget the "almost". This really is a detail
compared with others.
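A quick way to see the near-linearity is to compare logit() with the straight line through its values at 0.14 and 0.15. This is only a sketch: the grid of 11 points and the variable names p, lp, chord and gap are my own choices for illustration.

```stata
* Compare logit(p) with the chord joining its values at p = 0.14 and p = 0.15
clear
set obs 11
generate double p = 0.14 + (_n - 1) * 0.001
generate double lp = logit(p)
* straight line through the two end points
generate double chord = logit(0.14) + (p - 0.14) * (logit(0.15) - logit(0.14)) / 0.01
* departure of the logit from that line
generate double gap = lp - chord
summarize gap
```

By my arithmetic the largest gap is well under 0.001, against a change in logit of about 0.08 across the interval.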
If you are going to transform, note that Stata has a -logit()-
function. I prefer to do it by -glm-:
. glm index year

Iteration 0:   log likelihood =  51.765713

Generalized linear models                          No. of obs      =        10
Optimization     : ML: Newton-Raphson              Residual df     =         8
                                                   Scale parameter =  2.33e-06
Deviance         =  .000018673                     (1/df) Deviance =  2.33e-06
Pearson          =  .000018673                     (1/df) Pearson  =  2.33e-06

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   =  51.76571288                    AIC             = -9.953143
BIC              = -18.42066207

------------------------------------------------------------------------------
       index |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |  -.0007992   .0001682    -4.75   0.000    -.0011289   -.0004695
       _cons |   1.741015   .3359864     5.18   0.000     1.082493    2.399536
------------------------------------------------------------------------------

. glm index year , link(logit)

Iteration 0:   log likelihood =  51.745225
Iteration 1:   log likelihood =  51.745602
Iteration 2:   log likelihood =  51.745602

Generalized linear models                          No. of obs      =        10
Optimization     : ML: Newton-Raphson              Residual df     =         8
                                                   Scale parameter =  2.34e-06
Deviance         =  .0000187482                    (1/df) Deviance =  2.34e-06
Pearson          =  .0000187482                    (1/df) Pearson  =  2.34e-06

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   =  51.74560229                    AIC             =  -9.94912
BIC              = -18.420662

------------------------------------------------------------------------------
       index |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   -.006451   .0013616    -4.74   0.000    -.0091197   -.0037822
       _cons |   11.10835   2.719779     4.08   0.000     5.777684    16.43902
------------------------------------------------------------------------------
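
The two fits can be put on a common footing with a little arithmetic. With a logit link, the slope on the scale of the mean is roughly b * u * (1 - u). The sketch below plugs in the two year coefficients from the output above. The value 0.145 is an assumed round figure near the middle of the data, not an estimate, and the scalar names are made up for illustration.

```stata
* Slope on the index scale implied by the logit-link coefficient
scalar p0 = 0.145                  // assumed representative index value
scalar b_logit = -.006451          // year coefficient, logit link
scalar b_ident = -.0007992         // year coefficient, identity link
scalar b_implied = b_logit * p0 * (1 - p0)
display "implied slope on index scale: " %10.7f b_implied
display "identity-link slope:          " %10.7f b_ident
```

The two slopes should agree to about three significant figures, which is another way of seeing that the choice of link makes little difference here.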
Yoking Herfindahl and Hirschman is not appropriate here
as their measures differ. Herfindahl's measure, or its
complement, is also known as the Gini index (one of several),
heterozygosity, Simpson's index, etc.
Nick
n.j.cox@durham.ac.uk
Clive Nicholas replied to Xiaoheng 'Kevin' Zhang
I have a series of index values for 10 years. It is a Herfindahl index of
concentration and I would like to test if the change of this index over
time is significant.
I am not sure how to translate this real problem into a statistics
problem. Since it looks like a decreasing trend, I used linear regression
of index on year and found the slope is statistically different from 0.
But I am worried about the sample size.
The indices are
year index
1993 0.149552855
1994 0.146646187
1995 0.143958559
1996 0.145009261
1997 0.147389484
1998 0.145309026
1999 0.144218297
2000 0.142834716
2001 0.140957544
2002 0.140444707
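
For anyone who wants to reproduce the results in this thread, here is a minimal sketch that reads the ten values above into Stata and repeats the regression of index on year described in the question. The variable names follow the listing. Storing index as double is my choice, so that all nine decimal places are kept.

```stata
* Enter the series exactly as listed and fit the linear trend
clear
input int year double index
1993 0.149552855
1994 0.146646187
1995 0.143958559
1996 0.145009261
1997 0.147389484
1998 0.145309026
1999 0.144218297
2000 0.142834716
2001 0.140957544
2002 0.140444707
end
regress index year
```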
There are two things about the Herfindahl-Hirschman index of market
concentration (to give it its full title), and its use as a response
variable in OLS that you need to be aware of:
(1) Since the index (H) is a fixed 0-1 scale, where 0 = perfect competition
and 1 = a monopoly, the use of -reg- is invalid under the Gauss-Markov
assumptions underpinning OLS;
and
(2) calculating the logit transformation of H gives you a new index (H*)
whose scale stretches from -infinity to +infinity. This makes it a much
more useful - and valid - index for OLS model fitting. Unlike H's scale,
H*'s scale is also _linear_.
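
As a small illustration of point (2), the sketch below shows how the logit stretches proportions onto the whole real line and how to get back again. It uses Stata's built-in logit() and invlogit() functions. The particular proportions are arbitrary examples, not values from the data.

```stata
* logit() maps proportions in (0,1) onto the real line, symmetrically about 0.5
display logit(0.01)
display logit(0.5)
display logit(0.99)
* the same thing written out by hand, as a difference from logit()
display ln(0.145/(1 - 0.145)) - logit(0.145)
* invlogit() undoes the transformation
display invlogit(logit(0.145))
```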
Inputting your data and generating H*
. clear
. input year index
< snip >
. g logindex=ln(index/(1-index))
and then looking at the relationship graphically via
. twoway line logindex year
shows that H* decreased by nearly 0.08 over 10 years, indicating that
competition within whatever market you're measuring _increased_. But was
that decrease in H* statistically significant over this period?
. reg logindex year, eform(OR)

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   22.75
       Model |  .003446335     1  .003446335           Prob > F      =  0.0014
    Residual |  .001211688     8  .000151461           R-squared     =  0.7399
-------------+------------------------------           Adj R-squared =  0.7074
       Total |  .004658023     9  .000517558           Root MSE      =  .01231

----------------------------------------------------------------------------
  logindex |         OR   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
      year |   .9935576   .0013462    -4.77   0.001      .990458    .9966668
----------------------------------------------------------------------------

Yes: the slope is significantly negative. The reported .9935576 is exp(b), so
H* fell by about 0.0065 per year, which is a fall of roughly 0.64 percent a year
in H/(1 - H) (notice the use of the -eform()- option to obtain this).
Whether this is important enough to care about is, of course, your call.
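
Since -eform()- reports exp(b) rather than b, it can help to convert the figure in the table back by hand. The sketch below does that with the value .9935576 copied from the output above. The scalar names are made up for illustration.

```stata
* Back out the slope on the logit scale from the exponentiated coefficient
scalar or_year = .9935576
scalar b_year = ln(or_year)
* percent change per year in H/(1-H)
scalar pct = 100 * (or_year - 1)
display "coefficient on logit scale:         " %9.6f b_year
display "percent change in H/(1-H) per year: " %6.3f pct
```

The first number is the yearly change in H* itself. The second is the corresponding percent change in H/(1 - H).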
Although there doesn't appear to be any real improvement in model fit over
the standard OLS model I suspect you fitted (R^2 for I = .7056), you are
at least fitting a much more valid model. The model fit itself is pretty
impressive.
But then there's the pesky problem of your small N. The only way to
improve this is by having more data (you don't say where this data comes
from). Do you have it? Also, other variables need to be used if they're
available: e.g., if this is market data, then information on, say, whether
any new laws tightening or relaxing market competition were introduced would
be very useful to have.