Re: st: test for significant change of a series

From: n j cox <[email protected]>
To: [email protected]
Subject: Re: st: test for significant change of a series
Date: Thu, 11 Jan 2007 14:58:18 +0000

The argument for logit is correct in principle, but over
the range from 0.14 to 0.15 logit of a proportion is as near
linear as is needed for almost all practical purposes.
In fact, forget the "almost". This really is a detail
compared with others.
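A minimal sketch of one way to check this near-linearity, assuming only Stata's built-in -logit()- function and the range 0.14 to 0.15 spanned by the data. The slope of logit(p) is 1/(p(1-p)), so comparing that slope at the two ends of the range shows how little the curve bends there. It is meant to be run from a do-file, since it uses /// line continuation.

```stata
* Plot logit(p) over the range spanned by the index values
twoway function y = logit(x), range(0.14 0.15) ///
    ytitle("logit(p)") xtitle("p")

* Slope of logit(p) is 1/(p(1-p)); evaluate it at both ends of the range
display 1/(0.14*(1 - 0.14))
display 1/(0.15*(1 - 0.15))
```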

If you are going to transform, note that Stata has a -logit()-
function. I prefer to do it by -glm-:

. glm index year

Iteration 0:   log likelihood =  51.765713

Generalized linear models                          No. of obs      =        10
Optimization     : ML: Newton-Raphson              Residual df     =         8
                                                   Scale parameter =  2.33e-06
Deviance         =  .000018673                     (1/df) Deviance =  2.33e-06
Pearson          =  .000018673                     (1/df) Pearson  =  2.33e-06

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]
Standard errors  : OIM

Log likelihood   =  51.76571288                    AIC             = -9.953143
BIC              = -18.42066207

------------------------------------------------------------------------------
       index |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |  -.0007992   .0001682    -4.75   0.000    -.0011289   -.0004695
       _cons |   1.741015   .3359864     5.18   0.000     1.082493    2.399536
------------------------------------------------------------------------------

. glm index year , link(logit)

Iteration 0:   log likelihood =  51.745225
Iteration 1:   log likelihood =  51.745602
Iteration 2:   log likelihood =  51.745602

Generalized linear models                          No. of obs      =        10
Optimization     : ML: Newton-Raphson              Residual df     =         8
                                                   Scale parameter =  2.34e-06
Deviance         =  .0000187482                    (1/df) Deviance =  2.34e-06
Pearson          =  .0000187482                    (1/df) Pearson  =  2.34e-06

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]
Standard errors  : OIM

Log likelihood   =  51.74560229                    AIC             =  -9.94912
BIC              =  -18.420662

------------------------------------------------------------------------------
       index |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        year |   -.006451   .0013616    -4.74   0.000    -.0091197   -.0037822
       _cons |   11.10835   2.719779     4.08   0.000     5.777684    16.43902
------------------------------------------------------------------------------
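For anyone wanting to reproduce the comparison, here is a sketch that enters the ten values from the quoted message below and fits the model both ways. The variable name logindex and the choice of double storage are assumptions for illustration. Note that the two approaches are not the same model: -regress- on logit(index) models the mean of the transformed response, whereas -glm- with link(logit) models the logit of the mean of the untransformed response.

```stata
* Enter the data from the quoted message; double keeps all reported digits
clear
input int year double index
1993 0.149552855
1994 0.146646187
1995 0.143958559
1996 0.145009261
1997 0.147389484
1998 0.145309026
1999 0.144218297
2000 0.142834716
2001 0.140957544
2002 0.140444707
end

* Route 1: transform the response with logit(), then fit by least squares
generate double logindex = logit(index)
regress logindex year

* Route 2: leave the response alone and put the logit in the link function
glm index year, link(logit)
```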

Yoking Herfindahl and Hirschman is not appropriate here
as their measures differ. Herfindahl's measure, or its
complement, is also known as the Gini index (one of several),
heterozygosity, Simpson's index, etc.
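To make the distinction concrete, here is a small sketch using hypothetical market shares (the four values are invented for illustration and are not from the original data). It assumes the usual definitions: Herfindahl's measure as the sum of squared shares, its complement as one minus that sum, and Hirschman's measure as the square root of that sum.

```stata
* Hypothetical shares for four firms (illustrative values, summing to 1)
clear
input double share
0.40
0.30
0.20
0.10
end

* Sum of squared shares, retrieved from r(sum) after summarize
generate double share2 = share^2
quietly summarize share2
scalar H = r(sum)

display "Herfindahl (sum of squared shares) = " H
display "Complement (1 - H)                 = " 1 - H
display "Hirschman (square root of H)       = " sqrt(H)
```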

Nick
[email protected]

Clive Nicholas replied to Xiaoheng 'Kevin' Zhang

> I have a series of index values for 10 years. It is a Herfindahl index of
> concentration and I would like to test if the change of this index over
> time is significant.
> I am not sure how to translate this real problem into a statistics
> problem. Since it looks like a decreasing trend, I used linear regression
> of index on year and found the slope is statistically different from 0.
> But I am worried about the sample size...
>
> The indices are
> year index
> 1993 0.149552855
> 1994 0.146646187
> 1995 0.143958559
> 1996 0.145009261
> 1997 0.147389484
> 1998 0.145309026
> 1999 0.144218297
> 2000 0.142834716
> 2001 0.140957544
> 2002 0.140444707

There are two things you need to be aware of about the Herfindahl-Hirschman
index of market concentration (to give it its full title) and its use as a
response variable in OLS:

(1) Since the index (H) is a fixed 0-1 scale, where 0 = perfect competition
and 1 = a monopoly, the use of -reg- is invalid under the Gauss-Markov
assumptions underpinning OLS;

and

(2) calculating the logit transformation of H gives you a new index (H*)
whose scale stretches from -infinity to +infinity. This makes it a much
more useful - and valid - index for OLS model fitting. Unlike H's scale,
H*'s scale is also _linear_.

Inputting your data and generating H*

. clear

. input year index

< snip >

and then looking at the relationship graphically via

. twoway line logindex year

shows that H* decreased by nearly 0.08 over 10 years, indicating that
competition within whatever market you're measuring _increased_. But was
that decrease in H* statistically significant over this period?

. reg logindex year, eform(OR)

      Source |       SS       df       MS              Number of obs =      10
-------------+------------------------------           F(  1,     8) =   22.75
       Model |  .003446335     1  .003446335           Prob > F      =  0.0014
    Residual |  .001211688     8  .000151461           R-squared     =  0.7399
-------------+------------------------------           Adj R-squared =  0.7074
       Total |  .004658023     9  .000517558           Root MSE      =  .01231

----------------------------------------------------------------------------
  logindex |         OR   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+----------------------------------------------------------------
      year |   .9935576   .0013462    -4.77   0.001      .990458    .9966668
----------------------------------------------------------------------------

Yes: H* decreased significantly. The exponentiated coefficient of .9936
means that the odds H/(1-H) fell by about 0.64 percent (roughly six-tenths
of 1 percent) every year in the period (notice the use of the -eform()-
option to obtain this). Whether this is important enough to care about is,
of course, your call. Although there doesn't appear to be any real
improvement in model fit over the standard OLS model I suspect you fitted
(adjusted R^2 for H = .7056), you are at least fitting a much more valid
model. The model fit itself is pretty impressive.
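As a rough check on how to read the exponentiated coefficient, the reported value can be converted by hand. This sketch uses only the figure .9935576 from the table above; nothing else is assumed.

```stata
* Percent change in the odds H/(1-H) per year implied by the reported value
display 100*(.9935576 - 1)

* The same effect on the logit scale (the coefficient before exponentiation)
display ln(.9935576)
```

After fitting -regress logindex year- without -eform()-, the same quantity should be available as exp(_b[year]).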

But then there's the pesky problem of your small N. The only way to
improve this is by having more data (you don't say where these data come
from). Do you have any? Also, other variables should be used if they're
available: e.g., if this is market data, then information on, say, whether
any new laws tightening or relaxing market competition were introduced
would be very useful to have.