
Re: st: Homogeneity of ordinal Variabel

From: Nick Cox
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: Homogeneity of ordinal Variabel
Date: Tue, 21 May 2013 17:56:01 +0100

```You can calculate any number of measures of heterogeneity here. The
same measures crop up again and again in economics, sociology,
ecology, etc., etc. under headings such as concentration, inequality,
diversity, etc., etc.

Two of the simplest are the
Gini-Turing-Hirschman-Simpson-Herfindahl-Good measure based on sum of
squared proportions p^2 and the Shannon-Wiener measure based on sum of
p ln p. People are welcome to insert other authors' names according to
taste and historical knowledge. Different formulas are to be
considered equivalent if a one-to-one correspondence can be identified
between results.

The idea that mean and SD are out of order here possibly stems from
exposure to some version of the Stevens doctrine that measurement
scale determines legitimate statistical properties. Well, yes and no.
In practice I predict that any ordering shown by SD will be matched
roughly by one shown by the Gini or entropy measures. I like the
versions of both of those that are "numbers equivalents", i.e. they
are recast to have an interpretation on the same scale as the number
of categories.

Here are some sample calculations:

. sysuse auto , clear
(1978 Automobile Data)

. tab rep78, matcell(freq)

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. mata
------------------------------------------------- mata (type end to exit) ---------------
: freq = st_matrix("freq")

: freq
        1
    +------+
  1 |   2  |
  2 |   8  |
  3 |  30  |
  4 |  18  |
  5 |  11  |
    +------+

: p = freq / sum(freq)

: sum(p:^2)
.2967863894

: -sum(p :* ln(p))
1.357855957

: 1/sum(p:^2)
3.369426752

: exp(-sum(p :* ln(p)))
3.887848644

So -rep78- has heterogeneity 3.37 and 3.89 on these measures. (If
every car had the same repair record, both measures would return 1. A
distribution 0.2 0.2 0.2 0.2 0.2 would return 5.)

There is an enormous literature. Here is one of many entry points:

http://exploringdatablog.blogspot.co.uk/2011/04/interestingness-measures.html

Nick
njcoxstata@gmail.com

On 21 May 2013 17:29, Meulemann  Max <mmeulemann@ethz.ch> wrote:
> Hi,
>
> I am interested in showing that the respondents' assessments of one item in my set are more heterogeneous than those of the others.
>
> I'm using Stata 12.
>
> I have 6 items describing how important respondents found certain issues to be, on a scale from 1 "not important" to 4 "very important".
> Looking at the data and the frequency tables, I have the feeling that there is much less agreement on one item than on the others.
> Put another way, there is more divergence in the answers, which is roughly shown by the summary table, although I should not really
> look at means and standard deviations of ordinal variables.
>
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        c0101 |       429     3.69697    .5690746          1          4
>        c0102 |       425    3.207059    .8872509          1          4
>        c0103 |       428    3.429907    .7385301          1          4
>        c0104 |       411    2.474453    1.010291          1          4
>        c0105 |       430    3.430233    .6885798          1          4
>        c0106 |       430    3.590698     .665435          1          4
>
> I believe that c0104 is a more controversial issue than c0101.
> I have not yet found a way to express the statement above in a statistically meaningful way. Is there a way to test it?

```
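The Mata steps in the reply can be repeated for several variables by wrapping them in a loop. What follows is a minimal sketch, not part of the original thread: the loop, the scalar names `inv_simpson` and `exp_shannon`, and the use of `foreign` as a second variable are illustrative assumptions. It uses the same auto data as the reply, so the `rep78` row should reproduce the 3.37 and 3.89 reported there.

```stata
* Sketch only: the loop, the scalar names, and the second variable (foreign)
* are assumptions for illustration, not part of the original reply.
sysuse auto, clear

foreach v of varlist rep78 foreign {
    * frequencies of the observed categories (missing values are left out)
    quietly tabulate `v', matcell(freq)

    * proportions, then the two "numbers equivalents" from the reply
    mata: p = st_matrix("freq") / sum(st_matrix("freq"))
    mata: st_numscalar("inv_simpson", 1 / sum(p:^2))
    mata: st_numscalar("exp_shannon", exp(-sum(p :* ln(p))))

    * variable name, reciprocal of sum p^2, exponential of Shannon entropy
    display "`v'" _col(12) %6.3f scalar(inv_simpson) _col(22) %6.3f scalar(exp_shannon)
}
```

For the items in the question, the varlist `c0101-c0106` would take the place of `rep78 foreign`. With four response categories, each measure then lies between 1 (all respondents give the same answer) and 4 (answers spread evenly over the categories).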