
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.



Re: st: Homogeneity of ordinal Variabel


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Homogeneity of ordinal Variabel
Date   Tue, 21 May 2013 17:56:01 +0100

You can calculate any number of measures of heterogeneity here. The
same measures crop up again and again in economics, sociology,
ecology, etc., etc. under headings such as concentration, inequality,
diversity, etc., etc.

Two of the simplest are the
Gini-Turing-Hirschman-Simpson-Herfindahl-Good measure based on sum of
squared proportions p^2 and the Shannon-Wiener measure based on sum of
p ln p. People are welcome to insert other authors' names according to
taste and historical knowledge. Different formulas are to be
considered equivalent if a one-to-one correspondence can be identified
between results.
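
For concreteness, the two measures can be written out as below. This is a sketch in notation chosen here for illustration: p_i is the proportion of observations in category i of k categories, and the labels R and H are assumptions of this note, not standard names. The sign on H follows the calculation shown later in this post. Categories with p_i = 0 contribute nothing to H, by the usual convention that 0 ln 0 = 0.

```latex
R = \sum_{i=1}^{k} p_i^{2},
\qquad
H = -\sum_{i=1}^{k} p_i \ln p_i
```

In the sense of equivalence just described, 1 - R and 1/R carry the same information as R, since each is a one-to-one function of it.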

The idea that mean and SD are out of order here possibly stems from
exposure to some version of the Stevens doctrine that measurement
scale determines legitimate statistical properties. Well, yes and no.
In practice I predict that any ordering shown by SD will be matched
roughly by one shown by the Gini or entropy measures. I like the
versions of both of those that are "numbers equivalents", i.e. they
are recast to have an interpretation on the same scale as the number
of categories.
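
As a sketch of why the "numbers equivalents" live on the scale of the number of categories, consider k equally common categories, so that every proportion is p_i = 1/k. The notation (p_i, k) is introduced here for illustration only.

```latex
\frac{1}{\sum_{i=1}^{k} p_i^{2}}
  = \frac{1}{k \cdot (1/k)^{2}}
  = k,
\qquad
\exp\!\Bigl(-\sum_{i=1}^{k} p_i \ln p_i\Bigr)
  = \exp\!\Bigl(-k \cdot \tfrac{1}{k} \ln \tfrac{1}{k}\Bigr)
  = \exp(\ln k)
  = k
```

At the other extreme, if all observations fall in one category, the sum of squared proportions is 1 and the entropy is 0, so both numbers equivalents equal 1.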

Here are some sample calculations:

. sysuse auto , clear
(1978 Automobile Data)

. tab rep78, matcell(freq)

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          2        2.90        2.90
          2 |          8       11.59       14.49
          3 |         30       43.48       57.97
          4 |         18       26.09       84.06
          5 |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

. mata
------------------------------------------------- mata (type end to exit) ---------------
: freq = st_matrix("freq")

: freq
        1
    +------+
  1 |   2  |
  2 |   8  |
  3 |  30  |
  4 |  18  |
  5 |  11  |
    +------+

: p = freq / sum(freq)

: sum(p:^2)
  .2967863894

: -sum(p :* ln(p))
  1.357855957

: 1/sum(p:^2)
  3.369426752

: exp(-sum(p :* ln(p)))
  3.887848644

So -rep78- has heterogeneity 3.37 and 3.89 on these measures. (If
every car had the same repair record, both measures would return 1. A
distribution 0.2 0.2 0.2 0.2 0.2 would return 5.)
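
The calculations above could be wrapped in a small Mata function so that they can be repeated for several variables. What follows is only a minimal sketch: the function name numequiv() is hypothetical (it is not a built-in), and dropping empty categories before taking logs is an assumption of this sketch rather than something done in the session above.

```stata
sysuse auto, clear
quietly tabulate rep78, matcell(freq)

mata:
// numequiv() is a hypothetical helper name, not a built-in Mata function
real rowvector numequiv(real colvector freq)
{
    real colvector p

    // drop empty categories so that ln(p) is defined (assumption of this sketch)
    p = select(freq, freq :> 0)
    p = p / sum(p)

    // first element: 1 / (sum of squared proportions)
    // second element: exp(Shannon entropy)
    return((1 / sum(p:^2), exp(-sum(p :* ln(p)))))
}

numequiv(st_matrix("freq"))   // rep78 counts from the tabulation above
numequiv(J(5, 1, 1))          // five equally common categories
numequiv(J(1, 1, 69))         // every observation in a single category
end

* For the six items in the original question (variable names taken from that post):
* foreach v of varlist c0101-c0106 {
*     quietly tabulate `v', matcell(freq)
*     mata: numequiv(st_matrix("freq"))
* }
```

The last two calls are checks on the limiting cases described above: equal proportions across five categories and a single occupied category. The commented-out loop assumes the questioner's dataset is in memory.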

There is an enormous literature. Here is one of many entry points:

http://exploringdatablog.blogspot.co.uk/2011/04/interestingness-measures.html


Nick
njcoxstata@gmail.com


On 21 May 2013 17:29, Meulemann  Max <mmeulemann@ethz.ch> wrote:
> Hi,
>
> I am interested in showing that the respondents' assessments on one item of my set are more heterogeneous than those on the others.
>
> I'm using Stata 12.
>
> I have 6 items describing how important respondents found certain issues to be on a scale of 1 "not important" to 4 "very important".
> Looking at the data and the frequency table, I have the feeling that there is much less agreement on one item than on the others.
> Put differently, I would say there is more divergence in the answers, which is roughly shown by the summary table, although I should not really
> look at means and standard deviations of ordinal variables.
>
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>        c0101 |       429     3.69697    .5690746          1          4
>        c0102 |       425    3.207059    .8872509          1          4
>        c0103 |       428    3.429907    .7385301          1          4
>        c0104 |       411    2.474453    1.010291          1          4
>        c0105 |       430    3.430233    .6885798          1          4
>        c0106 |       430    3.590698     .665435          1          4
>
> I would believe that c0104 is a more controversial issue than c0101.
> I have not yet found a way to express the statement above in a statistically meaningful way. Is there a way to test it?


