Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: extract values from kdensity graphic


From   mcross@exemail.com.au
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: extract values from kdensity graphic
Date   Thu, 3 May 2012 02:27:43 +1000

Many thanks Nick,

-group1d- doesn't suit my application (versions of Stata aside) as I don't
want to have to specify the number of groups. I really like the kdensity
plot because it automatically determines the number of groups (which are
in the hundreds for my real data sets).

Unfortunately -round- often fails to group sizes appropriately in my full
data sets too, as the clusters don't always align with the rounding units.

The kdensity plot shows exactly what I want, but alas I can't extract its
data (trough coordinates).

Any more thoughts from the list?

Mike.
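
One possible route, offered as a sketch rather than a tested solution for
the full data sets: -kdensity- has a generate() option that saves the grid
points and the density estimates as variables, so troughs can be looked for
in those variables instead of on the graph. The block assumes the size
variable from the original post is in memory; kx, kd and is_trough are
names made up for this example. The saved values occupy the first n()
observations and are not aligned with the size values in the same rows.

```stata
* Assumes the 30 values of size from the original post are in memory.
* kx, kd and is_trough are hypothetical names chosen for this sketch.

* Same estimate as the graph, saved to variables instead of plotted.
kdensity size, w(0.1) n(30) generate(kx kd) nograph

* Flag interior grid points whose density is no higher than either
* neighbour. The comparison is non-strict, so flat stretches of equal
* density (for example, zero density between clusters) are flagged too.
generate byte is_trough = _n > 1 & kd < . & kd[_n+1] < . & kd <= kd[_n-1] & kd <= kd[_n+1]

* Grid coordinates of the candidate troughs.
list kx kd if is_trough
```

A run of consecutive flagged points marks a single trough, so one value per
run (say its midpoint) would serve as a category bound. Small dips inside a
cluster can also be flagged, so the list is worth checking against the
graph before using it to define categories.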




Another way of looking at these data is to apply -group1d- (SSC). In fact
Mike cannot do that himself because it needs Stata 9, but he can use the
results. With a least-squares criterion explained in the help and
references given, -group1d- yields as the best 5 groups

Group Size    First            Last           Mean      SD
  5       8   23   100.62      30   100.91   100.75    0.09
  4       1   22    98.41      22    98.41    98.41    0.00
  3       6   16    97.19      21    97.39    97.29    0.06
  2       8    8    96.11      15    96.34    96.25    0.07
  1       7    1    94.74       7    95.08    94.95    0.11

In fact, just about any method of cluster analysis should find the same
groups if they are genuine, e.g. -cluster kmeans-. Then use whatever
summary you prefer.
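
As an illustration of the -cluster kmeans- suggestion, a minimal sketch
follows. It assumes size from the original post is in memory; km5 is a
made-up name, and the number of groups in k() has to be supplied by the
user. The default starting centres are chosen at random, so repeated runs
can give different partitions, and a given run is not guaranteed to
reproduce the -group1d- groups.

```stata
* Assumes size from the original post is in memory.
* km5 is a hypothetical name; with name() alone, the grouping variable
* created by the command takes the same name.
cluster kmeans size, k(5) name(km5)

* Summarize each cluster for comparison with the -group1d- table.
* Cluster numbers are arbitrary labels, not ordered by size.
tabstat size, by(km5) statistics(n min max mean sd)
```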

Details follow for -group1d-.

. sort size

. group1d size, max(7)

  Partitions of 30 data up to 7 groups

  1 group:  sum of squares 143.60
  Group Size    First            Last           Mean      SD
  1      30    1    94.74      30   100.91    97.43    2.19

  2 groups: sum of squares 23.00
  Group Size    First            Last           Mean      SD
  2       9   22    98.41      30   100.91   100.49    0.74
  1      21    1    94.74      21    97.39    96.12    0.93

  3 groups: sum of squares 6.62
  Group Size    First            Last           Mean      SD
  3       8   23   100.62      30   100.91   100.75    0.09
  2      15    8    96.11      22    98.41    96.81    0.66
  1       7    1    94.74       7    95.08    94.95    0.11

  4 groups: sum of squares 1.26
  Group Size    First            Last           Mean      SD
  4       8   23   100.62      30   100.91   100.75    0.09
  3       7   16    97.19      22    98.41    97.45    0.40
  2       8    8    96.11      15    96.34    96.25    0.07
  1       7    1    94.74       7    95.08    94.95    0.11

  5 groups: sum of squares 0.20
  Group Size    First            Last           Mean      SD
  5       8   23   100.62      30   100.91   100.75    0.09
  4       1   22    98.41      22    98.41    98.41    0.00
  3       6   16    97.19      21    97.39    97.29    0.06
  2       8    8    96.11      15    96.34    96.25    0.07
  1       7    1    94.74       7    95.08    94.95    0.11

  6 groups: sum of squares 0.14
  Group Size    First            Last           Mean      SD
  6       8   23   100.62      30   100.91   100.75    0.09
  5       1   22    98.41      22    98.41    98.41    0.00
  4       6   16    97.19      21    97.39    97.29    0.06
  3       8    8    96.11      15    96.34    96.25    0.07
  2       5    3    94.95       7    95.08    95.01    0.05
  1       2    1    94.74       2    94.89    94.81    0.08

  7 groups: sum of squares 0.10
  Group Size    First            Last           Mean      SD
  7       2   29   100.84      30   100.91   100.88    0.04
  6       6   23   100.62      28   100.76   100.71    0.05
  5       1   22    98.41      22    98.41    98.41    0.00
  4       6   16    97.19      21    97.39    97.29    0.06
  3       8    8    96.11      15    96.34    96.25    0.07
  2       5    3    94.95       7    95.08    95.01    0.05
  1       2    1    94.74       2    94.89    94.81    0.08

  Groups     Sums of squares
    1          143.60
    2           23.00
    3            6.62
    4            1.26
    5            0.20
    6            0.14
    7            0.10


On Wed, May 2, 2012 at 9:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
In practice,

gen sizer = round(size)

is a simpler way of degrading your data. Check by

scatter sizer size

Nick
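
A small sketch of a further check on the rounded classes, assuming size
from the original post is in memory. Tabulating sizer shows how many
values fall in each class, and the range of the raw values within each
class shows whether any class straddles two clusters.

```stata
* Assumes size from the original post is in memory.
* Drop sizer first in case it already exists from the commands above.
capture drop sizer
generate sizer = round(size)

* Frequency of each rounded class.
tabulate sizer

* Count and range of the raw values within each rounded class.
tabstat size, by(sizer) statistics(n min max)
```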

On Wed, May 2, 2012 at 9:16 AM,  <mcross@exemail.com.au> wrote:
* Hi Statalist,
* I'm a beginner using version 8.
* The following measurements were collected by a machine in my lab...
clear
input sampling_event size
1 94.74
2 94.89
3 94.95
4 94.97
5 95
6 95.05
7 95.08
8 96.11
9 96.22
10 96.24
11 96.27
12 96.27
13 96.27
14 96.32
15 96.34
16 97.19
17 97.26
18 97.26
19 97.32
20 97.34
21 97.39
22 98.41
23 100.62
24 100.69
25 100.69
26 100.76
27 100.76
28 100.76
29 100.84
30 100.91
end
list
twoway (scatter size sampling_event)

* My aim is to class these size values into categories (5 categories in
* the example shown).
* kdensity will generate the following graphic...

kdensity size , w(0.1) n(30)

* The troughs of this graphic are a good way to define the bounds of
* each category.
* Category_4, for example, would include all size values larger than 98
* and less than 99.
* I'd like to extract these trough points as a kdensity post-estimation
* and output them as a new variable.
* Is this possible?
* Look forward to any advice the list has to offer.



