Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: extract values from kdensity graphic

- From: Nick Cox
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: extract values from kdensity graphic
- Date: Wed, 2 May 2012 10:35:27 +0100

```
Another way of looking at these data is to apply -group1d- (SSC). Mike
cannot run it himself, because it needs Stata 9, but he can use the
results. With a least-squares criterion explained in the help and the
references given there, -group1d- yields these as the best 5 groups:

                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    5     8    23  100.62    30  100.91  100.75  0.09
    4     1    22   98.41    22   98.41   98.41  0.00
    3     6    16   97.19    21   97.39   97.29  0.06
    2     8     8   96.11    15   96.34   96.25  0.07
    1     7     1   94.74     7   95.08   94.95  0.11

Indeed, just about any method of cluster analysis, e.g. -cluster kmeans-,
should find the same groups if they are genuine. Then use whatever
summary you prefer.

. sort size

. group1d size, max(7)

Partitions of 30 data up to 7 groups

1 group:  sum of squares 143.60
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    1    30     1   94.74    30  100.91   97.43  2.19

2 groups: sum of squares 23.00
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    2     9    22   98.41    30  100.91  100.49  0.74
    1    21     1   94.74    21   97.39   96.12  0.93

3 groups: sum of squares 6.62
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    3     8    23  100.62    30  100.91  100.75  0.09
    2    15     8   96.11    22   98.41   96.81  0.66
    1     7     1   94.74     7   95.08   94.95  0.11

4 groups: sum of squares 1.26
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    4     8    23  100.62    30  100.91  100.75  0.09
    3     7    16   97.19    22   98.41   97.45  0.40
    2     8     8   96.11    15   96.34   96.25  0.07
    1     7     1   94.74     7   95.08   94.95  0.11

5 groups: sum of squares 0.20
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    5     8    23  100.62    30  100.91  100.75  0.09
    4     1    22   98.41    22   98.41   98.41  0.00
    3     6    16   97.19    21   97.39   97.29  0.06
    2     8     8   96.11    15   96.34   96.25  0.07
    1     7     1   94.74     7   95.08   94.95  0.11

6 groups: sum of squares 0.14
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    6     8    23  100.62    30  100.91  100.75  0.09
    5     1    22   98.41    22   98.41   98.41  0.00
    4     6    16   97.19    21   97.39   97.29  0.06
    3     8     8   96.11    15   96.34   96.25  0.07
    2     5     3   94.95     7   95.08   95.01  0.05
    1     2     1   94.74     2   94.89   94.81  0.08

7 groups: sum of squares 0.10
                 First          Last
Group  Size   obs   value   obs   value    Mean    SD
    7     2    29  100.84    30  100.91  100.88  0.04
    6     6    23  100.62    28  100.76  100.71  0.05
    5     1    22   98.41    22   98.41   98.41  0.00
    4     6    16   97.19    21   97.39   97.29  0.06
    3     8     8   96.11    15   96.34   96.25  0.07
    2     5     3   94.95     7   95.08   95.01  0.05
    1     2     1   94.74     2   94.89   94.81  0.08

Groups     Sums of squares
1          143.60
2           23.00
3            6.62
4            1.26
5            0.20
6            0.14
7            0.10

On Wed, May 2, 2012 at 9:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> In practice,
>
> gen sizer = round(size)
>
>
> scatter sizer size
>
> Nick
>
> On Wed, May 2, 2012 at 9:16 AM,  <mcross@exemail.com.au> wrote:
>> * Hi Statalist,
>> * I'm a beginner using version 8.
>> * The following measurements were collected by a machine in my lab...
>> clear
>> input sampling_event size
>> 1 94.74
>> 2 94.89
>> 3 94.95
>> 4 94.97
>> 5 95
>> 6 95.05
>> 7 95.08
>> 8 96.11
>> 9 96.22
>> 10 96.24
>> 11 96.27
>> 12 96.27
>> 13 96.27
>> 14 96.32
>> 15 96.34
>> 16 97.19
>> 17 97.26
>> 18 97.26
>> 19 97.32
>> 20 97.34
>> 21 97.39
>> 22 98.41
>> 23 100.62
>> 24 100.69
>> 25 100.69
>> 26 100.76
>> 27 100.76
>> 28 100.76
>> 29 100.84
>> 30 100.91
>> end
>> list
>> twoway (scatter size sampling_event)
>>
>> * My aim is to classify these size values into categories (5 categories
>> * in the example shown).
>> * kdensity will generate the following graphic...
>>
>> kdensity size , w(0.1) n(30)
>>
>> * The troughs of this graphic are a good way to define the bounds of each
>> * category.
>> * Category_4, for example, would include all size values larger than 98
>> * and less than 99.
>> * I'd like to extract these trough points as a post-estimation step after
>> * kdensity and save them as a new variable.
>> * Is this possible?
>>
>> * Look forward to any advice the list has to offer.

```
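You can check the least-squares criterion behind -group1d- by hand for any single partition. This is a minimal sketch with several assumptions:

- The 30 values of -size- from the question's -input- block are in memory.
- The variable names group5, gmean and sqdev are hypothetical.
- The cutpoints 96, 97, 98 and 99 were chosen by eye to fall in the gaps between the 5 groups reported above.

The code adds up the squared deviations of each value from its own group mean. For these data the total should agree with the sum of squares of 0.20 reported for 5 groups.

```stata
* Sketch: assumes the 30 values of -size- from the question are in memory.
* group5, gmean and sqdev are hypothetical names; the cutpoints 96, 97, 98, 99
* were chosen by eye to fall in the gaps between the 5 -group1d- groups.
generate group5 = 1 + (size > 96) + (size > 97) + (size > 98) + (size > 99)

* Mean of -size- within each group (note: bysort re-sorts the data)
bysort group5: egen gmean = mean(size)

* Squared deviation of each value from its own group mean
generate sqdev = (size - gmean)^2

* Their total is the within-group sum of squares for this partition
quietly summarize sqdev
display "within-group sum of squares = " %6.2f r(sum)

* Per-group summaries to compare with the -group1d- table for 5 groups
tabstat size, by(group5) statistics(n min max mean sd)
```

The SDs from -tabstat- use the n - 1 divisor. The SDs in the -group1d- output appear to use n instead: for 1 group, sqrt(143.60/30) = 2.19, which matches the SD shown. The two sets of SDs will therefore differ slightly.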
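Nick's remark that -cluster kmeans- should find the same groups can be tried as follows. This is a minimal sketch, assuming -size- from the question is in memory. The name g5 is a hypothetical variable name, and start(segments) is only one of the starting options that -cluster kmeans- offers. After sorting, start(segments) seeds the 5 clusters from consecutive runs of about 6 sorted values, which suits one-dimensional data like these.

```stata
* Sketch: assumes the 30 values of -size- from the question are in memory.
* g5 is a hypothetical variable name for the cluster membership.
* Sort first so that start(segments) seeds each cluster from a run of
* consecutive sorted values.
sort size
cluster kmeans size, k(5) start(segments) generate(g5)

* Size, range and mean of each cluster, to compare with -group1d-
tabstat size, by(g5) statistics(n min max mean)
```

If the groups are genuine, each cluster should span only a narrow range of -size-, as in the -group1d- table. Other start() choices, such as random starts, can end in a different local solution and may need several runs to reach the same answer.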
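Nick's -round()- suggestion works with these data because no cluster straddles a half-integer such as 96.5. Rounding puts the values into 95, 96, 97, 98 and 101, which are the same 5 groups that -group1d- finds. The minimal sketch below tabulates the result and checks the range of -size- in each category. It assumes -size- from the question is in memory.

```stata
* Sketch: assumes the 30 values of -size- from the question are in memory.
* Drop any -sizer- left over from running the reply's code, then recreate it;
* round(size, 1) rounds to the nearest integer.
capture drop sizer
generate sizer = round(size, 1)

* One category per integer value
tabulate sizer

* Range of -size- within each category, to check that they do not overlap
tabstat size, by(sizer) statistics(n min max)
```

With other data, a cluster centred near a half-integer would be split in two by rounding. The -tabstat- check is meant to catch that.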
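To answer the original question more directly, -kdensity- can save its estimates with generate() instead of only drawing them. The troughs can then be found as local minima of the saved density. This is a sketch, assuming -size- from the question is in memory. The names kx, kd and trough are hypothetical variable names. The rule for a trough (lower than the grid point on its left, no higher than the one on its right) is one simple choice, not a built-in -kdensity- feature.

```stata
* Sketch: assumes the 30 values of -size- from the question are in memory.
* kx, kd and trough are hypothetical names. width() is the Stata 8 name of
* the bandwidth option used in the question; later versions document it as bwidth().
kdensity size, width(0.1) n(30) generate(kx kd) nograph

* Mark a grid point as a trough if its density is below the point on its left
* and no higher than the point on its right; the <= keeps only the first point
* of a flat stretch of zero density. The end points cannot be troughs.
generate byte trough = kd < kd[_n-1] & kd <= kd[_n+1] if _n > 1 & _n < 30

* The trough locations, in the units of -size-
list kx kd if trough == 1
```

Only the first point of each flat stretch of zero density is marked, so a wide gap between groups gives one trough rather than many. With a narrow width() and only 30 grid points, small dips can also appear inside a group. Check the listed points against the graph before using them as bounds.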
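Once bounds have been chosen, whether read from troughs in the -kdensity- graph or set by eye, -egen, cut()- turns them into a categorical variable. This is a minimal sketch, assuming -size- from the question is in memory. The name category is hypothetical, and the cutpoints are illustrative: they follow the question's bounds of 98 and 99 for Category_4 and place the others in the gaps between groups.

```stata
* Sketch: assumes the 30 values of -size- from the question are in memory.
* category is a hypothetical name. The cutpoints are illustrative: 98 and 99
* bound Category_4 as in the question, the others sit in the gaps between
* groups, and 94 and 102 bracket all the data.
egen category = cut(size), at(94, 96, 97, 98, 99, 102) icodes

* icodes numbers the intervals 0, 1, 2, ...; shift them to 1 to 5
replace category = category + 1
tabulate category
```

-cut()- treats each interval as closed on the left and open on the right, so a value equal to a cutpoint goes into the higher category. Values below the first cutpoint or at or above the last become missing.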