Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: mcross@exemail.com.au
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: extract values from kdensity graphic
Date: Thu, 3 May 2012 02:27:43 +1000

Many thanks Nick,

-group1d- doesn't suit my application (versions of Stata aside), as I don't want to have to specify the number of groups. I really like the kdensity plot because it automatically determines the number of groups (which are in the hundreds for my real data sets). Unfortunately, -round- often fails to group sizes appropriately in my full data sets too, as the clusters don't always align with the rounding units. The kdensity plot shows exactly what I want, but alas I can't extract its data (the trough coordinates).

Any more thoughts from the list?

Mike.

Another way of looking at these data is to apply -group1d- (SSC). In fact Mike cannot do that himself because it needs Stata 9, but he can use the results. With a least-squares criterion explained in the help and references given, -group1d- yields as the best 5 groups

    Group  Size     First         Last          Mean     SD
        5     8    23  100.62    30  100.91   100.75   0.09
        4     1    22   98.41    22   98.41    98.41   0.00
        3     6    16   97.19    21   97.39    97.29   0.06
        2     8     8   96.11    15   96.34    96.25   0.07
        1     7     1   94.74     7   95.08    94.95   0.11

In fact, just about any method of cluster analysis should find the same groups if they are genuine, e.g. -cluster kmeans-. Then use whatever summary you prefer.

Details follow for -group1d-.

    . sort size

    . group1d size, max(7)

    Partitions of 30 data up to 7 groups

    1 group: sum of squares 143.60
    Group  Size     First         Last          Mean     SD
        1    30     1   94.74    30  100.91    97.43   2.19

    2 groups: sum of squares 23.00
    Group  Size     First         Last          Mean     SD
        2     9    22   98.41    30  100.91   100.49   0.74
        1    21     1   94.74    21   97.39    96.12   0.93

    3 groups: sum of squares 6.62
    Group  Size     First         Last          Mean     SD
        3     8    23  100.62    30  100.91   100.75   0.09
        2    15     8   96.11    22   98.41    96.81   0.66
        1     7     1   94.74     7   95.08    94.95   0.11

    4 groups: sum of squares 1.26
    Group  Size     First         Last          Mean     SD
        4     8    23  100.62    30  100.91   100.75   0.09
        3     7    16   97.19    22   98.41    97.45   0.40
        2     8     8   96.11    15   96.34    96.25   0.07
        1     7     1   94.74     7   95.08    94.95   0.11

    5 groups: sum of squares 0.20
    Group  Size     First         Last          Mean     SD
        5     8    23  100.62    30  100.91   100.75   0.09
        4     1    22   98.41    22   98.41    98.41   0.00
        3     6    16   97.19    21   97.39    97.29   0.06
        2     8     8   96.11    15   96.34    96.25   0.07
        1     7     1   94.74     7   95.08    94.95   0.11

    6 groups: sum of squares 0.14
    Group  Size     First         Last          Mean     SD
        6     8    23  100.62    30  100.91   100.75   0.09
        5     1    22   98.41    22   98.41    98.41   0.00
        4     6    16   97.19    21   97.39    97.29   0.06
        3     8     8   96.11    15   96.34    96.25   0.07
        2     5     3   94.95     7   95.08    95.01   0.05
        1     2     1   94.74     2   94.89    94.81   0.08

    7 groups: sum of squares 0.10
    Group  Size     First         Last          Mean     SD
        7     2    29  100.84    30  100.91   100.88   0.04
        6     6    23  100.62    28  100.76   100.71   0.05
        5     1    22   98.41    22   98.41    98.41   0.00
        4     6    16   97.19    21   97.39    97.29   0.06
        3     8     8   96.11    15   96.34    96.25   0.07
        2     5     3   94.95     7   95.08    95.01   0.05
        1     2     1   94.74     2   94.89    94.81   0.08

    Groups  Sums of squares
         1       143.60
         2        23.00
         3         6.62
         4         1.26
         5         0.20
         6         0.14
         7         0.10

On Wed, May 2, 2012 at 9:34 AM, Nick Cox <njcoxstata@gmail.com> wrote:

In practice,

    gen sizer = round(size)

is a simpler way of degrading your data. Check by

    scatter sizer size

Nick

On Wed, May 2, 2012 at 9:16 AM, <mcross@exemail.com.au> wrote:

    * Hi Statalist,
    * I'm a beginner using version 8.
    * The following measurements were collected by a machine in my lab...

    clear
    input sampling_event size
    1 94.74
    2 94.89
    3 94.95
    4 94.97
    5 95
    6 95.05
    7 95.08
    8 96.11
    9 96.22
    10 96.24
    11 96.27
    12 96.27
    13 96.27
    14 96.32
    15 96.34
    16 97.19
    17 97.26
    18 97.26
    19 97.32
    20 97.34
    21 97.39
    22 98.41
    23 100.62
    24 100.69
    25 100.69
    26 100.76
    27 100.76
    28 100.76
    29 100.84
    30 100.91
    end
    list
    twoway (scatter size sampling_event)

    * My aim is to class these size values into categories (5 categories in
    * the example shown).
    * kdensity will generate the following graphic...

    kdensity size , w(0.1) n(30)

    * The troughs of this graphic are a good way to define the bounds of
    * each category.
    * Category_4, for example, would include all size values larger than 98
    * and less than 99.
    * I'd like to extract these trough points as a kdensity post-estimation
    * and output them as a new variable.
    * Is this possible?
    * Look forward to any advice the list has to offer.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
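
On the question of getting at the numbers behind the graph: -kdensity- has a generate() option that stores the evaluation grid and the estimated density as new variables, so the troughs can be looked for in the data rather than read off the plot. What follows is a minimal sketch, not something proposed in this message, and it has not been run under Stata 8. It assumes the example variable size is in memory, and kx, kd and trough are hypothetical names chosen for illustration.

```stata
* Sketch only; assumes the example variable -size- is in memory.
* kx, kd and trough are hypothetical names, not part of the original post.

* Same estimate as the graph, but store the grid points (kx) and the
* density at each grid point (kd) instead of drawing the graph.
kdensity size, w(0.1) n(30) generate(kx kd) nograph

* Flag interior grid points where the density has just fallen and does not
* fall further at the next point. On a flat trough (for example, a run of
* zero density between clusters) this flags the left-hand edge only.
* The first and last grid points are excluded by the -if- condition.
generate byte trough = (kd < kd[_n-1]) & (kd <= kd[_n+1]) if _n > 1 & kd[_n+1] < .

* Candidate trough coordinates
list kx kd if trough == 1
```

The listed kx values are only as precise as the grid spacing implied by n(), and which troughs appear depends on the width chosen in w(), so the result should be checked against the graph before it is used to define categories.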

**Follow-Ups**:

- **Re: st: extract values from kdensity graphic** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: extract values from kdensity graphic** *From:* brendan.halpin@ul.ie (Brendan Halpin)

**References**:

- **st: extract values from kdensity graphic** *From:* mcross@exemail.com.au
- **Re: st: extract values from kdensity graphic** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: extract values from kdensity graphic** *From:* Nick Cox <njcoxstata@gmail.com>
