
# Re: st: extract values from kdensity graphic

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: extract values from kdensity graphic
Date: Thu, 3 May 2012 17:42:17 +0100

```
-x- is by construction equally spaced and in any case not the original data.

I suggest that a fairer graph is

graph twoway (connected d size if group == 1) ///
(connected d size if group == 2) ///
(connected d size if group == 3) ///
(connected d size if group == 4) ///
(connected d size if group == 5)

which shows that your method based on gaps agrees well with the kernel
density default -- in this example.

Nick

On Thu, May 3, 2012 at 5:24 PM, Seed, Paul <paul.seed@kcl.ac.uk> wrote:
> Dear Statalist,
>
> As Nick points out, this is becoming quite a complex problem.
> I actually would not use -kdensity-, as it does
> not capture the essential features of Mike's original data set.
>
> A simpler approach is to look at the differences between successive values,
> and declare a new group whenever the gap is large (for a suitable value
> of "large").  This can be quite easily done in version 8.
>
>
> ***** Begin example **********
>
> * Enter Mike's data set
> set more off
> clear
> input sampling_event size
> 1 94.74
> 2 94.89
> 3 94.95
> 4 94.97
> 5 95
> 6 95.05
> 7 95.08
> 8 96.11
> 9 96.22
> 10 96.24
> 11 96.27
> 12 96.27
> 13 96.27
> 14 96.32
> 15 96.34
> 16 97.19
> 17 97.26
> 18 97.26
> 19 97.32
> 20 97.34
> 21 97.39
> 22 98.41
> 23 100.62
> 24 100.69
> 25 100.69
> 26 100.76
> 27 100.76
> 28 100.76
> 29 100.84
> 30 100.91
> end
> list
> twoway (scatter size sampling_event)
>
> * Identify groups
> sort size
> gen step = size - size[_n-1]
>
> * Use -stem- to quickly assess the step sizes
> stem step
> * In the example, steps are all <=0.1 or >= 0.85
> * I declare a new group for any step > 0.5
> * I could change this depending on the data set
>
> gen group = step >0.5
> replace group = sum(group)
>
> * Check groups are well defined
> bys group : su size
>
> * Graph the various groups in different colours
> graph twoway (connected size sampling_event if group == 1) ///
>        (connected size sampling_event if group == 2) ///
>        (connected size sampling_event if group == 3) ///
>        (connected size sampling_event if group == 4) ///
>        (connected size sampling_event if group == 5)
> * That looks good
>
> * Now try out -kdensity-; pick up the plotted values in x and d
> kdensity size , w(0.1) n(30) gen(x d)
>
> graph twoway (connected d x if group == 1) ///
>        (connected d x if group == 2) ///
>        (connected d x if group == 3) ///
>        (connected d x if group == 4) ///
>        (connected d x if group == 5)
> * kdensity just does not seem to capture the groups I see in the simple scatter plot.
>
>
> ********** End example **************
>
> Paul T Seed, Senior Lecturer in Medical Statistics,
>
> Division of Women's Health, King's College London
> Women's Health Academic Centre KHP
> 020 7188 3642,
>  paul.seed@kcl.ac.uk,
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
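
The point about -x- being an equally spaced grid suggests evaluating the kernel estimate at the observed values of -size- instead. The thread does not show how that was done, so the following is only a minimal sketch: it assumes -size- and -group- exist as created in Paul's example, and `d_at` is a hypothetical variable name chosen so as not to clash with the existing -d-. The `at()` option of -kdensity- names a variable holding the points at which to estimate the density; with `at()`, `generate()` needs only the name for the density variable.

```stata
* Sketch only: assumes -size- and -group- exist as in Paul's example.
* d_at is a hypothetical name, picked to avoid clashing with -d-.
* bwidth() is the current option name; older releases used width().
kdensity size, bwidth(0.1) at(size) generate(d_at) nograph

* Plot the density estimate against the observed sizes, one series per group
sort size
twoway (connected d_at size if group == 1) ///
       (connected d_at size if group == 2) ///
       (connected d_at size if group == 3) ///
       (connected d_at size if group == 4) ///
       (connected d_at size if group == 5)
```

Because `d_at` is estimated at each observation's own -size-, every plotted point pairs an observed value with its density estimate, which is not the case when -d- from the grid is plotted against -size-.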