Re: st: Segmenting a dataset

From: David Kantor
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Segmenting a dataset
Date: Thu, 17 May 2007 16:10:14 -0400

At 03:46 PM 5/17/2007, Morrison Hodges wrote:

I have a dataset of 10 variables and 5000 observations. I need to calculate
the median of each variable in groups of 30 observations, i.e., the median
of each variable in observations 1-30, then the median for 31-60, then
61-90, etc. I know I can get the median from the p50 value of -summarize-,
but I'm not sure how to obtain consecutive segments of 30 observations each
to perform -summarize- on. Can anyone help?
Thanks, Morry Hodges

Do you just want to see what the medians are? If so, just do:
summarize var1 var2 ... in 1/30, det
summarize var1 var2 ... in 31/60, det
etc.

You can do this in a loop, if you prefer:
forvalues j = 1(30)`=_N' {
    summarize var1 var2 ... in `j'/`=min(`j'+29, _N)', det
}
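
If only the medians themselves are of interest, the loop can pick up r(p50) after each -summarize- rather than printing the full detail output. The following is a minimal sketch, not a definitive solution: the simulated data and the names var1-var3 are assumptions for illustration, and each block is taken to run from `j' to `j'+29, capped at _N.

```stata
* Toy data standing in for the real dataset (assumption: three variables)
clear
set seed 12345
set obs 100
forvalues k = 1/3 {
    generate var`k' = rnormal()
}

* Step through the data in blocks of 30 observations
forvalues j = 1(30)`=_N' {
    local last = min(`j' + 29, _N)      // last block may be shorter than 30
    foreach v of varlist var1-var3 {
        quietly summarize `v' in `j'/`last', detail
        display "`v'  obs `j'-`last'  median = " r(p50)
    }
}
```

The final block simply ends at the last observation, so with 5000 observations it would cover 4981-5000.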

----

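On the other hand, do you want the values deposited in the dataset? If so, first create a "group" variable that numbers the blocks of 30 in the current sort order (observations 1-30 get 1, 31-60 get 2, and so on).
gen int group = floor((_n - 1) / 30) + 1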
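
It is worth checking that a grouping variable of this kind really puts observations 1-30 in the first block. A quick sketch on toy data (100 empty observations, an assumption for illustration) compares the offset form floor((_n - 1) / 30) + 1 with the equivalent ceil(_n / 30) and counts the observations per block. Note that floor(_n / 30) without the offset would start a new block at observation 30 and leave only 29 observations in the first one.

```stata
* Toy check: 100 empty observations stand in for the real data
clear
set obs 100

generate int group  = floor((_n - 1) / 30) + 1
generate int group2 = ceil(_n / 30)
assert group == group2                 // the two formulas agree

* Number of observations in each block
bysort group: generate int size = _N
tabulate group
```

Every block should hold 30 observations except possibly the last.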

Now if you want the values deposited into the data as constants by group...
bysort group: egen med1 = median(var1)
and so on for the other variables.

If you want just a set of collapsed values...
collapse (median) med1 = var1 (median) med2 = var2 ... , by(group)
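
Putting the two approaches together, here is a hedged end-to-end sketch on simulated data. The variable names var1-var3, the med_ prefix, and the use of ceil(_n / 30) for the block number are assumptions for illustration. The -foreach- loop avoids typing one -egen- line per variable, and -preserve-/-restore- lets you look at the collapsed medians without losing the original data.

```stata
* Toy data standing in for the real dataset (assumption: three variables)
clear
set seed 12345
set obs 100
forvalues k = 1/3 {
    generate var`k' = rnormal()
}

* Block number: observations 1-30 -> 1, 31-60 -> 2, and so on
generate int group = ceil(_n / 30)

* Medians stored in the data as constants within each block
foreach v of varlist var1-var3 {
    egen med_`v' = median(`v'), by(group)
}

* One row per block, holding the median of each variable
preserve
collapse (median) var1-var3, by(group)
list
restore
```

After -restore- the full dataset is back in memory with the med_ variables attached.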

I hope this helps.
--David
