# Re: st: programming question: obtaining statistics from clustered data

 From Ulrich Kohler To statalist@hsphsun2.harvard.edu Subject Re: st: programming question: obtaining statistics from clustered data Date Wed, 26 Jun 2002 09:42:19 +0000

```
Javier Escobal wrote:
> I have a database that has the following form:
>
> id    cluster    X
> 1        1        0.5
> 2        1        0.7
> 3        1        0.4
> ..        .         .
> ..        .         .
> ..        .         .
> 100      3       0.6
> 101      3       0.6
> 102      3       0.8
> 103      3       0.2
>
> That is, observations can be grouped into clusters (of different sizes). I
> am interested in constructing different statistics: for example for each
> observation "i" I need to capture the average and standard deviation of
> all observations that belong to the same cluster where "i" belongs
> excluding observation "i".

For the mean:

. sort cluster
. by cluster: gen sumx = sum(X)
. by cluster: replace sumx = sumx[_N] - X
. by cluster: gen meanx = sumx/(_N-1)

For the standard deviation the answer seems to be more difficult. At the
moment I can only think of a solution with a loop over the observations
within each cluster. There must be a better solution, and I am sure that I
have overlooked something obvious. But anyway, you may use the following as a
starting point:

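gen temp = .
gen std = .
egen group = group(cluster)  /* this might not be necessary */
sort group
local K = group[_N]
local last 0
forvalues k = 1/`K' {
    * row range (first to last) occupied by the current cluster
    local first = 1 + `last'
    quietly count if group == `k'
    local N = r(N)
    local last = `first' + (`N'-1)
    forvalues i = `first'/`last' {
        * squared deviations of the other cluster members from the
        * leave-one-out mean of the current observation
        quietly replace temp = .
        quietly replace temp = (X - meanx[`i'])^2 if _n != `i' & group == `k'
        quietly replace temp = sum(temp)
        * N-1 observations remain, so the divisor is N-2;
        * the square root turns the variance into a standard deviation
        quietly replace std = sqrt(temp[_N]/(`N'-2)) if _n == `i'
    }
}
drop temp group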

regards
uli

```
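The reply above suspects that the loop can be avoided. One possible way, offered as a sketch rather than as the poster's method, is to use cluster totals of X and X^2: dropping observation i leaves a sum of `sx - X` and a sum of squares of `sxx - X^2` over `_N - 1` observations, from which the mean and standard deviation follow directly. The toy data and the variable names `sx`, `sxx`, `loo_ss`, `loo_mean` and `loo_sd` are assumptions made for illustration, and the code assumes X has no missing values.

```stata
* Sketch: leave-one-out mean and SD within clusters, no loop over observations.
* Toy data and the names sx, sxx, loo_ss, loo_mean, loo_sd are illustrative.
clear
input id cluster X
1 1 0.5
2 1 0.7
3 1 0.4
4 2 0.6
5 2 0.6
6 2 0.8
7 2 0.2
end

* Cluster totals of X and X^2 (assumes X has no missing values)
bysort cluster: egen double sx  = total(X)
bysort cluster: egen double sxx = total(X^2)

* Remove the observation's own contribution, then apply the usual
* formulas to the remaining _N - 1 observations
bysort cluster: gen double loo_mean = (sx - X) / (_N - 1)
bysort cluster: gen double loo_ss = (sxx - X^2) - (_N - 1) * loo_mean^2

* Clamp tiny negative rounding errors to zero; missing stays missing
bysort cluster: gen double loo_sd = sqrt(cond(loo_ss < 0, 0, loo_ss) / (_N - 2))

drop sx sxx loo_ss
list, sepby(cluster)
```

Clusters with fewer than three observations get a missing `loo_sd`, because fewer than two observations remain once observation i is excluded. The sum-of-squares shortcut can lose precision when X is large relative to its spread, which is why the intermediate variables are stored as doubles.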