Re: st: R: RE: R: Date: Fri, 23 Sep 2011 16:11:35 -0000

From   Nick Cox
Subject   Re: st: R: RE: R: Date: Fri, 23 Sep 2011 16:11:35 -0000
Date   Mon, 26 Sep 2011 11:05:34 +0100

I don't understand this exchange.

Georges has what is still an extremely vague question. In the absence
of any detail about his data -- what the variables are, what typical
observations look like, etc. -- just about the only recommendation
that will cover his problem is to read the Data management manual
until the answer becomes clear. Georges: You really need to tell us

Carlo appears to be trying to guess what Georges' data might look
like. But even in the situations he imagines it is definitely not
necessary to use a command for each group specifying -in <range>-
after a look at the data.

For example, in the data of Carlo's first posting

split pts_code, parse(_)

would separate out hospital and patient identifiers.
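A minimal sketch of how that could be carried through to a numeric cluster identifier, assuming the data really do look like Carlo's toy example. The four observations are illustrative only, and the new variable names hsp_cluster (borrowed from Carlo's post) and patient are hypothetical:

```stata
* Toy data modelled on Carlo's example; the values are illustrative only
clear
input str8 pts_code
"Rome_1"
"Rome_2"
"Milan_1"
"Milan_2"
end

* Split at the underscore: pts_code1 holds the hospital, pts_code2 the patient
split pts_code, parse(_)

* One integer per distinct hospital, labelled with the hospital name
egen hsp_cluster = group(pts_code1), label

* Optional: patient number as a numeric variable (hypothetical name)
destring pts_code2, generate(patient)

tabulate hsp_cluster
```

Nothing here depends on the sort order or on how many patients each hospital has, so no -in <range>- is needed, whether there are 2 hospitals or 36.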

Carlo's trying very hard to help, but the real difficulty is that
Georges' precise problem remains unclear.


On Mon, Sep 26, 2011 at 10:51 AM, Carlo Lazzaro wrote:

> in your example, the solution I proposed is surely very time consuming.
> However, I assume that patients in your dataset are ordered in progressive
> order of imputation for each hospital participating in your research. I mean
> something like Hosp_1_1; Hosp_1_2;...;Hosp_2_1; Hosp_2_2 and so forth, where
> the first number refers to hospital and the second one identifies the
> patient. Let's call this variable pts_code.
> You can type - sort pts_code - and hospitals should appear in alphabetical
> order (with patients progressively identified via the second number of
> pts_code).
> Then you can create another variable (let's call it hsp_cluster) in this
> way:
> * assuming you have the same number of patients (84) enrolled in each one
> * of the 36 hospitals you mention
> g hsp_cluster = 1 in 1/84
> replace hsp_cluster = 2 in 85/168
> and so forth, until you reach the 36th cluster.
> I would agree that this solution too is time consuming. Unfortunately, this
> is the price we have to pay whenever we omit the centre_code variable in
> data input.


> In my situation there are almost 3000 patients for 36 hospitals (36
> clusters?). It will be time consuming to apply this solution, won't it?

Carlo Lazzaro

> you are right. The example I sketched considers 10 patients (5 from a
> hypothetical hospital in Rome and 5 from a hypothetical hospital in Milan)
> and 2 clusters, i.e. the two hospitals.


> I need some clarification about your solution. Should I understand that in
> your example there are 10 observations about patients and the number of
> clusters is 2?

Carlo Lazzaro

> the following code might do what you are asking for:
> ----------------------------------------------------
> set obs 10
> g id=_n
> g pts_code="Rome_1" in 1
> replace pts_code="Rome_2" in 2
> replace pts_code="Rome_3" in 3
> replace pts_code="Rome_4" in 4
> replace pts_code="Rome_5" in 5
> replace pts_code="Milan_1" in 6
> replace pts_code="Milan_2" in 7
> replace pts_code="Milan_3" in 8
> replace pts_code="Milan_4" in 9
> replace pts_code="Milan_5" in 10
> g cluster=1 in 1/5
> replace cluster=2 in 6/10


> I need your help on the following issue:
> I want to know how to transform a simple database (example: a patients
> database) into a cluster database (example: a hospital database).