Statalist



AW: st: simple sum() question


From   "Martin Weiss" <[email protected]>
To   <[email protected]>
Subject   AW: st: simple sum() question
Date   Fri, 17 Apr 2009 16:23:28 +0200


Shehzad might also like to note that -egen, sum()- was abandoned for the
Stata 9 release (-help whatsnew8to9-), but a look at its code shows that it
calls -egen, total()- on the user's behalf.

*************
clear*

input patient_id    timepoint    clinic
1                 1           .
1                 2           .
1                 3           .
2                 1           2
2                 2           0
3                 1           1
3                 2           .
end

bysort patient_id: egen sum_clinic = sum(clinic)
bysort patient_id: egen total_clinic = total(clinic)

compare sum_clinic total_clinic
*************

The -sum()- function lives on when called by -gen-, as in my code earlier.
It produces a running sum, though, in contrast to the (constant) total
returned by -egen, total()-.
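
A minimal sketch of that difference, on made-up data (the values and the
variable names running and total are illustrative only, not from the
thread):

```stata
clear
input patient_id timepoint clinic
1 1 1
1 2 2
1 3 .
2 1 2
2 2 3
end

* running sum within patient; sum() treats a missing value as zero
bysort patient_id (timepoint): gen running = sum(clinic)

* one constant total per patient, repeated on every row
bysort patient_id: egen total = total(clinic)

list, sepby(patient_id)
```

Within each patient, running accumulates row by row, while total holds
the same value on every row. Both ignore the missing value, which is the
behaviour Shehzad wants to override.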


HTH
Martin


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Ulrich Kohler
Sent: Friday, 17 April 2009 15:43
To: [email protected]
Subject: Re: st: simple sum() question

Shehzad Ali wrote
> Here is a quick summary of what I am doing. Each patient (varname:
> patient_id) was observed at 4 time points and at each time point we asked
> about the clinic visits (varname: clinic) in the last 3 months. The
> dataset is in long form (shown below):
> 
> patient_id    timepoint    clinic 
> 1                 1           0
> 1                 2           1
> 1                 3           .
> 2                 1           2
> 2                 2           0
> 3                 1           1
> 3                 2           .
> 
> The line below generates a sum of all clinic visits for each patient:
> 
> bysort patient_id: egen sum_clinic = sum(clinic)
> 
> Now if at one time point, clinic visit is missing (as is seen for
> patients 1 and 3), then I want Stata to return missing value for the
> sum. The above command returns the total of the non-missing
> observations, ignoring the missing ones (understandably). But if I tried:
> 
> bysort patient_id: egen sum_clinic = sum(clinic) if clinic!=.
> 
> then it returns missing value for the sum variable only for the time
> point which is missing and not for all the time points for that patient.
> Can anyone please suggest how to resolve this?

Martin already gave a good solution. Here is an alternative starting
from first principles:

. by patient_id (clinic), sort: gen sum_clinic = sum(clinic)

. by patient_id (clinic): ///
 replace sum_clinic = cond(!mi(clinic[_N]),sum_clinic[_N],.)

This uses the fact that observations with missing values of clinic are
sorted to the end of each patient_id group, so clinic[_N] is missing
whenever any value in the group is missing.
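
For comparison, here is a sketch of another route that does not depend
on the sort order. It is not taken from either reply: it counts the
missing values per patient and blanks the total where the count is
positive. The helper variable n_miss is a name made up for this
illustration; the data are Shehzad's.

```stata
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* number of missing clinic values per patient
bysort patient_id: egen n_miss = total(missing(clinic))

* total over nonmissing values, constant within patient
bysort patient_id: egen sum_clinic = total(clinic)

* set the total to missing if any time point is missing
replace sum_clinic = . if n_miss > 0

list, sepby(patient_id)
```

Only patient 2 has complete data, so only patient 2 should keep a
nonmissing sum_clinic.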

> Secondly, what's the best way to collapse the dataset to one observation
> per patient? Once I have the sum_clinic for each patient, it would be
> easier just to have one observation per patient.

by patient_id: keep if _n==1
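
Since sum_clinic is constant within patient after the two commands
above, -collapse- with the first statistic is another option. This is
only a sketch on Shehzad's data; unlike -keep-, it drops every variable
other than patient_id and sum_clinic.

```stata
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* running sum, then overwrite with the group total or missing
by patient_id (clinic), sort: gen sum_clinic = sum(clinic)
by patient_id (clinic): ///
 replace sum_clinic = cond(!mi(clinic[_N]), sum_clinic[_N], .)

* (first) keeps the first value per patient, missing included
collapse (first) sum_clinic, by(patient_id)

list
```

Note that -collapse (sum)- on clinic itself would not do here, because
it ignores missing values just as -egen, total()- does.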


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




