Statalist



Re: AW: st: simple sum() question


From   Shehzad Ali <sia500@york.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: AW: st: simple sum() question
Date   17 Apr 2009 15:52:59 +0100

Thank you so much, Martin and Ulrich! Your suggestions did the trick.

Thank you again,

Shehzad

On Apr 17 2009, Martin Weiss wrote:

<>
Shehzad might also like to note that -egen, sum()- was abandoned in the
Stata 9 release (see -help whatsnew8to9-), but a look at its code shows that it
calls -egen, total()- on the user's behalf.

*************
clear*

input patient_id    timepoint    clinic
1                 1           .
1                 2           .
1                 3           .
2                 1           2
2                 2           0
3                 1           1
3                 2           .
end

bysort patient_id: egen sum_clinic = sum(clinic)
bysort patient_id: egen total_clinic = total(clinic)

compare sum_clinic total_clinic
*************
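In the example above, patient 1 has only missing values, and -egen, total()- turns that into 0. The following is a minimal sketch, assuming a Stata release whose -egen, total()- accepts the -missing- option (check -help egen- in yours). It shows that the option changes only the case where all values are missing, so on its own it does not give the "missing if any visit is missing" rule asked about in the quoted question. The names total_default and total_allmiss are made up for illustration.

```stata
* Sketch: how -egen, total()- treats missing values (toy data)
clear
input patient_id timepoint clinic
1 1 .
1 2 .
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* Default: missing values are skipped, so an all-missing patient gets 0
bysort patient_id: egen total_default = total(clinic)

* -missing- option: a patient whose values are ALL missing gets . instead of 0
by patient_id: egen total_allmiss = total(clinic), missing

* Patient 3 has only one missing value, so both versions still give 1
list, sepby(patient_id)
```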

The -sum()- function lives on when called by -gen-, as in my earlier code. It produces a running sum, though, in contrast to the (constant) total returned by -egen, total()-...
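A small sketch of that difference, assuming a toy version of Shehzad's data (running_clinic and total_clinic are illustrative names only): within each patient, -generate- with sum() accumulates down the rows, while -egen, total()- repeats the patient's total on every row. Both treat a missing clinic value as 0.

```stata
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* A running sum depends on the order within patient, so fix the order first
sort patient_id timepoint

* -generate- with sum(): cumulative total within patient, missing counts as 0
by patient_id: generate running_clinic = sum(clinic)

* -egen, total()-: the same patient total on every observation
by patient_id: egen total_clinic = total(clinic)

list, sepby(patient_id)
```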


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ulrich Kohler
Sent: Friday, 17 April 2009 15:43
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: simple sum() question

Shehzad Ali wrote
Here is a quick summary of what I am doing. Each patient (varname: patient_id) was observed at 4 time points, and at each time point we asked about clinic visits (varname: clinic) in the last 3 months. The dataset is in long form (shown below):

patient_id    timepoint    clinic
1             1            0
1             2            1
1             3            .
2             1            2
2             2            0
3             1            1
3             2            .

The line below generates a sum of all clinic visits for each patient:

bysort patient_id: egen sum_clinic = sum(clinic)

Now, if the clinic visit count is missing at one time point (as seen for
patients 1 and 3), I want Stata to return a missing value for the sum. The above
command returns the total of the nonmissing observations, ignoring the missing ones (understandably). But if I try:

bysort patient_id: egen sum_clinic = sum(clinic) if clinic!=.

then it returns a missing value for the sum variable only at the time point
that is missing, not at all the time points for that patient. Can anyone please suggest how to resolve this?

Martin already gave a good solution. Here is an alternative starting
from first principles:

. by patient_id (clinic), sort: gen sum_clinic = sum(clinic)

. by patient_id (clinic): replace sum_clinic = cond(!mi(clinic[_N]),sum_clinic[_N],.)

This uses the fact that observations with clinic missing (clinic==.) are
sorted to the end of each patient's group, because Stata sorts missing values after all nonmissing numbers.
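Another route to the same result, sketched here as an alternative that does not come from the thread: count the missing values of clinic within each patient and blank out the total whenever that count is positive. This avoids relying on sort order. The variable n_missing is a hypothetical name.

```stata
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* Total of the nonmissing visits for each patient (missing counts as 0)
bysort patient_id: egen sum_clinic = total(clinic)

* Number of missing clinic values per patient; missing() returns 1 or 0
by patient_id: egen n_missing = total(missing(clinic))

* One missing time point makes that patient's total missing on every row
replace sum_clinic = . if n_missing > 0

list, sepby(patient_id)
```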
Secondly, what's the best way to collapse the dataset to one observation
per patient? Once I have sum_clinic for each patient, it would be easier to have just one observation per patient.

by patient_id: keep if _n==1
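If only patient-level results are needed, -collapse- can also produce the one-row-per-patient dataset directly. The sketch below assumes the same toy data; the helper variables miss_clinic and any_missing are hypothetical names, and the rule applied (any missing time point makes the total missing) is the one from the question.

```stata
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* Hypothetical helper: 1 if this time point's clinic value is missing
generate byte miss_clinic = missing(clinic)

* One row per patient: summed visits (missing counts as 0) and a flag
* that is 1 if any time point was missing
collapse (sum) sum_clinic = clinic (max) any_missing = miss_clinic, by(patient_id)

* Any missing time point makes the patient's total missing
replace sum_clinic = . if any_missing == 1

list
```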


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




