Statalist



AW: st: simple sum() question


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: simple sum() question
Date   Fri, 17 Apr 2009 14:43:59 +0200

<> 


*************
clear*

input patient_id    timepoint    clinic 
1                 1           0
1                 2           1
1                 3           .
2                 1           2
2                 2           0
3                 1           1
3                 2           .

end

compress

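* egen total() gives every row the patient total; gen sum() would be a running sum
bysort patient_id: egen sum_clinic = total(clinic)

* count the missing clinic values within each patient
tempvar sum_clinic_miss
bysort patient_id: egen `sum_clinic_miss' = total(mi(clinic))
* blank the total for any patient with at least one missing value
replace sum_clinic = . if `sum_clinic_miss' > 0 & !mi(`sum_clinic_miss')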

list, noobs  
*************
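
On the second question (one observation per patient), here is a minimal
sketch rather than a tested solution for your real data. It assumes the
example data above; the variable name anymiss is made up for illustration.
-collapse- does the totalling and the reduction in one step, but its sum
ignores missings, so the total is set to missing afterwards for any patient
with a missing clinic value:

*************
clear
input patient_id timepoint clinic
1 1 0
1 2 1
1 3 .
2 1 2
2 2 0
3 1 1
3 2 .
end

* flag rows where clinic is missing (anymiss is a hypothetical name)
gen byte anymiss = missing(clinic)

* one row per patient: total of clinic, and whether any row was missing
collapse (sum) sum_clinic = clinic (max) anymiss, by(patient_id)

* the collapsed sum skips missings, so blank it where any value was missing
replace sum_clinic = . if anymiss
drop anymiss

list, noobs
*************

Note that -collapse- replaces the data in memory, so -preserve- first or save
a copy if you still need the long form.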



HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Shehzad Ali
Sent: Friday, 17 April 2009 14:34
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: simple sum() question

Following on from yesterday's discussion, I have a quick follow-up 
question.

Here is a quick summary of what I am doing. Each patient (varname: 
patient_id) was observed at 4 time points and at each time point we asked 
about the clinic visits (varname: clinic) in the last 3 months. The dataset 
is in long form (shown below):

patient_id    timepoint    clinic 
1                 1           0
1                 2           1
1                 3           .
2                 1           2
2                 2           0
3                 1           1
3                 2           .

The line below generates a sum of all clinic visits for each patient:

bysort patient_id: egen sum_clinic = sum(clinic)

Now if the clinic visit is missing at any time point (as is seen for patients 
1 and 3), then I want Stata to return a missing value for the sum. The above 
command returns the total of the non-missing observations, ignoring the 
missing ones (understandably). But if I tried:

bysort patient_id: egen sum_clinic = sum(clinic) if clinic!=.

then it returns a missing value for the sum variable only at the time point 
that is missing, and not at all the time points for that patient. Can 
anyone please suggest how to resolve this?

Secondly, what's the best way to collapse the dataset to one observation per 
patient? Once I have the sum_clinic for each patient, it would be easier 
just to have one observation per patient.

Thank you,
Shehzad


On Apr 15 2009, Martin Weiss wrote:

><>
>
>Those two differ only in case you have missings...
>
>HTH
>Martin
>_______________________
>----- Original Message ----- 
>From: "Shehzad Ali" <sia500@york.ac.uk>
>To: <statalist@hsphsun2.harvard.edu>
>Sent: Wednesday, April 15, 2009 7:06 PM
>Subject: RE: st: AW: simple sum() question
>
>
>> Thanks, Nick. But I am not trying to count the total number of 
>> observations per patient but the total number of visits (varname: 
>> clinic) across all time points for each patient (I tried to clearly 
>> state it in the first post - sorry if I wasn't clear).
>>
>> The solution I am now using is:
>>
>> bysort patient_id: egen sum_clinic = sum(clinic)
>>
>> Thank you,
>>
>> Shehzad
>>
>> On Apr 15 2009, Nick Cox wrote:
>>
>>>Unless there are further complications as yet unrevealed,
>>>bysort id : gen visits = _N
>>>is a direct and simple solution.
>>>If you just wanted to count a subset, then
>>>gen interesting = <binary variable defining interesting>
>>>bysort interesting id : gen interesting_visits = _N if interesting
>>>There are -egen- routes as well, but for problems like this going back
>>>to basics is difficult to beat.
>>>See also
>>>SJ-2-1  pr0004  . . . . . . Speaking Stata:  How to move step by: step
>>>        . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>>        Q1/02   SJ 2(1):86--102                          (no commands)
>>>        explains the use of the by varlist : construct to tackle
>>>        a variety of problems with group structure, ranging from
>>>        simple calculations for each of several groups to more
>>>        advanced manipulations that use the built-in _n and _N
>>>
>>>if a tutorial is needed. That's free on-line at the Stata Journal
>>>website.
>>>Note that even if you did want a collapsed dataset, -contract- rather
>>>than -collapse- is more direct.
>>>Nick
>>>n.j.cox@durham.ac.uk
>>>
>>>Shehzad Ali
>>>
>>>Hi Martin and Josiane,
>>>
>>>Thank you for your replies. You are right that I am interested in the
>>>total count of visits for each patient and not the running sum.
>>>
>>>Sorry, I should have mentioned that patients who had three visits, for 
>>>instance, have three observations, and those with two visits have two 
>>>observations. Therefore, the total number of observations for 100
>>>patients is less than 400 (I had made up hypothetical numbers in haste to
>>>simplify the case. Not always a good idea).
>>>
>>>With Martin's solution, I will need to have four observations for each 
>>>patient (sorry this was my fault as I didn't provide the correct 
>>>information). With Josiane's suggestion, the dataset collapses which is
>>>not what I want.
>>>
>>>Can you suggest a modified solution please? Again, sorry for the unclear
>>>email earlier.
>>>
>>>On Apr 15 2009, Martin Weiss wrote:
>>>
>>>> I am betting that you want a count of visits, not a running sum, but 
>>>> correct me if I am wrong...
>>>
>>>>clear*
>>>>set obs 400
>>>>egen float patient = seq(), from(1) to(400) block(4)
>>>>egen float visit = seq(), from(1) to(4) block(1)
>>>>
>>>>//not strictly necessary
>>>>xtset patient visit
>>>>
>>>>//less than 4 visits for some
>>>>replace visit =. if runiform()<0.05
>>>>
>>>>bys patient: egen overallvisits=count(visit)
>>>>
>>>>l in 1/20, sepby(patient) noo
>>>>*************
>>>
>>>Shehzad Ali
>>>
>>>>I have a simple question about summing across observations. I have 100 
>>>>patients (variable: patient_id) in the dataset, each had clinic visits 
>>>>(variable: clinic) and hospital visits (variable: hospital) recorded at
>>>>weeks 4, 8, 12 and 16. The dataset is long and hence I have 400 
>>>>observations (one observation per patient per time point).
>>>>
>>>> I want to sum the clinic visits for each patient (across all 4 visits)
>>>> bearing in mind that some patients had less than 4 visits. So effectively
>>>> I want to generate a new variable that will produce the sum of clinic 
>>>> visits for each patient.
>>>
>>>*
>>>*   For searches and help try:
>>>*   http://www.stata.com/help.cgi?search
>>>*   http://www.stata.com/support/statalist/faq
>>>*   http://www.ats.ucla.edu/stat/stata/
>>>
