
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: data management question


From   Caroline Wilson <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: data management question
Date   Thu, 19 Sep 2013 05:28:27 +0000

Sorry to ask another question about this. I'm now struggling to create a variable called "median" which, for a given pat_ID, is the MEDIAN of all the other values of Md_T with the same phy_id, EXCLUDING the current observation's own value of Md_T. Below is a sample of my data showing what "median" should look like.
For example, pat_ID = 2 has phy_id = 118, so I take the median of Md_T for the other 2 pat_IDs with phy_id = 118 (median of 3.48 and 4.12, which is 3.80). pat_ID = 3 also has phy_id = 118, so I take the median of Md_T for the other 2 pat_IDs (1.85 and 4.12), which is 2.985, shown rounded as 2.99.
I tried to adapt the logic of Daniel's code for the mean, but the median is harder: it cannot be recovered from a group total, and its formula depends on whether the number of values is odd or even. Does anyone have ideas about how to calculate this? For example, is there a way to apply the median function to all values with the same phy_id except the current one?
Any help would be much appreciated. Many thanks!

pat_ID    phy_id    Md_T  median
1          102       3.23     .
2          118       1.85   3.80
3          118       3.48   2.99
4          118       4.12   2.67
5          132       1.39   3.00
6          132       1.61   3.00
7          132       1.69   3.00
8          132       1.74   3.00
9          132       3.00   1.74
10         132       3.03   1.74
11         132       4.28   1.74
12         132       6.90   1.74
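
One possible approach, offered as a sketch rather than a definitive answer: loop over the observations and, for each one, summarize Md_T over all the other observations with the same phy_id, then store the returned median. The result name median_ex is hypothetical, and the sketch assumes Md_T has no missing values and that the dataset is small enough for an observation-by-observation loop to be acceptable.

```stata
* Sample data from this post (assumption: no missing values in Md_T)
clear
input pat_ID phy_id Md_T
 1 102 3.23
 2 118 1.85
 3 118 3.48
 4 118 4.12
 5 132 1.39
 6 132 1.61
 7 132 1.69
 8 132 1.74
 9 132 3.00
10 132 3.03
11 132 4.28
12 132 6.90
end

* Hypothetical result variable, initialized to missing
generate double median_ex = .

* For each observation i, take the median of Md_T over all OTHER
* observations that share its phy_id. -summarize, detail- leaves the
* median in r(p50); when no other observation exists, r(p50) is
* undefined and evaluates to missing, so median_ex stays missing.
forvalues i = 1/`=_N' {
    quietly summarize Md_T if phy_id == phy_id[`i'] & _n != `i', detail
    quietly replace median_ex = r(p50) in `i'
}

list pat_ID phy_id Md_T median_ex, sepby(phy_id) noobs
```

On the sample data this gives about 3.80 for pat_ID 2 and 2.985 for pat_ID 3, in line with the worked examples, and missing for pat_ID 1, which has no other observation in its group. Because -summarize- already handles odd and even counts, no separate formula for the two cases is needed.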

> From: [email protected]
> To: [email protected]
> Subject: RE: st: data management question
> Date: Wed, 18 Sep 2013 22:54:32 +0000
> 
> Apologies for the confusion - the variable "value" should have been called "Md_T".
> Anyway, your solution worked perfectly - very many thanks!!!!
> 
> Caroline
> 
> ----------------------------------------
>> Date: Wed, 18 Sep 2013 18:33:19 -0400
>> From: [email protected]
>> To: [email protected]
>> Subject: Re: st: data management question
>>
>>
>>
>> On Wed, 18 Sep 2013, Caroline Wilson wrote:
>>
>>> Hello,
>>>
>>>
>>>
>>> I’m wondering if someone can help with a data management
>>> question.
>>>
>>>
>>>
>>> I’m trying to create a variable called “mean”, which, for a given
>>> pat_ID, would be calculated by taking the mean of every other value of
>>> “Md_T” in the same phy_ID EXCEPT for the current row.
>>>
>>
>> I am a little unclear on what you are asking - Md_T isn't in the sample
>> data you show, but you want the mean of it? So I won't use your variable
>> names. Nevertheless, I think that the -egen- -total- function and a
>> generate statement will get you what you want:
>>
>> bysort ID: egen sum = total(var)
>> generate sumex = sum - var
>> by ID: generate meanex = sumex/(_N-1)
>>
>> The total by ID gives the sum of var for each level of ID. Then we
>> subtract the current value of var and divide by the number of other
>> observations in the ID group, _N-1 (_N is the number of observations in
>> the by group). A group with a single observation gets a missing value,
>> since _N-1 is then zero.
>>
>> Daniel Feenberg 
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/ 		 	   		  

