Dear Statalist,
I am trying to clean some data in which some patients have two different, contradictory lab results on the same date. Example:
id   cd4c   cd4cdate
1    325    01 Mar 06
1    352    01 Mar 06
1    500    03 Aug 06
2    167    20 Mar 06
2    302    20 Mar 06
2    900    12 Dec 06
3    118    20 Oct 05
3    178    20 Oct 05
3    450    01 May 06
I want to drop the row with the highest cd4c value when id and cd4cdate match. This is proving surprisingly hard to do. I tried
sort id cd4c cd4cdate
bysort id cd4cdate: gen min = cd4c[1]
and then tried to replace cd4c with min, but min was not always the smallest cd4c value. I suspect that's because bysort re-sorts the data on id and cd4cdate only, so the order of cd4c within each id/date group is not guaranteed. Is there a way to reliably drop the row with the largest cd4c value when id and cd4cdate match?
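One possible approach, offered as a sketch rather than a definitive answer, and assuming the variables are named id, cd4c and cd4cdate as in the example: putting cd4c in parentheses after the by-variables asks bysort to order the rows by cd4c within each id/date group without making cd4c part of the group definition.

```stata
* Sketch only: assumes variables id, cd4c and cd4cdate as in the example.
* Groups are defined by id and cd4cdate; within each group rows are sorted by cd4c.
* _N is the group size and _n the row's position, so this drops the last
* (largest cd4c) row only in groups that contain more than one row.
bysort id cd4cdate (cd4c): drop if _n == _N & _N > 1
```

Note that Stata sorts missing values as larger than any number, so a missing cd4c in a duplicated group would be the row dropped. If a group has three or more rows, this removes only the single largest one.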
Many thanks!
--Ann
Ann C. Miller, PhD, MPH
Research Associate
FXB Center for Health and Human Rights
Harvard School of Public Health
651 Huntington Ave, 7th Floor
Boston, MA 02115
(617) 432-7297