
From: Howard Lempel <HLempel@brookings.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Categorizing HIV status using a series of string variables
Date: Mon, 24 Nov 2008 20:27:56 -0500

Hi Chelsea,

As Tom mentions, I think regular expressions are the best way to get what you want. I thought you might still want to know why you were getting your error.

I think you're getting the error because Stata only recognizes a period as missing if a variable is a numeric variable. Your HIV variables are string variables, so Stata has literally stored a period in the variable. To refer to a character in a string variable, you need to enclose the character in quotes. Therefore, you need to replace statements like

    if hiv1==.

with

    if hiv1=="."

Another option would be:

    replace hiv1 = "" if hiv1=="."

Or to use -encode- to make your string variables into numeric variables.

Hope this helps.

Howie

Howie Lempel
Research Assistant
The Brookings Institution | Economic Studies
1775 Massachusetts Ave NW | Washington DC 20036
hlempel@brookings.edu | p: (202) 238-3576

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Polis, Chelsea B.
Sent: Monday, November 24, 2008 7:46 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Categorizing HIV status using a series of string variables

Dear Statalisters,

I am trying to figure out a way to code individuals as either having incident HIV seroconversion (had at least one negative HIV test, followed by one positive HIV test while under surveillance), prevalent HIV (had one or more positive HIV tests while under surveillance), or HIV-negative (had all HIV-negative tests while under surveillance).

My dataset is set up as such, where N=negative, P=positive, .=not tested at that round, and I=indeterminate. I want to ignore any indeterminate tests, so I haven't included them here in the examples since I assume I will simply need to replace all "I"s with "."s, but help on figuring out a more elegant way to tweak the code to incorporate this fact would also be most appreciated!

    Study_id  HIV1  HIV2  HIV3  HIV4  HIV5  HIV6
    1         .     N     .     .     N     P
    2         .     .     N     N     N     .
    3         P     P     .     .     P     .
    4         N     P     .     P     P     P
    5         .     .     .     P     P     P

I also have a variable that shows these patterns in one variable, i.e.

    Study_id  HIV
    1         .N..NP  (I would want this to be coded as incident seroconverter)
    2         ..NNN.  (I would want this to be coded as consistently seronegative)
    3         PP..P.  (I would want this to be coded as prevalent positive)
    4         NP.PPP  (I would want this to be coded as incident seroconverter)
    5         ...PPP  (I would want this to be coded as prevalent positive)

These are string variables. Is there a simple formula to use to categorize these women as incident seroconverters, prevalent positives, or consistently seronegative? I tried something along the lines of:

    gen prevpos=0
    replace prevpos=1 if hiv1==.|hiv1=="P" & hiv2==.|hiv2=="P" & hiv3==.|hiv3=="P" & hiv4==.|hiv4=="P" & hiv5==.|hiv5=="P" & hiv6==.|hiv6=="P"

But I am receiving type mismatch r(109);

Your suggestions would be most appreciated!

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
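The two points in the reply above (compare string variables against "." in quotes, and use regular expressions on the combined pattern) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from Tom's message: it assumes the six test variables are strings named hiv1-hiv6 holding "N", "P", "I", or ".", and the names pattern and status are made up for the example. It also assumes that "incident" means the first recorded result is N and a P appears later, and that "prevalent" means the first recorded result is P. Note that & binds more tightly than | in Stata, so each pair of alternatives needs its own parentheses.

```stata
* Example data copied from the question (string variables, "." = not tested)
clear
input study_id str1 hiv1 str1 hiv2 str1 hiv3 str1 hiv4 str1 hiv5 str1 hiv6
1 "." "N" "." "." "N" "P"
2 "." "." "N" "N" "N" "."
3 "P" "P" "." "." "P" "."
4 "N" "P" "." "P" "P" "P"
5 "." "." "." "P" "P" "P"
end

* Treat indeterminate results as not tested
foreach v of varlist hiv1-hiv6 {
    replace `v' = "." if `v' == "I"
}

* The original attempt with "." quoted and each | pair parenthesized.
* As written, this also flags anyone who was never tested.
gen byte prevpos = (hiv1=="." | hiv1=="P") & (hiv2=="." | hiv2=="P") & ///
                   (hiv3=="." | hiv3=="P") & (hiv4=="." | hiv4=="P") & ///
                   (hiv5=="." | hiv5=="P") & (hiv6=="." | hiv6=="P")

* Regular-expression route on the combined pattern (hypothetical names)
gen str6 pattern = hiv1 + hiv2 + hiv3 + hiv4 + hiv5 + hiv6
gen str10 status = ""
* only N results, no P anywhere
replace status = "negative"  if strpos(pattern, "N") > 0 & strpos(pattern, "P") == 0
* first recorded result is P ([.] matches a literal period)
replace status = "prevalent" if regexm(pattern, "^[.]*P")
* first recorded result is N, with a P somewhere after it
replace status = "incident"  if regexm(pattern, "^[.]*N.*P")

list study_id pattern prevpos status
```

On the five example rows this should reproduce the coding asked for in the question (incident, negative, prevalent, incident, prevalent). Anyone with no N or P result at all is left with an empty status, which would need its own rule.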

**References**: **st: Categorizing HIV status using a series of string variables** *From:* "Polis, Chelsea B." <cpolis@jhsph.edu>

