Statalist The Stata Listserver


[no subject]



Without more information, better advice is difficult. But 
suppose that a state code was always two letters that 
appeared just before a numeric substring. 

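The position of the first numeric character can be put
in a variable like this: 

* 0 means "no numeric character found yet" 
gen first_numeric = 0 
* the longest string determines how many positions to check 
gen length = length(oldvar) 
su length, meanonly 
* record the first position whose character is a digit 0-9 
qui forval i = 1/`r(max)' { 
	replace first_numeric = `i' ///
	if first_numeric == 0 & inrange(real(substr(oldvar,`i',1)),0,9) 
} 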

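Then the state code can be extracted as 

gen state = substr(oldvar, first_numeric - 2, 2) if first_numeric > 2 

The -if- qualifier leaves state empty whenever fewer than two
characters precede the first numeric character, or no numeric
character was found at all. Without it, a zero or negative start
position would make -substr()- count from the end of the string
or return nothing. 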
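
As a cross-check on the loop, the same rule (two letters just before
a digit) can be sketched with Stata's regular expression functions
-regexm()- and -regexs()-. This is only a sketch under that rule. The
toy data reuse values mentioned in this thread, and the name state_re
is made up for illustration. Note that -clear- drops whatever data are
in memory.

```stata
* Toy data; the values are examples mentioned in this thread
clear
input str10 oldvar
"CITNA134"
"CITNA34"
"CINA134"
end

* regexm() finds the leftmost run of two letters followed by a digit;
* regexs(1) returns the two letters captured by the parentheses.
* Observations with no such run are left empty.
gen state_re = regexs(1) if regexm(oldvar, "([A-Za-z][A-Za-z])[0-9]")

list oldvar state_re
```

On these three values state_re should be NA each time. The two
approaches need not agree on every string. If a digit appears early
with fewer than two letters before it, the loop stops at that digit,
whereas the regular expression moves on to the first place where two
letters are followed by a digit.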
	
Nick 
n.j.cox@durham.ac.uk 

Michael S. Hanson
 
> 	Actually, given the proposed solution it appears that the worry is not
> specifically "about 8", but whether the "State" code (a) always begins
> with the 4th character, and (b) always is just 2 characters long.
> Depending on the universe of potential values of "oldvar", there may
> exist applications of -assert- that can check these desired conditions.
>   However, notice that the currently proposed application of -substr-
> should work as intended on values such as "CITNA34", "CITNA034", and
> "CITNA0340", but will fail on "CINA134" and the like.
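
Checks of the kind described above might look like the following
sketch. It assumes that state codes are two uppercase letters and
that a digit follows them immediately. Both are guesses about the
data, not facts established in this thread.

```stata
* (a) characters 4 and 5 exist and are both uppercase letters
assert length(oldvar) >= 5
assert regexm(substr(oldvar, 4, 2), "^[A-Z][A-Z]$")

* (b) the code is just 2 characters long: character 6 is a digit
assert regexm(substr(oldvar, 6, 1), "^[0-9]$")

* if an assertion fails, show the values that break the pattern
list oldvar if !regexm(substr(oldvar, 4, 3), "^[A-Z][A-Z][0-9]$")
```

A value such as "CINA134" fails these checks, because its characters
4 to 6 are "A13".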
 
Nick Cox 

> > Nevertheless, your description "about 8" is worrying.
> > In any observations that have some other number
> > of characters, this solution may be incorrect. Try
> >
> > . assert length(oldvar) == 8

mfeurey@weber.ucsd.edu
> >
> >> That worked beautifully, thank you.  Just saved me five hours of
> >> banging my head against the wall.
> >
> >>> Try
> >>>
> >>> gen str State = substr(oldvar,4,2)
> >>>
> >>> This assumes that your original variable was string.
> >
> > mfeurey@weber.ucsd.edu
> >
> >>> I have a variable that has about 8 characters per observation and I
> >>> need to take 2 characters (4th and 5th characters) and generate them
> >>> as new variables representing state codes.  So for example, the
> >>> variable has as one of its values CITNA134.  The characters I need to
> >>> extract are NA, which will be recoded as State code.  How would I go
> >>> about doing this?



