Stata The Stata listserver

st: RE: Re: how to split numeric variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: how to split numeric variable
Date   Thu, 25 Sep 2003 10:56:02 +0100

Wade T Roberts posed a problem to which various
people offered solutions, and indeed, there's more
than one way to do it. I add some extra comments
on each solution.

> I have a single numeric variable that identifies the
> city/county/state for
> each case, where the first three digits represent the
> city, the next two
> the county, and the final two represent the state.
>
> Examples:
>
>   223406
>  1453209
>  2785845
>  etc...
>
> I'm only interested in identifying cases by state at the
> moment.  How do I
> go about singling out this part of the data, or creating
> a new identifying state variable?

Marcela Perticara

> I don't know if there is a specific command for this, but
> one way would be
> to
>
> gen statecode=full_id-int(full_id/100)*100
>
> where full_id is the original variable. I guess it should
> work as long as
> your state code is always in the last two digits of
> your original variable.

In a similar vein, the last two digits are

	mod(full_id,100)

Note that the last two digits 06 will map to 6.
If you wanted the explicit zero, you would need
something like

	string(mod(full_id,100), "%02.0f")

where the result is a string and the numeric
format %02.0f ensures a leading zero whenever
the result would otherwise be a single digit.
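
As a minimal sketch of the two expressions above, using the three
example values from the original question: full_id is Marcela's name
for the original variable, while state_num and state_str are names
made up here for illustration.

```stata
clear
input long full_id
223406
1453209
2785845
end

* numeric state code: the last two digits 06 become 6
generate state_num = mod(full_id, 100)

* Marcela's expression gives the same numbers
assert full_id - int(full_id/100)*100 == state_num

* string state code, with the leading zero kept
generate str2 state_str = string(mod(full_id, 100), "%02.0f")

list
```

Which form is better depends on what comes next: the numeric version
is convenient for value labels, the string version for matching
against codes that are written with a leading zero.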

Fernando Lozano

> gen newvar=string(oldvar)
> gen state=substr(newvar,n1,n2)

> where n1 is the first digit of the variable to appear on state and n2 is
> the last digit. For example:

> if newvar(i)=1453209
> then gen state(i)=substr(newvar,5,7) will generate state(i)=09

The main idea is fine, but a few details are wrong here.
n2 is not the position of the last digit, but the (maximum)
length of the substring. Subscripts are given within []
and cannot be supplied on the left of the = sign.
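
A sketch of how Fernando's example might look with those details
corrected, assuming a single observation holding 1453209 and keeping
his variable names oldvar, newvar and state:

```stata
clear
input long oldvar
1453209
end

generate newvar = string(oldvar)

* start at character 6 and take (at most) 2 characters
generate state = substr(newvar, 6, 2)

* subscripts go in [] and only on the right-hand side
display state[1]

* with 5 and 7 the result starts at character 5 and runs to the end
display substr(newvar[1], 5, 7)
```

The fixed starting position 6 only suits seven-digit values. Counting
from the right with a negative first argument, as in Daniel's
suggestion, avoids that assumption.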

Daniel R. Sabath

> Since you really are not using the state variable as a numeric, convert it
> to a string.

>	tostring geocode, generate(str_geocode)

> Then use the string processing functions to get what you want. In this case
>	gen state = substr(str_geocode,-2,2) /* -2 is 2 from the right side for 2 characters */
>	gen county = substr(str_geocode,-4,2)

> The only problem you have is where the city code is less than 100.

> This pads the string out to 7 characters if it only has 6.
> 	replace str_geocode = "0" + str_geocode if length(str_geocode) == 6
> Then
>	gen city = substr(str_geocode,1,3)

> More information can be had by typing "help substr" which will bring up help
> on all the string functions.

An alternative is just to use -string()-, as Fernando suggested.
Daniel's idea can then be re-expressed this way:

	gen state = substr(string(geocode),-2,2)
	gen county = substr(string(geocode),-4,2)
	gen city = substr(string(geocode, "%07.0f"),1,3)
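
One possible variation, sketched here rather than taken from the
thread, is to pad once to seven characters and then cut all three
pieces from the left. The variable name padded is made up for
illustration, and the data are the three example values from the
original question.

```stata
clear
input long geocode
223406
1453209
2785845
end

* pad to 7 characters so every code sits at a fixed position
generate str7 padded = string(geocode, "%07.0f")

generate city   = substr(padded, 1, 3)
generate county = substr(padded, 4, 2)
generate state  = substr(padded, 6, 2)

list
```

For the six-digit value 223406 the padded string is 0223406, so the
city code comes out as 022 with its leading zero.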

Nick
n.j.cox@durham.ac.uk



