st: RE: Creating a new numerical variable without mathematical operations
Ramani Gunatilaka asked
> I have two numerical variables, -province- and -city-:
> -province- has two digits, -city- has at least 4.
> e.g. -province- = 11, -city- = 6227
> I want to create a new numerical variable called -newcity- which will
> combine -province- and -city- like this:
> -newcity- = 116227
> This is because I have to combine this data set with another, and the
> unique identifying variable in the second data set has been
> constructed in this way.
Jeph Herrin, Frederick J. Boehmke and Steven Stillman all suggested
some variant on this (Jeph's answer, basically):
> If you know the maximum digits for the -city- is k,
> gen newcity = province*(10^k) + city
> will do it. So if -city- has at most four digits,
> gen newcity = province*10000 + city.
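A minimal sketch of that arithmetic follows. The three observations are
made up for illustration, and k = 4 is an assumption about -city-. The
-long- storage type is my addition: -generate- defaults to float, which
holds integers exactly only up to 16,777,216, so longer identifiers
could be rounded.

```stata
* Illustrative data only; assumes -city- is a non-negative integer
* with at most 4 digits (k = 4).
clear
input province city
11 6227
22 4321
33 21
end

* -long- keeps every digit; the default float would not for large ids
generate long newcity = province*10000 + city
format newcity %12.0f
list
```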
However, Ramani corrected the original post:
> Thanks for your suggestions. My problem is that the city variable does
> not always have four digits. Sometimes it is two, at other times, it
> is three.
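The correction matters because padding to a fixed width and simply
joining the digits give different answers when -city- is short. A quick
check with -display-, using the values 11 and 321 purely as an example:

```stata
* Fixed-width arithmetic pads -city- to four digits
display 11*10000 + 321
* shows 110321

* Joining the digits as strings does not pad
display real(string(11) + string(321))
* shows 11321
```

Which of the two is wanted depends on how the identifier in the second
data set was built, which the thread does not spell out.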
Kit Baum suggested this -- which does not depend on the number of
digits in -city-:
> If you're not using maths, you must be using strings!
> | province city |
> 1. | 11 321 |
> 2. | 22 4321 |
> 3. | 33 21 |
> . gen newcity = real(string(province) + string(city))
> . l
> | province city newcity |
> 1. | 11 321 11321 |
> 2. | 22 4321 224321 |
> 3. | 33 21 3321 |
and Liu Nizi suggested something similar:
> First, you can translate these two variables into string variables by
> command -tostring-.
> Second, you can combine these two string variables just by
> . gen newcity = province_str + city_str
> Third, translate -newcity- into a numerical variable by command -destring-.
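Both string-based suggestions can be sketched as below. The data are
illustrative, the variable names ending in _str, newcity_a and newcity_b
are just placeholders, and the -long- storage type is my addition to
avoid float rounding of long identifiers. The final -isid- is an
optional check, not part of either suggestion.

```stata
* Illustrative data only
clear
input province city
11 321
22 4321
33 21
end

* Kit Baum's one-liner, stored as long rather than the default float
generate long newcity_a = real(string(province) + string(city))

* Liu Nizi's three steps: to string, concatenate, back to numeric
tostring province, generate(province_str)
tostring city, generate(city_str)
generate newcity_str = province_str + city_str
destring newcity_str, generate(newcity_b)

* Optional: stop with an error if the new identifier is not unique
isid newcity_a
list
```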
Setting aside the impending -merge- or -append-, a few further comments
are in order:
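1. Doing it as a problem in arithmetic (multiplying one numeric id by an
appropriate power of 10 and then adding the other) is a good simple
technique -- so long as the identifiers are well behaved: non-negative
integers with a known maximum number of digits, giving a result small
enough to be held exactly in the storage type used.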
2. Doing it as a problem in concatenation of strings is easier done
with -egen-:
. egen newcity = concat(province city)
This produces a string variable, which for most purposes serves
perfectly well as an identifier as it stands, without forcing its
contents back to numeric.
In general, it is often safer and rarely problematic to insert
an extra space, to remove possible ambiguities with some
identifiers in reversing the coding (without a separator, 11 with 321
and 113 with 21 would both yield "11321"):
. egen newcity = concat(province city), p(" ")
and there could be no objection to anyone preferring to do that
directly:
. gen newcity = string(province) + " " + string(city)
3. In other circumstances, a composite identifier created like this
. egen newcity = group(province city), label
has some advantages, provided that the number of value labels created
is not problematic.
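
Points 2 and 3 might be tried out along these lines. This is a sketch
on made-up data; -part- and -newid- are hypothetical names, and the
-split- step is my addition to show why the separator helps in
reversing the coding.

```stata
* Illustrative data only
clear
input province city
11 321
22 4321
33 21
end

* String identifier with a space separator, e.g. "11 321"
egen newcity = concat(province city), punct(" ")

* The separator makes the coding reversible: -split- recovers the
* parts as numeric variables part1 and part2
split newcity, parse(" ") generate(part) destring

* Alternative: integer codes 1, 2, ... labelled with the original values
egen newid = group(province city), label

list
```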