Statalist The Stata Listserver



st: RE: Creating a new numerical variable without mathematical operations


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Creating a new numerical variable without mathematical operations
Date   Sun, 22 Oct 2006 19:01:59 +0100

Ramani Gunatilaka asked 

> I have two numerical variables, -province- and -city-: 
> -province- has two digits, -city- has at least 4.
> e.g. -province- = 11, -city- =  6227
>
> I want to create a new numerical variable called -newcity- which will
> combine -province- and -city- like this:
>
> 116227
>
> This is because I have to combine this data set with another, and the
> unique identifying variable in the second data set has been
> constructed in this way.

Jeph Herrin, Frederick J. Boehmke and Steven Stillman all suggested
some variant on this (Jeph's answer, basically):  

> If you know the maximum digits for the -city- is k,
>
>  gen newcity = province*(10^k) + city
>
> will do it. So if -city- has at most four digits,
>
>  gen newcity = province*10000 + city.

However, Ramani corrected the original post: 

> Thanks for your suggestions. My problem is that the city variable does
> not always have four digits. Sometimes it is two, at other times, it
> is three.

Kit Baum suggested this, which does not depend on the number of 
digits: 

> If you're not using maths, you must be using strings!
>
>      +-----------------+
>      | province   city |
>      |-----------------|
>   1. |       11    321 |
>   2. |       22   4321 |
>   3. |       33     21 |
>      +-----------------+

> . gen newcity = real(string(province) + string(city))

> . l
>
>      +--------------------------+
>      | province   city  newcity |
>      |--------------------------|
>   1. |       11    321    11321 |
>   2. |       22   4321   224321 |
>   3. |       33     21     3321 |
>      +--------------------------+

and Liu Nizi suggested something similar: 

> First, you can translate these two variables into string variables by 
> command -tostring-.

> Second, you can combine these two string variables just by 

> . gen newcity = province_str + city_str

> Third, translate -newcity- into a numerical variable by command -destring-. 
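
Spelled out, those three steps might look like the lines below. This is 
a sketch only: the names province_str, city_str and newcity_str are 
hypothetical, and it assumes that both identifiers hold integers. 

. tostring province city, generate(province_str city_str) 
. gen newcity_str = province_str + city_str 
. destring newcity_str, generate(newcity) 

-destring- needs either generate() or replace, so the last step here 
puts the numeric result in a new variable. 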

Setting aside the impending -merge- or -append-, a few further comments 
are possible: 

1. Doing it as a problem in arithmetic (multiplying one numeric id by an 
appropriate power of 10 and then adding the other) is a good simple 
technique -- so long as the identifiers are well behaved. 
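
A minimal sketch of the arithmetic route, assuming that -city- never 
exceeds four digits and that both identifiers are non-negative integers 
(the -assert- line checks the assumption about -city- rather than 
trusting it): 

. assert city == int(city) & inrange(city, 0, 9999) 
. gen long newcity = province * 10000 + city 
. format newcity %12.0f 

Asking for -long- is a precaution: -generate- creates a -float- by 
default, which holds integers exactly only up to 16,777,216, so longer 
identifiers could be silently rounded. Note too that a three-digit city 
is padded here (11 and 321 give 110321, not the 11321 that string 
concatenation gives), so the result matches the other data set only if 
its identifier was built the same way. 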

2. Doing it as a problem in concatenation of strings is more easily done 
with 

. egen newcity = concat(province city) 

This produces a string variable, which for most purposes as an 
identifier is fine left as is without forcing its contents back to numeric. 

In general, it is often safer and rarely problematic to insert
a space as separator, which removes possible ambiguity when the 
coding is reversed. Without one, province 1 with city 123 and 
province 11 with city 23 would both yield "1123": 

. egen newcity = concat(province city), p(" ") 

and there could be no objection to anyone preferring to do that 
directly: 

. gen newcity = string(province) + " " + string(city) 
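
To illustrate reversing the coding, here is a sketch that assumes the 
string -newcity- was built with a single space as separator; the stub 
name part is hypothetical: 

. split newcity, parse(" ") generate(part) destring 

This should leave the two components in numeric variables part1 and 
part2, which can then be compared with the original -province- and 
-city-. 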

3. In other circumstances, a composite identifier created like this 

. egen newcity = group(province city), label 

has some advantages, provided that the number of value labels created
is not problematic. 
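
A quick way to inspect what was created, as a sketch assuming the 
default in which the value label takes the name of the new variable: 

. label list newcity 
. tabulate newcity 

The integers produced by group() simply number the combinations present 
in the data set in hand, from 1 up, so an identifier like this is not 
suitable as a key for matching against a different data set. 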

Nick 
n.j.cox@durham.ac.uk 



