st: RE: problems with encode

From   Nick Cox <>
To   "''" <>
Subject   st: RE: problems with encode
Date   Wed, 1 Jun 2011 11:08:46 +0100

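As you have found out, -encode- will take values like "0,43" and map them to integers with value labels. The results look the same as before because the value labels are displayed, but the stored values are just the integers 1, 2, 3, and so on, assigned in alphabetical order of the strings. In principle the approach is quite wrong for such data.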
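To see the difference between what -encode- displays and what it stores, a small sketch may help. The three observations below are made up for illustration and are not the poster's data.

```stata
* Toy data, assumed for illustration only
clear
input str4 y1990
"0,43"
"6,31"
"0,43"
end

encode y1990, generate(v1990)

* With value labels shown, v1990 looks identical to y1990
list y1990 v1990

* Without labels, v1990 is seen to hold integer codes, not 0.43 or 6.31
list y1990 v1990, nolabel
```

Here "0,43" should be stored as 1 and "6,31" as 2, which is why any later calculation or copy works on the codes rather than on the percentages.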

As you say, you need to -destring- such variables. What you need with your data is to spell out the -dpcomma- option. 

Note that with -destring- you can operate on several variables at once. 

. destring y1990-y2009, replace dpcomma

All this is documented in the help for -destring-. 
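As a minimal sketch of the whole workflow, assuming a made-up two-country, two-year dataset and a hypothetical identifier variable id, -destring- with -dpcomma- followed by -reshape long- might look like this:

```stata
* Toy data, assumed for illustration only
clear
input str11 country str4 y1990 str4 y1991
"Afghanistan" "0,43" "0,43"
"Algeria"     "6,31" "6,31"
end

* Convert in place; dpcomma treats the comma as the decimal separator
destring y1990 y1991, replace dpcomma

* reshape needs an identifier; id is a hypothetical name created here
generate id = _n
reshape long y, i(id) j(year)

list id country year y
```

Because y now holds genuine numbers rather than labelled codes, the values survive the reshape unchanged and can be used directly in calculations.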


Lukas Bösch

I am having problems working with data that were stored as string variables and that I converted to numeric variables with encode. In this case the percentage of a country's surface is stored as string variables (y1990-y2009), and I am encoding them into numeric variables (v1990-v2009).

here is one example:

encode y1990, gen(v1990)

I did this for 1990-2009 and 130 countries, but show only the first two countries and the first four years.

country	y1990	y1991	y1992	y1993	v1990	v1991	v1992	v1993	
Afghan	0,43	0,43	0,43	0,43	0,43	0,43	0,43	0,43	
Algeria	6,31	6,31	6,31	6,31	6,31	6,31	6,31	6,31	

This seems to work fine, but when I reshape the data into long form, it doesn't work any more.

drop y1990-y2009
reshape long v, i(id) j(year)

year	country	         v
1990	Afghanistan	0,83
1991	Afghanistan	0,83
1992	Afghanistan	0,83
1993	Afghanistan	0,83
1990	Algeria	        5,62
1991	Algeria	        5,88
1992	Algeria	        6,17
1993	Algeria	        5,92

Reshaping the string variable works fine, though.

Another problem I am having with encoded data is the following:

The human development index is measured every 5 years (1990, 1995, 2000, 2005, 2009). I have stored it as a string variable and want to have it for the whole time period (1990-2009). In order to do this I just copy the values into the following years. For example, the HDI of 1990 is copied into 1991, 1992, 1993 and 1994. Here again, I start by encoding the data.

encode v1990, gen(value1990)

I am doing this for all 5 years.

country	v1990	v1995	v2000 value1990 value1995 value2000
Norway	0,838	0,869	0,906  0,838     0,869      0,906
Austra	0,819	0,887	0,914  0,819     0,887      0,819

The next step is to generate the missing years and to copy the values of the existing years.

gen value1991 = value1990


In this case, Stata creates a ranking and doesn't copy the value.

I am wondering how to deal with these problems. If I do these operations, the reshaping and copying, with the string variables, everything works fine, but I can't calculate with string variables and my aim is to fit a regression model.
I have read in the help file that, if the string variable contains numeric values simply stored as strings, which is my case, I should use destring or generate with real().
I tried those, but they don't work:

destring v1990, replace

Stata says: v1990 contains nonnumeric characters; no replace

and the same happens with destring, gen().

In the case of generate value1990 = real(v1990)

Stata just generates missing values.
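The missing values from real() appear to have the same cause as the -destring- error: the comma is not read as a decimal separator. As a hedged sketch on made-up values, one alternative to the -dpcomma- option is to swap the comma for a period with subinstr() before calling real(); the variable name bad is hypothetical.

```stata
* Toy data, assumed for illustration only
clear
input str5 v1990
"0,838"
"0,819"
end

* real() does not accept a decimal comma, so this yields missing values
generate double bad = real(v1990)

* Replace the comma with a period first; the final . means all occurrences
generate double value1990 = real(subinstr(v1990, ",", ".", .))

list
```

For several variables at once, -destring- with -dpcomma- is usually the more direct route, since it accepts a variable list.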
