st: RE: problems with encode
Nick Cox <email@example.com>
Wed, 1 Jun 2011 11:08:46 +0100
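As you have found out, -encode- will take values like "0,43" and map them to integers 1, 2, 3, ... with value labels. They will look the same as before, because the labels are what gets displayed, but the stored values are just arbitrary codes, so in principle the approach is quite wrong for such data.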
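To see what -encode- actually stores, here is a minimal sketch. It assumes two illustrative observations typed in with -input-, using values from the example in the question below. -encode- assigns the codes 1, 2, ... to the distinct strings in sorted order and keeps the original text only in a value label.

```stata
* Illustrative data: numbers stored as strings with a decimal comma
clear
input str12 country str5 y1990
"Afghanistan" "0,43"
"Algeria"     "6,31"
end

* -encode- maps each distinct string to an integer code (1, 2, ...)
* and attaches a value label holding the original text
encode y1990, generate(v1990)

* With labels shown, v1990 looks just like y1990
list country y1990 v1990

* Without labels, the stored codes appear: 1 and 2, not 0.43 and 6.31
list country v1990, nolabel
```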
As you say, you need to -destring- such variables. What your data need is the -dpcomma- option, which tells -destring- that the comma is the decimal separator.
Note that with -destring- you can operate on several variables at once.
. destring y1990-y2009, replace dpcomma
All this is documented in the help for -destring-.
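A minimal self-contained sketch of the -destring- fix, again assuming illustrative data entered with -input- and only two of the twenty year variables:

```stata
* Illustrative data in the same layout as the question
clear
input str12 country str5 y1990 str5 y1991
"Afghanistan" "0,43" "0,43"
"Algeria"     "6,31" "6,31"
end

* dpcomma: treat the comma as the decimal separator
destring y1990-y1991, replace dpcomma

* The variables are now genuinely numeric (0.43, 6.31, ...)
list
summarize y1990
```

With the real data, y1990-y2009 in the -destring- call covers all years at once, provided those variables are stored next to each other in the dataset, since the dash denotes a positional range of variables.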
I am having problems working with data that were stored as string variables and that I converted to numeric variables with encode. In this case, the percentage of a country's surface is stored as string variables (y1990-y2009), and I am encoding them into numeric variables (v1990-v2009).
Here is one example:
encode y1990, gen(v1990)
I did this for 1990-2009 and 130 countries, but I show only the first two countries and the first four years.
country y1990 y1991 y1992 y1993 v1990 v1991 v1992 v1993
Afghan 0,43 0,43 0,43 0,43 0,43 0,43 0,43 0,43
Algeria 6,31 6,31 6,31 6,31 6,31 6,31 6,31 6,31
This seems to work fine, but when I reshape the data into long form, it no longer works:
reshape long v, i(id) j(year)
year country v
1990 Afghanistan 0,83
1991 Afghanistan 0,83
1992 Afghanistan 0,83
1993 Afghanistan 0,83
1990 Algeria 5,62
1991 Algeria 5,88
1992 Algeria 6,17
1993 Algeria 5,92
Reshaping the string variables works fine, though.
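A likely reason the -reshape- goes wrong: each -encode- call builds its own value label (named after the new variable), so the same integer code can stand for different strings in different years, while the single long variable v can carry only one value label. Reshaping after -destring- avoids this, because the values themselves are then numbers. A sketch with illustrative data, where i(country) stands in for the id variable used in the question:

```stata
* Illustrative data (two countries, two years) with a decimal comma
clear
input str12 country str5 y1990 str5 y1991
"Afghanistan" "0,43" "0,43"
"Algeria"     "6,31" "6,31"
end

* Convert to numbers first ...
destring y1990 y1991, replace dpcomma

* ... then reshape: stub y, the year suffix goes into the new variable year
reshape long y, i(country) j(year)
list, sepby(country)
```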
Another problem I have with encoded data is the following:
The human development index is measured every five years (1990, 1995, 2000, 2005, and 2009). I have stored it as a string variable and want to have it for the whole period (1990-2009). To do this, I simply copy each value into the following years; for example, the HDI of 1990 is copied into 1991, 1992, 1993, and 1994. Here again, I start by encoding the data.
encode v1990, gen(value1990)
I do this for all five years.
country v1990 v1995 v2000 value1990 value1995 value2000
Norway 0,838 0,869 0,906 0,838 0,869 0,906
Austra 0,819 0,887 0,914 0,819 0,887 0,819
The next step is to generate the missing years and copy the values from the existing years:
gen value1991 = value1990
In this case, Stata creates a ranking instead of copying the value.
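-generate value1991 = value1990- copies the stored integer codes but not the value label, so value1991 shows the codes themselves, which is the "ranking" described. A sketch of the alternative, assuming illustrative data for three of the measured years and the question's string variables v1990, v1995, v2000: destring first, then copy each measured value forward with a loop. Each copied year takes its value from the year rounded down to a multiple of five.

```stata
* Illustrative data: HDI stored as strings with a decimal comma
clear
input str8 country str5 v1990 str5 v1995 str5 v2000
"Norway" "0,838" "0,869" "0,906"
"Austra" "0,819" "0,887" "0,914"
end

* Convert the measured years to numbers
destring v1990 v1995 v2000, replace dpcomma

* Fill the gap years from the most recent measured year
forvalues y = 1991/1999 {
    * Multiples of 5 are measured years, so skip them
    if mod(`y', 5) != 0 {
        local base = `y' - mod(`y', 5)
        generate v`y' = v`base'
    }
}
list country v1990 v1991 v1994 v1995 v1999
```

With all five measured years present, running the loop over 1991/2008 fills 1991-1994, 1996-1999, 2001-2004 and 2006-2008, and leaves the measured 2009 value alone.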
I am wondering how to deal with these problems. If I do these operations (the reshaping and the copying) with the string variables, everything works fine, but I can't calculate with string variables, and my aim is to fit a regression model.
I have read in the help file that if a string variable contains numeric values simply stored as strings, which is my case, I should use destring or generate with the real() function.
I tried those, but they don't work:
destring v1990, replace
Stata says: v1990 contains nonnumeric characters; no replace
and the same happens with destring, gen().
With
generate value1990 = real(v1990)
Stata just generates missing values.
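Both failures have the same cause, the decimal comma: -destring- without -dpcomma- treats the comma as a nonnumeric character, and real() expects a period as the decimal point, so it returns missing. If you prefer -generate-, one sketch, assuming the question's variable name v1990 and illustrative data, is to swap the comma for a period with subinstr() first:

```stata
* Illustrative data: HDI stored as strings with a decimal comma
clear
input str8 country str5 v1990
"Norway" "0,838"
"Austra" "0,819"
end

* real() does not accept a decimal comma: this gives missing values
generate bad1990 = real(v1990)

* Replace the comma by a period first; the last argument . means all occurrences
generate double value1990 = real(subinstr(v1990, ",", ".", .))
list
```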