st: problems with encode


From   "Lukas Bösch" <L.Boesch@gmx.de>
To   statalist@hsphsun2.harvard.edu
Subject   st: problems with encode
Date   Wed, 01 Jun 2011 11:33:35 +0200

Dear Statalist users,

I am sorry, but the first mail I sent wasn't finished; I sent it accidentally.
I am having problems working with data that were stored as string variables and that I converted to numeric variables with encode. In this case the % of a country's surface is stored as a string variable (y1990-y2009), and I am encoding it into numeric variables (v1990-v2009).

Here is one example:

encode y1990, gen(v1990)

I did this for 1990-2009 and 130 countries, but I only show the first two countries and the first four years.

country	y1990	y1991	y1992	y1993	v1990	v1991	v1992	v1993	
Afghan	0,43	0,43	0,43	0,43	0,43	0,43	0,43	0,43	
Algeria	6,31	6,31	6,31	6,31	6,31	6,31	6,31	6,31	

This seems to work fine, but when I reshape the data into long form, the values no longer match:

drop y1990-y2009
reshape long v, i(id) j(year)

year	country	v
1990	Afghanistan	0,83
1991	Afghanistan	0,83
1992	Afghanistan	0,83
1993	Afghanistan	0,83
1990	Algeria	5,62
1991	Algeria	5,88
1992	Algeria	6,17
1993	Algeria	5,92

Reshaping the string variable works fine, though.


Another problem I am having with encoded data is the following:

The human development index is measured every 5 years (1990, 1995, 2000, 2005, 2009). I have stored it as a string variable and want to have it for the whole time period (1990-2009). To do this, I just copy each value into the following years. For example, the HDI of 1990 is copied into 1991, 1992, 1993 and 1994. Here again, I start by encoding the data:

encode v1990, gen(value1990)

I am doing this for all 5 years.

country	v1990	v1995	v2000	value1990	value1995	value2000
Norway	0,838	0,869	0,906	0,838	0,869	0,906
Austra	0,819	0,887	0,914	0,819	0,887	0,819

The next step is to generate the missing years and to copy the values from the existing years.

gen value1991 = value1990

value1991
106
103

In this case, Stata creates what looks like a ranking and doesn't copy the value.
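A minimal sketch of what may be going on here, assuming the behaviour described in the encode help: encode stores integer codes (1, 2, 3, ... in alphabetical order of the strings) and attaches the original strings as value labels, so a plain copy carries over the codes but not the labels. The toy dataset below is hypothetical and only reuses the two HDI values shown above.

```stata
* Hypothetical toy data: two countries, HDI stored as strings with decimal commas
clear
input str8 country str5 v1990
"Norway" "0,838"
"Austra" "0,819"
end

encode v1990, gen(value1990)

* With labels the variable looks like the original strings ...
list country v1990 value1990
* ... but without labels it shows the integer codes that are really stored
list country v1990 value1990, nolabel
* The code-to-string mapping created by encode
label list value1990

* Codes follow the alphabetical order of the strings: "0,819" -> 1, "0,838" -> 2
assert value1990 == 1 if country == "Austra"
assert value1990 == 2 if country == "Norway"

* A copy takes the codes, not the labels, which is why it looks like a ranking
gen value1991 = value1990
list country value1990 value1991
```

If this is what happens, the numbers 106 and 103 above would be positions in the sorted list of distinct strings rather than HDI values, and each encoded variable would have its own mapping, which could also explain why the values no longer match after reshape.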


I am wondering how to deal with these problems. If I do these operations (the reshaping and the copying) with the string variables, everything works fine, but I can't calculate with string variables, and my aim is to fit a regression model.
I have read in the help file that, if the string variable contains numeric values simply stored as strings, which is my case, I should use the destring command or generate with the real() function.
I tried those, but they don't work:

destring v1990, replace

Stata says: v1990 contains nonnumeric characters; no replace

The same happens with destring, gen().

In the case of

generate value1990 = real(v1990)

Stata just generates missing values.
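One possible explanation, offered as an assumption rather than a diagnosis: the values use a comma as the decimal separator, and both destring and real() expect a period, so the comma counts as a nonnumeric character. A minimal sketch of two ways around this, on hypothetical toy data with the values from above (the dpcomma option of destring may not be available in older Stata versions):

```stata
* Hypothetical toy data: HDI stored as strings with decimal commas
clear
input str8 country str5 v1990
"Norway" "0,838"
"Austra" "0,819"
end

* Option 1: tell destring that the comma is the decimal separator
destring v1990, generate(value1990) dpcomma

* Option 2: replace the comma with a period, then convert with real()
generate double value1990b = real(subinstr(v1990, ",", ".", .))

* Both routes should give the same numbers
assert abs(value1990 - value1990b) < 1e-6

* Copying now copies the actual value
generate value1991 = value1990
list country v1990 value1990 value1990b value1991
```

Because the result is a true numeric variable without value labels, the same conversion applied to y1990-y2009 should also survive reshape long unchanged, although I have not verified that on the full dataset.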

I hope someone can help me, and I apologize for sending an unfinished email.

Kind regards

Lukas


