

From: Nick Cox <n.j.cox@durham.ac.uk>

To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>

Subject: st: RE: problems with encode

Date: Wed, 1 Jun 2011 11:08:46 +0100

As you have found out, -encode- will take values like "0,43" and map them to integers with value labels. They will look the same as before, but in principle the approach is quite wrong for such data. As you say, you need to -destring- such variables. What you need with your data is to spell out the -dpcomma- option. Note that with -destring- you can operate on several variables at once.

    . destring y1990-y2009, replace dpcomma

All this is documented in the help for -destring-.

Nick
n.j.cox@durham.ac.uk

Lukas Bösch

I am having problems working with data that were stored as string variables and that I converted to numeric variables with encode. In this case, the percentage of a country's surface is stored as a string variable (y1990-y2009), and I am encoding it into a numeric variable (v1990-v2009). Here is one example:

    encode y1990, gen(v1990)

I did this for 1990-2009 and 130 countries, but I only show the first two countries and the first four years.

    country  y1990  y1991  y1992  y1993  v1990  v1991  v1992  v1993
    Afghan    0,43   0,43   0,43   0,43   0,43   0,43   0,43   0,43
    Algeria   6,31   6,31   6,31   6,31   6,31   6,31   6,31   6,31

This seems to work fine, but when I reshape the data into long form, it doesn't work any more:

    drop y1990-y2009
    reshape long v, i(id) j(year)

    year  country      v
    1990  Afghanistan  0,83
    1991  Afghanistan  0,83
    1992  Afghanistan  0,83
    1993  Afghanistan  0,83
    1990  Algeria      5,62
    1991  Algeria      5,88
    1992  Algeria      6,17
    1993  Algeria      5,92

Reshaping the string variables works fine, though.

Another problem I am having with encoded data is the following. The Human Development Index is measured every five years (1990, 1995, 2000, 2005, 2009). I have stored it as a string variable and want to have it for the whole time period (1990-2009). To do this, I just copy the values into the following years; for example, the HDI of 1990 is copied into 1991, 1992, 1993 and 1994. Here again, I start by encoding the data:

    encode v1990, gen(value1990)

I am doing this for all five years.

    country  v1990  v1995  v2000  value1990  value1995  value2000
    Norway   0,838  0,869  0,906      0,838      0,869      0,906
    Austra   0,819  0,887  0,914      0,819      0,887      0,819

The next step is to generate the missing years and copy the values of the existing years:

    gen value1991 = value1990

    value1991
          106
          103

In this case, Stata creates a ranking and doesn't copy the value. I am wondering how to deal with these problems. If I do these operations (the reshaping and the copying) with the string variables, everything works fine, but I can't calculate with string variables, and my aim is to fit a regression model.

I have read in the help file that if the string variable contains numeric values simply stored as strings, which is my case, I should use destring, or generate with the real() function. I tried those, but they don't work:

    destring v1990, replace

Stata says:

    v1990 contains nonnumeric characters; no replace

The same happens with destring, gen(). In the case of

    generate value1990 = real(v1990)

Stata just generates missings.
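
A minimal do-file sketch of what Nick describes, using a hypothetical two-country toy dataset in the layout of the post (the variable names and the Afghanistan/Algeria values come from the quoted table; the dataset itself is illustrative). -encode- gives each distinct string a code 1, 2, ... in sorted order and attaches a value label, so v1990 only looks like "0,43". -generate- copies the codes but not the label, which is why value1991 in the HDI example showed 106 rather than 0,838. Each -encode- call also builds its own value label, so the same code need not mean the same value in different years; that is a plausible explanation for the mismatched numbers after -reshape long-. The sketch then applies Nick's fix, -destring- with -dpcomma-, before reshaping.

```stata
* Hypothetical toy data in the layout of the post: shares stored as
* strings with a decimal comma
clear
input str11 country str4 y1990 str4 y1991
"Afghanistan" "0,43" "0,43"
"Algeria"     "6,31" "6,31"
end

* -encode- maps each distinct string to 1, 2, ... in sorted order and
* attaches a value label with the same name as the new variable
encode y1990, generate(v1990)
list country y1990 v1990            // labels displayed: 0,43 and 6,31
list country y1990 v1990, nolabel   // stored codes: 1 and 2

* -generate- copies the codes but not the value label
generate copy1990 = v1990           // contains 1 and 2, not 0.43 and 6.31

* The fix: discard the encoded copies and -destring- with -dpcomma-,
* which reads "," as the decimal point
drop v1990 copy1990
destring y1990 y1991, replace dpcomma

* -reshape- now works on genuine numbers
reshape long y, i(country) j(year)
list, sepby(country)
```

Run it as a do-file: the `//` comments are recognized in do-files but not when commands are typed interactively.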

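For the HDI variables, a similar sketch, again on hypothetical toy data built from the Norway and Austra rows quoted above. It shows why real() alone returned missings (it does not accept a decimal comma) and one workaround, swapping the comma for a point with subinstr() first; it then converts with -destring, dpcomma- and copies each measured year into the four years after it. The loop and the check1990 name are illustrative assumptions, not taken from the thread.

```stata
* Hypothetical toy data: HDI strings for three of the measured years
clear
input str6 country str5 v1990 str5 v1995 str5 v2000
"Norway" "0,838" "0,869" "0,906"
"Austra" "0,819" "0,887" "0,914"
end

* real("0,838") is missing; replacing the comma with a point first works.
* -double- keeps the full precision of the converted value.
generate double check1990 = real(subinstr(v1990, ",", ".", .))

* Or convert all measured years at once, as in Nick's reply
destring v1990 v1995 v2000, replace dpcomma

* Copy each measured value into the following four years. With the full
* data, add 2000 and 2005 to the list, but stop at 2008 for 2005 because
* 2009 is itself measured.
foreach base in 1990 1995 {
    forvalues k = 1/4 {
        local y = `base' + `k'
        generate double v`y' = v`base'
    }
}

list country v1990 v1991 v1994 v1995 v1996 v1999 v2000
```

Copying as -double- matters here: a plain -generate- would store a float copy, and tests such as `v1991 == v1990` could then fail on precision alone.
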
**Follow-Ups**: **Re: st: RE: problems with encode**, *From:* "Lukas Bösch" <L.Boesch@gmx.de>

**References**: **st: problems with encode**, *From:* "Lukas Bösch" <L.Boesch@gmx.de>
