Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Anna Reimondos <areimondos@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: String variable behaving badly |
Date | Thu, 11 Oct 2012 21:29:48 +1100 |
Dear Statalist,

I am currently cleaning a survey dataset with a variety of numeric as well as string variables. I recently discovered some very odd behaviour with one of the string variables. An extract of the data containing two variables (an ID variable and the problematic string variable) is available here:

http://wikisend.com/download/508418/stringdata.dta

In the dataset are the 23 responses from people who answered a question about who they believe is the most influential sports person in Australia. All 23 of these people answered the same thing, 'Evonne Goolagong Cawley' (a famous sports lady).

The problem is that when I do a simple tab of the variable, there are two entries for Evonne Goolagong Cawley instead of just one. I don't understand what is happening. In the dataset you can see that the first 2 respondents are somehow being identified as having a different answer to the rest of the people, even though the spelling is exactly the same.

I have tried trimming the data, triple-checking the spelling and so on, but can't get to the bottom of this and it is driving me up the wall. Just for reference, this 'issue' is affecting other entries as well, where what I think looks like exactly the same response is not recognised as such.

Any help would be much appreciated. I am using Stata 12.1.

Thanks!
Anna

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/