Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Anna Reimondos <areimondos@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: String variable behaving oddly |
Date | Thu, 11 Oct 2012 21:33:37 +1100 |
Dear Statalist,

I am currently cleaning a survey dataset that contains both numeric and string variables. I recently discovered some very odd behaviour in one of the string variables, and I have to deal with it before I can finish my work.

In the example below there are 23 responses from people who answered a question about who they believe is the most influential sportsperson in Australia. All 23 people gave the same answer, 'Evonne Goolagong Cawley' (a famous sportswoman). The problem is that when I do a simple tab of the variable, there are two entries for Evonne Goolagong Cawley instead of just one. I don't understand what is happening.

. tab var1

   [F4a] Most influential sportspeople: |
                             1st choice |      Freq.     Percent        Cum.
----------------------------------------+-----------------------------------
                Evonne Goolagong Cawley |          2        8.70        8.70
                Evonne Goolagong Cawley |         21       91.30      100.00
----------------------------------------+-----------------------------------
                                  Total |         23      100.00

Two respondents are somehow being identified as having given a different answer from everyone else, even though the spelling is exactly the same. I have tried trimming the data, triple-checking the spelling and so on, but I can't get to the bottom of this and it is driving me up the wall.

Just for reference, this issue is affecting other entries as well: responses that look exactly the same to me are not recognised as such.

Any help would be much appreciated. I have a copy of the dataset (just an extract) if anyone is interested.