
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: String variable behaving badly

From   Anna Reimondos <>
Subject   st: String variable behaving badly
Date   Thu, 11 Oct 2012 21:29:48 +1100

Dear Statalist

I am currently cleaning a survey dataset with a variety of numeric as
well as string variables. I recently discovered some very odd
behaviour with one of the string variables.

An extract of the data containing two variables (an ID variable and
the problematic string variable) is available here:

The dataset contains the 23 responses from people who answered a
question about who they believe is the most influential sportsperson
in Australia. All 23 gave the same answer, 'Evonne Goolagong Cawley'
(a famous Australian sportswoman).

The problem is that when I run a simple tab of the variable, there are
two entries for Evonne Goolagong Cawley instead of one. I don't
understand what is happening. In the dataset you can see that the
first 2 respondents are somehow treated as having a different answer
from everyone else, even though the spelling is exactly the same. I
have tried trimming the strings, triple-checking the spelling and so
on, but I can't get to the bottom of this and it is driving me up the
wall.
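One common cause of this symptom, offered here only as an assumption since the extract itself has not been inspected, is an invisible character: two values print identically but differ in, say, a non-breaking space (code point 160) where an ordinary space (code point 32) is expected. The minimal sketch below is in Python rather than Stata and uses hypothetical example strings; it finds the first position where two strings differ, reports the code point and Unicode name of the characters there, and shows one way of normalizing the whitespace.

```python
import unicodedata


def describe(s):
    """Return (character, code point, Unicode name) for each character in s."""
    return [(ch, ord(ch), unicodedata.name(ch, "UNKNOWN")) for ch in s]


def first_difference(a, b):
    """Return the index of the first differing character, or None if a == b."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One string is a prefix of the other; they differ where the shorter ends.
        return min(len(a), len(b))
    return None


def normalize(s):
    """Replace non-breaking spaces, then collapse whitespace runs and strip the ends."""
    return " ".join(s.replace("\u00a0", " ").split())


# Hypothetical values: the second uses a non-breaking space after "Evonne".
plain = "Evonne Goolagong Cawley"
nbsp = "Evonne\u00a0Goolagong Cawley"

print(plain == nbsp)  # the two look the same when printed but compare unequal
i = first_difference(plain, nbsp)
print(i, describe(plain[i]), describe(nbsp[i]))
print(normalize(nbsp) == plain)
```

Comparing code points rather than printed text is the key step: any tabulation that groups on exact string equality will split such values into separate categories until the hidden character is replaced.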

Just for reference, this 'issue' affects other entries as well, where
responses that look exactly the same to me are not recognised as
identical.
Any help would be much appreciated.

I am using Stata 12.1


