



Re: st: Re: Problem with variable names using Insheet


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Re: Problem with variable names using Insheet
Date   Tue, 14 May 2013 01:46:29 -0400

Joseph Deckert reported difficulties importing data into Stata from a
CSV file. The reported symptoms included stray character sequences
such as +ACI- around the imported content (variable names). A number
of recommendations were given on how to batch-rename the variables in
the imported file.

The cause of the problem, however, was not established. Joseph will
find in the following table:
http://www.string-functions.com/encodingtable.aspx?encoding=65000&decoding=20127
that the characters he observes correspond to the Unicode
double-quote character (") in the UTF-7 encoding (code page 65000).
Other commonly encountered sequences are:
+AD4- >
+ADw- <
+AC0- -
+ACo- *
+AD0- =
+ADs- ;
+AF4- ^
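
Sequences of this kind can be checked without a lookup table. The
following is a minimal sketch, assuming Python 3 is at hand (it is not
something used in this thread); it relies only on Python's built-in
"utf-7" codec, and the names `sequences` and `decoded` are purely
illustrative.

```python
# Decode a few UTF-7 escape sequences with Python's built-in codec.
# The list mirrors the sequences discussed in this message.
sequences = ["+ACI-", "+AD4-", "+ADw-", "+AC0-", "+ACo-", "+AD0-", "+ADs-", "+AF4-"]

# Each sequence is plain ASCII on disk; decoding it as UTF-7 yields
# the single character it stands for.
decoded = {seq: seq.encode("ascii").decode("utf-7") for seq in sequences}

for seq, char in decoded.items():
    print(seq, char)
```

Note that the trailing hyphen is part of each sequence: it closes the
base64 run that the plus sign opens.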

Furthermore, some versions of OpenOffice were reported to have a
problem related to embedding these characters:
  http://forum.openoffice.org/en/forum/viewtopic.php?f=9&t=20619
  http://superuser.com/questions/219373/open-office-ruined-my-csv-file-how-to-save-csv-on-linux-with-open-office
However, it is not immediately clear whether the problem lies in the
software or in the user's settings and data transformations.

Renaming the variables may not eliminate the problem, since the
content of the cells is also subject to the same encoding problem.
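
One way to address both the names and the cell contents at once would
be to re-encode the whole file before importing it. The sketch below
is only an illustration under the assumption that the entire file
really is UTF-7; a plain ASCII file containing a literal plus sign
would be misread. The function name `utf7_to_utf8` and any file names
passed to it are hypothetical.

```python
from pathlib import Path


def utf7_to_utf8(src, dst):
    """Re-encode a UTF-7 text file as UTF-8 and return the decoded text.

    Assumes the whole file at `src` is valid UTF-7; `src` and `dst`
    are hypothetical paths chosen by the caller.
    """
    # Read raw bytes so that line endings pass through unchanged.
    text = Path(src).read_bytes().decode("utf-7")
    # Characters in the ASCII range come out identical in UTF-8 and ASCII.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

For example, utf7_to_utf8("data_utf7.csv", "data_clean.csv") would
write a cleaned copy that could then be imported as usual. Characters
outside the ASCII range may still need attention in versions of Stata
that do not handle Unicode.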

It has not yet been established whether Microsoft Excel can produce
such a problematic file. Excel 2010 does not give me the freedom to
specify the code page of the CSV file (at least with the default file
filters).

I would be curious to know which particular software and version
produced a file so corrupt that Stata couldn't import it, and which
options (if any) were specified. My rather weak suspicion is that the
file was downloaded with a browser while an incorrect character
encoding was selected. See related explanations here:

http://wordpress.stackexchange.com/questions/77108/if-a-hacker-changed-the-blog-charset-to-utf-7-does-that-make-wordpress-vulnerabl

If Joseph has access to the original file that didn't have a problem
and from which the CSV file was obtained, he may want to import it into
Stata using some other format (e.g., XLS in Stata 12.0 or newer).

Best, Sergiy Radyakin

On Mon, May 13, 2013 at 9:34 PM, Joseph Coveney <stajc2@gmail.com> wrote:
> Nick Cox wrote:
>
>  . . . For the record, -renpfix- is, and always was, an official command.
>
> --------------------------------------------------------------------------------
>
> Thanks for setting me straight on that, Nick.  I stand corrected.
>
> It's curious that StataCorp provided only one of a pair of bookends, though.
>
> Joseph Coveney
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

