
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: insheet and dropping cases


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: insheet and dropping cases
Date   Thu, 20 Feb 2014 13:27:36 -0500

Ben,

-- the problem is likely caused by unprintable characters in the
file that Stat/Transfer tolerates but Stata does not;

-- the character with code 255 (hex FF) is a usual suspect;

-- pasting raw data to Statalist is unlikely to reveal the problem,
since the special characters might not survive passage through email;

-- isolating the problem in a text editor into a new file could help
(keep the last record read in correctly and the one immediately after
it); then make the file available through a link, to retain its binary
structure. Note that not all text editors retain special characters on
save;

-- use -hexdump "file", analyze tabulate- to see unprintable
characters, then search for them in the file or use -filefilter-;

-- see "zap gremlins" for a relevant tactic.
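
For anyone who wants to locate the offending bytes outside Stata, the search described above can be sketched in Python. This is only an illustration and not part of the original advice: the function names and the sample data are hypothetical, and the set of "expected" bytes (tab, line feed, carriage return, printable ASCII) is an assumption that will flag legitimate accented characters in non-ASCII files.

```python
from collections import Counter

# Assumption: a clean delimited text file contains only tab, LF, CR,
# and printable ASCII (codes 32-126). Anything else is reported.
EXPECTED = frozenset({9, 10, 13}) | frozenset(range(32, 127))


def find_suspect_bytes(data: bytes):
    """Return (line_number, column, byte_value) for each unexpected byte.

    Lines are split on LF only, so a file with bare-CR line endings
    is treated as a single line.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col, value in enumerate(line, start=1):
            if value not in EXPECTED:
                hits.append((line_no, col, value))
    return hits


def tabulate_suspects(data: bytes):
    """Count how often each unexpected byte value occurs."""
    return Counter(value for _, _, value in find_suspect_bytes(data))


if __name__ == "__main__":
    # Hypothetical sample: the record on the third line contains byte 255.
    sample = b"id,name\n1,alpha\n2,be\xffta\n3,gamma\n"
    for line_no, col, value in find_suspect_bytes(sample):
        print(f"line {line_no}, column {col}: byte {value} (0x{value:02x})")
```

To run it on a real file, read the file in binary mode (for example `open(path, "rb").read()`) so that no decoding step alters the bytes. For very large files, reading in chunks would be needed instead of loading everything at once. The first reported line number shows where to cut out the two-record test file.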

On the bright side: you are lucky you have 363 cases. Last time I had
this problem, only 16 GB out of 40 GB were read in. Try opening that
file in Notepad :)

Hope this helps.

Best, Sergiy Radyakin


On Thu, Feb 20, 2014 at 12:34 PM, Radwin, David <dradwin@rti.org> wrote:
> One other possibility is to use -inputst-, a Stata program that calls Stat/Transfer (part of -stcmd- by Roger Newson and available at SSC).
>
> This workaround is probably less computationally efficient than the suggestions from others, but since you already know that Stat/Transfer works, this approach might be faster and easier than trying to figure out the problem with your text files and -insheet- or -import delimited-.
>
> David
> --
> David Radwin, Senior Research Associate
> Education and Workforce Development
> RTI International
> 2150 Shattuck Ave. Suite 800, Berkeley, CA 94704
> Phone: 510-665-8274
>
> www.rti.org/education
>
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
>> statalist@hsphsun2.harvard.edu] On Behalf Of Phil Schumm
>> Sent: Thursday, February 20, 2014 6:38 AM
>> To: Statalist Statalist
>> Subject: Re: st: insheet and dropping cases
>>
>> On Feb 20, 2014, at 8:28 AM, Ben Hoen <bhoen@lbl.gov> wrote:
>> > Hexdump I had never used.  This is what it returned:
>>
>> <snip>
>>
>> > Do you see anything suspicious here?  (I replaced all the commas with
>> "_", using filefilter - another great suggestion -  wondering if that was
>> causing any issues and insheet still returned 184 observations.)
>>
>>
>> I don't see anything obvious -- you'll need to look at the file directly.
>> Is Stata reading the first 184 observations, or are the 184 observations
>> from different places in the file?  Check that first, and if you are
>> getting the first 184 observations, then look at lines 184-6 (depending on
>> whether the file has a header line).  Something has to be going on there.
>>
>>
>> -- Phil
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


