Statalist



RE: st: Importing data with infile: Identifying records with problems


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: Importing data with infile: Identifying records with problems
Date   Thu, 3 Sep 2009 11:39:12 +0100

Sergiy's comments overlook several much more positive possibilities,
given the data management and programming powers of Stata. 

Here's one more: 

Once you know that certain numeric variables are problematic, consider
importing them as string and looking for problems within Stata. For
example, missing(real(strvar)) flags values that can't be read as
numbers. -destring- has extra options that might help too. 
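
A minimal sketch of that idea follows, not a tested recipe for these particular files. The file name scan01.txt, the free-format layout, and the variable names tfake_s, bad_tfake and recno are assumptions for illustration only.

```stata
* Hypothetical file name and layout: free-format text holding serial, tfake, corl
infile str6 serial str10 tfake_s str10 corl_s using "scan01.txt", clear

* Keep the original record number so that messages like tfake[153] can be traced
gen long recno = _n

* Flag nonempty strings that cannot be read as numbers (e.g. "*")
gen byte bad_tfake = missing(real(tfake_s)) & trim(tfake_s) != ""
list recno serial tfake_s if bad_tfake

* Once the problems are understood, convert; force sets unreadable values to missing
destring tfake_s, generate(tfake) force
```

Listing serial next to the offending string gives the participant's identifier directly, rather than a record number that must be looked up afterwards.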

Nick 
[email protected] 

John LeBlanc

I just did a workaround in case it's useful to others.

1. Sort the records before importing to Stata (assuming relevant id field
   is in left-most columns).
2. Import.
3. Save list of error messages.
4. Incorporate original record number into database: e.g.,
***
gen recno = _n
move recno serial
***

Export or print the list of recno values and their corresponding serial
or ID numbers. One then at least has an easy way to pull the original
case report forms and fix faulty inputs.
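
For dozens of files, the same steps might be looped, storing the file name next to the record number. This is only a sketch under assumptions: the scan*.txt file pattern, the free-format layout, and the names source and recno are hypothetical.

```stata
* Hypothetical: text files scan01.txt, scan02.txt, ... in the current directory
local files : dir "." files "scan*.txt"
tempfile all
local first = 1
foreach f of local files {
    * Values that cannot be read as numbers are stored as missing
    infile str6 serial tfake corl using "`f'", clear
    gen long recno = _n          // record number within this file
    gen source = "`f'"           // which file the record came from
    if !`first' append using "`all'"
    save "`all'", replace
    local first = 0
}
order source recno serial
list source recno serial if missing(tfake) | missing(corl)
```

Because recno restarts at 1 in each file, the pair (source, recno) is what matches the record numbers in the error messages.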

Sergiy Radyakin wrote:

> I am afraid it is not possible. And we are quite lucky that in this case
> Stata reports in which record there was a problem. When it comes to errors
> in .ado programs, it only tells you the error code, but not the line number.
>
> At least basic error reporting in relation to the source code is highly
> desirable and would make debugging .ado files a much more pleasant task,
> ultimately improving the quality of the code. See how it was done in
> TurboPascal in the 1980s:
>
> http://ugweb.cs.ualberta.ca/~roman/tutorials/History/F2000WinNT4/Pascal/editing.html


John LeBlanc
   
>> Is there any way that Infile can report a specific part of a record when
>> there is an error as opposed to a record number? I am importing dozens of
>> scanned files and getting hundreds of lines like the following:
>>
>> '*' cannot be read as a number for tfake[153]
>> '*' cannot be read as a number for corl[157]
>>
>> It would be much easier for me to deal with these if instead of the record
>> number, e.g., 153, I could somehow instruct stata to report the first six
>> characters of the record with the problem since those characters represent
>> the research participant's serial number. Also, the record numbers start at
>> one for every text file that is imported and it therefore becomes quite a
>> headache to manage both a relative record number and file identifier.
