Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | James Sams <sams.james@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: importing quirky csv |
Date | Fri, 25 Nov 2011 11:03:31 -0600 |
On Thursday 24, November 2011 08:51:16 you wrote:
> I have a large number of large comma-separated text files that I am
> trying to import. "insheet" is not working; it imports the data, but
> many lines are missing. I think the reason is the file contains string
> fields that a) have embedded spaces, and b) are not enclosed in
> quotes.

I've run across this and found that, unless you want to write your own csv parser (which is trickier than you might think), you will have to work outside of Stata. That said, it is easy to automate from a do file. I've found Python's csv parser to be quite robust and able to write out the csv files in such a way that Stata will happily read them.

The approach I took was to just parse the entire directory of csv's and then import those into Stata. However, let's say you wanted to make a script that you call for each file from within Stata; then the Python code should look like this (assuming Python 2.7 and actual commas as the separators; note that whitespace is very important in Python):

    #!/usr/bin/env python
    # reprocess_csv.py
    # make files readable for stata
    import sys
    import csv

    DELIMITER = ","

    def reprocess(in_fn, out_fn):
        with open(in_fn, 'rb') as in_fd:
            with open(out_fn, 'wb') as out_fd:
                reader = csv.reader(in_fd, delimiter=DELIMITER)
                writer = csv.writer(out_fd, delimiter=DELIMITER)
                writer.writerows(reader)

    if __name__ == "__main__":
        reprocess(sys.argv[1], sys.argv[2])

and then in Stata:

    local my_original_file "bad.csv"
    tempfile good
    ! python reprocess_csv.py `my_original_file' `good'
    insheet using `good', comma

I did write this on the fly, so there may be typos that I didn't catch, but it is based on code I've used previously that works reliably.

--
James Sams
sams.james@gmail.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
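
For reference, the directory-wide variant mentioned in the message (parse every csv first, then import) might look roughly like the sketch below. It is not the original poster's code: it assumes Python 3 rather than 2.7 (so files are opened in text mode with newline="" instead of 'rb'/'wb'), and the script name reprocess_dir.py, the function reprocess_dir, and the two directory arguments are hypothetical names chosen for illustration.

```python
#!/usr/bin/env python3
# reprocess_dir.py (hypothetical name)
# Sketch: rewrite every .csv in one directory into another directory
# using Python's csv module, so the copies can be imported afterwards.
import csv
import sys
from pathlib import Path

DELIMITER = ","


def reprocess(in_fn, out_fn):
    # Python 3: open in text mode with newline="" so the csv module
    # handles line endings itself. The default text encoding is used,
    # which is an assumption about the input files.
    with open(in_fn, newline="") as in_fd, \
            open(out_fn, "w", newline="") as out_fd:
        reader = csv.reader(in_fd, delimiter=DELIMITER)
        writer = csv.writer(out_fd, delimiter=DELIMITER)
        writer.writerows(reader)


def reprocess_dir(in_dir, out_dir):
    # Create the output directory if needed, then rewrite each .csv
    # file found directly inside in_dir (no recursion).
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(Path(in_dir).glob("*.csv")):
        dst = out_path / src.name
        reprocess(src, dst)
        written.append(dst)
    return written


if __name__ == "__main__":
    # Expected call: python3 reprocess_dir.py <input_dir> <output_dir>
    if len(sys.argv) == 3:
        reprocess_dir(sys.argv[1], sys.argv[2])
```

Under these assumptions, a do file would shell out once (for example with "! python3 reprocess_dir.py raw clean", where raw and clean are placeholder directory names) and then loop insheet over the rewritten files, instead of calling the script once per file.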