Statalist


Re: st: infile and dictionaries and the small data mindset


From   "E. Paul Wileyto" <epw@mail.med.upenn.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: infile and dictionaries and the small data mindset
Date   Wed, 14 May 2008 11:35:16 -0400

Thanks. Both are somewhat useful.
I think that what I actually need to do is import each line of text as a single string, and then parse it line-by-line according to whatever rules I can glean from the original data files. Is that possible?

If it is, I can parse by a set of rules that change according to which block headers I have hit up to that point. Sounds like a pain, but if I can code it once, then I can hand it to someone else to do the import.
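To make the idea concrete, here is a minimal sketch of that kind of state-dependent parsing. It is written in Python only to show the logic and is not a tested Stata solution; in Stata the same loop could presumably be built around -file read-, which reads a text file one line at a time into a macro. The file layout is an assumption, since the real files are not shown: the `Subject:` prefix marking a new mouse-run, the function names, and the sample lines are all hypothetical. The sketch also simplifies by treating every other non-numeric line as a block header.

```python
def is_number(token):
    """Return True if the token can be read as a number."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def parse_lines(lines, run_prefix="Subject:"):
    """Turn raw text lines into records, remembering the headers seen so far.

    run_prefix is a hypothetical marker for the start of a mouse-run;
    the real files may use something different.
    """
    records = []
    run_id = None   # ancillary info for the current mouse-run
    block = None    # header of the current block of numbers
    for raw in lines:
        text = raw.strip()
        tokens = text.split()
        if not tokens:
            continue                      # skip blank lines
        if text.startswith(run_prefix):
            run_id = text[len(run_prefix):].strip()
            block = None                  # a new run resets the block state
        elif all(is_number(t) for t in tokens):
            # data row: first token is the row number, the rest are values
            records.append({
                "run": run_id,
                "block": block,
                "row": int(float(tokens[0])),
                "values": [float(t) for t in tokens[1:]],
            })
        else:
            block = text                  # any other text line is a block header
    return records


# Hypothetical sample: one mouse-run with two blocks of numbers.
sample = [
    "Subject: M01",
    "Session A",
    "1 0.5 2.0",
    "2 0.7 2.5",
    "Session B",
    "1 3.0",
]
for rec in parse_lines(sample):
    print(rec)
```

Each data row comes out tagged with the run and block it belongs to, so blocks with different numbers of columns can be separated and reshaped afterwards.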

P

Friedrich Huebler wrote:

Paul,

Perhaps you can do this with -insheet-, as described in this thread:

http://www.stata.com/statalist/archive/2008-02/msg00875.html
http://www.stata.com/statalist/archive/2008-02/msg00940.html

Friedrich

On Wed, May 14, 2008 at 9:23 AM, E. Paul Wileyto <epw@mail.med.upenn.edu> wrote:

One of our worst fears is that someone will come to us with data scattered
all over a spreadsheet file in little summary tables. If they have lots of
those files, I can usually find a way to script the import efficiently using
ODBC.

What if you have those same tables in a text file? Is there any efficient
way to import and parse data in such a format? I have the far end of this
process scripted so the researcher can generate his own summary statistics,
but getting the data into Stata involves a program making an Excel file,
followed by cutting and pasting into Stata. I'd like to cut out some of the
import steps, so that all we would need to do is give a list of filenames to
a Stata script, and watch the screen roll by as the data get extracted.

The files are generated by a program that is monitoring mouse behavior.
Each file may contain behavior from one mouse on one day, or several mice
on one day. The general format is always the same. For each mouse-run,
there is a small block of ancillary information as a header. I cannot
guarantee that all of these blocks have the same number of words, but some
of that info will be needed as data. These are followed by blocks of
numbers in columns. Each block has an alphanumeric header before it (on its
own line), and there are row numbers.

I would have a fairly good idea how to script this in Matlab, but I don't
want to be the one doing the import on a daily basis, and it's hard for the
researcher to justify buying into some pricey software just to script that
one task.

Any clues about scripting this type of import in Stata would be appreciated.

Thanks

Paul
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

--
E. Paul Wileyto, Ph.D.
Assistant Professor of Biostatistics
Tobacco Use Research Center
School of Medicine, U. of Pennsylvania
3535 Market Street, Suite 4100
Philadelphia, PA 19104-3309

215-746-7147
Fax: 215-746-7140
epw@mail.med.upenn.edu
http://mail.med.upenn.edu/~epw/