Statalist



Re: st: infile and dictionaries and the small data mindset


From   "Friedrich Huebler" <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: infile and dictionaries and the small data mindset
Date   Wed, 14 May 2008 13:56:20 -0400

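You can -infix- the data, reading each line of the file into a single
string variable. Keep in mind that strings are limited to a length of
244 characters; longer lines have to be split across several variables,
as in the second command below: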

. infix str data 1-244 using data.txt
. infix str var1 1-244 str var2 245-488 using data.txt
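Once every line sits in one string variable, the block-header rules
Paul describes can be applied with ordinary -generate- and -replace-
commands. The following is only a sketch. It writes a small hypothetical
sample file, and it assumes that a block header is a line starting with
a letter and that a data row starts with its row number. The real
mouse-monitor files may need different rules.

```stata
* Sketch only: the sample file and the two rules below are assumptions,
* not the actual layout of the mouse-monitor files.
*   - a block header is a line whose first non-blank character is a letter
*   - a data row starts with its row number
clear
tempfile raw
tempname fh
file open `fh' using "`raw'", write text replace
file write `fh' "Licks" _n
file write `fh' "1 12 0.50" _n
file write `fh' "2 15 0.75" _n
file write `fh' "Runs" _n
file write `fh' "1 3 1.25" _n
file close `fh'

* One observation per physical line, with the whole line in one string
infix str line 1-244 using "`raw'"

* Flag header lines and carry the most recent header down to later lines
gen byte is_header = regexm(trim(line), "^[A-Za-z]")
gen block = trim(line) if is_header
replace block = block[_n-1] if !is_header & _n > 1

* Keep only the numeric rows and split them into columns
keep if regexm(trim(line), "^[0-9]")
replace line = itrim(trim(line))
split line, generate(v) destring
rename v1 row
list block row v2 v3
```

Carrying the header down with -replace block = block[_n-1]- works
because -replace- processes observations in order, so each line picks
up the value just assigned to the line above it.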

Friedrich


On Wed, May 14, 2008 at 11:35 AM, E. Paul Wileyto
<epw@mail.med.upenn.edu> wrote:
> Thanks.  Both are somewhat useful.
> I think that what I actually need to do is import each line of text as a
> single string, and then parse it line-by-line according to whatever rules I
> can glean from the original data files.  Is that possible?
>
> If it is, I can parse it with a set of rules that change according to which
> block header I have reached at that point.  It sounds like a pain, but if I
> can code it once, then I can hand it to someone else to do the import.
>
> P
>
> Friedrich Huebler wrote:
>>
>> Paul,
>>
>> Perhaps you can do this with -insheet-, as described in this thread:
>>
>> http://www.stata.com/statalist/archive/2008-02/msg00875.html
>> http://www.stata.com/statalist/archive/2008-02/msg00940.html
>>
>> Friedrich
>>
>> On Wed, May 14, 2008 at 9:23 AM, E. Paul Wileyto <epw@mail.med.upenn.edu>
>> wrote:
>>
>>>
>>> One of our worst fears is that someone will come to us with data
>>> scattered all over a spreadsheet file in little summary tables.  If
>>> they have lots of those files, I can usually find a way to script the
>>> import efficiently using ODBC.
>>>
>>> What if you have those same tables in a text file?  Is there any
>>> efficient way to import and parse data in such a format?  I have the
>>> far end of this process scripted so the researcher can generate his
>>> own summary statistics, but getting the data into Stata involves a
>>> program making an Excel file, followed by cutting and pasting into
>>> Stata.  I'd like to cut out some of the import steps, so that all we
>>> would need to do is give a list of filenames to a Stata script, and
>>> watch the screen roll by as the data get extracted.
>>>
>>> The files are generated by a program that is monitoring mouse
>>> behavior.  Each file may contain behavior from one mouse on one day,
>>> or several mice on one day.  The general format is always the same.
>>> For each mouse-run, there is a small block of ancillary information
>>> as a header.  I cannot guarantee that all of these blocks have the
>>> same number of words, but some of that info will be needed as data.
>>> These are followed by blocks of numbers in columns.  Each block has
>>> an alphanumeric header before it (on its own line), and there are
>>> row numbers.
>>>
>>> I would have a fairly good idea how to script this in Matlab, but I
>>> don't want to be the one doing the import on a daily basis, and it's
>>> hard for the researcher to justify buying into some pricey software
>>> just to script that one task.
>>>
>>> Any clues about scripting this type of import in Stata would be
>>> appreciated.
>>>
>>> Thanks
>>>
>>> Paul
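Paul's goal of handing Stata a list of files and letting it run can be
sketched with -foreach-, -infix-, and -append-. This is a hypothetical
illustration: the two tiny files it writes stand in for the real ones,
and the variable names source and seq are illustrative choices, not
anything from the original files.

```stata
* Sketch: read several text files line by line and stack them in one
* dataset. The two small files written here are hypothetical stand-ins.
clear
tempfile f1 f2 stacked
tempname fh
foreach name in f1 f2 {
    file open `fh' using "``name''", write text replace
    file write `fh' "Licks" _n
    file write `fh' "1 12 0.50" _n
    file close `fh'
}

local first 1
foreach name in f1 f2 {
    * One observation per line of the current file
    infix str line 1-244 using "``name''", clear
    gen str20 source = "`name'"   // which file the line came from
    gen long seq = _n             // original line order within the file
    if !`first' {
        append using "`stacked'"
    }
    save "`stacked'", replace
    local first 0
}
sort source seq
list, sepby(source)
```

With real files, the loop would run over the file names themselves,
for example foreach f in mouse1.txt mouse2.txt (hypothetical names),
using "`f'" in place of "``name''". The block-parsing steps can then be
run once on the stacked data, grouped by source.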


