Statalist



Re: st:How to input a portion of a file


From   "Sergiy Radyakin" <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st:How to input a portion of a file
Date   Mon, 18 Feb 2008 12:48:38 -0500

On Feb 18, 2008 11:52 AM, Joseph Wagner <joseph.wagner@wright.edu> wrote:
> I can get the file into excel and the columns line up perfectly.  If I
> open the file in Crimson editor the columns appear to be tab-delimited
> after all (apparently why I was able to use -insheet-).   That said, is
> there a user-written program that I have missed that will perform
> -insheet- like action but with options limiting the data?

Hopefully not. It would be like having a do-file to add two numbers. :)
There is no need for such a command, since it can easily be programmed.
All you have to do is:

1) open your data file,
2) read the first n lines you want to skip and discard them
3) open a temporary file for writing
4) read the next 1+100 lines (the line of variable names plus the 100
data lines) from your input file and write them to the temp file
5) close all files
6) insheet the temp file
7) done
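
The steps above can be sketched roughly as follows. This is only a sketch under assumptions: the file name is the one from your dictionary, the counts (28 lines skipped, 1+100 lines kept) are taken from your description, and the macro names (src, skip, keep, slice) are made up for illustration. I have not run it against your files.

```stata
* Sketch only: file name and line counts are assumptions from this thread
local src  "c:\data\gait\SBS00001_20050607_1.nrm"
local skip 28                        // descriptive lines to discard
local keep 101                       // 1 line of names + 100 data lines

tempname in out
tempfile slice

file open `in'  using "`src'", read text
file open `out' using "`slice'", write text replace

* Steps 1-2: read the leading lines and throw them away
forvalues i = 1/`skip' {
    file read `in' line
}

* Steps 3-4: copy the wanted lines to the temporary file
forvalues i = 1/`keep' {
    file read `in' line
    if r(eof) continue, break        // stop early if the file is short
    file write `out' `"`macval(line)'"' _n
}

* Step 5: close all files
file close `in'
file close `out'

* Step 6: the temporary file is tab-delimited, so insheet can read it
insheet using "`slice'", tab clear
```

After that you can -keep- just the four columns you need. To process all 300 files, the same lines could go inside a loop over the file names.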

Stata has all the necessary commands for this. If your code becomes
longer than 15 lines, you are probably on the wrong track.

This can of course be implemented as an ado command, but it would be
more trouble to write the help and explain the purpose, parameters,
results, etc. than to write the code itself. Such small commands mostly
add noise, in which other, more useful stuff gets lost.

Regards, Sergiy.


>
> Sergiy Radyakin wrote:
> > On Feb 18, 2008 9:26 AM, Joseph Wagner <joseph.wagner@wright.edu> wrote:
> >
> >> I have data I wish to input a portion of into Stata.  Data is collected
> >> on patients by a machine that measures their gait as they walk.  A text
> >> file is output for each patient with columns representing variables
> >> (each about 130 lines long) but the multiple observation data doesn't
> >> start until line 29.  The first 28 lines are taken up with short lines
> >> of data describing the patient.  Unfortunately, I also need a couple of
> >> those lines in the 'header' area.  The 29th line has the variable names but
> >> they do not line up directly with the columns of data so I figured I
> >> could just label the data later.  The data I need starts 30 lines down
> >> at column 115 and includes the next 4 columns and goes down 100 lines.
> >>
> >> I realize there are easier ways to do this but I have data on about 300
> >> patients (and so one file for each person) and wanted to automate this
> >> input (followed by successive merging of files to get my final dataset).
> >>
> >> I wanted to use the -infix- command but have never used this command
> >> before and my attempts so far have failed.  I also tried using -infile-
> >> with the _first(30) option and the _line(30) option but those didn't
> >> seem to work either.
> >>
> >> Here is a dictionary I attempted with just one of the variables:
> >>
> >> dictionary using "c:\data\gait\SBS00001_20050607_1.nrm" {
> >>        _line(30)
> >>        _column(115) r_grf_vrt_frc %5f
> >> }
> >>
> >> infile using SBS00001_20050607_1.dct
> >>
> >> unexpected end of file
> >> (5 observations read)
> >>
> >> The other problem is that it didn't seem to pull the data corresponding
> >> to that column.  I thought perhaps there was a problem with the data not
> >> being in a fixed format but if I try -insheet- all the data imports and
> >> the correct data lines up in the individual columns.  Of course I could
> >> write some code whereby I delete the unneeded variables and lines
> >> but that's kind of sloppy.
> >>
> >
> > I am not sure, but this may be an indication that the data is indeed
> > misaligned! Insheet looks for separators, such as commas or tabs, while
> > infile reads data at fixed offsets. The fact that the data was read
> > correctly using insheet does not mean that it is aligned properly. I
> > suggest you manually delete all the beginning lines that you don't
> > need, and then try to import your data into Excel. There you will also
> > have the options fixed/separated; try fixed and see whether it imports
> > ok. Also check whether you have tabs/commas/spaces between the values.
> > It may be the case that you actually need to use insheet.
> >
> > You've mentioned that those few values that were read in are also
> > incorrect? Were they located in the original file to the right of the
> > values you wanted to read in? If so, this may also be an indication of
> > a delimited file format.
> >
> > Try to get the first variable right, then the second, and see if there
> > are any problems there. Check what is so special about line 5 (or 35?)
> > in the file you are reading in.
> >
> > Finally, make sure that you are not trying to read in the data from the
> > dictionary itself (you have about 5 lines in your dictionary file??) but
> > rather are using a dictionary file to read in data from the datafile.
> >
> > Regards, Sergiy
> >
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


