
From: Joseph Wagner <joseph.wagner@wright.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st:How to input a portion of a file
Date: Thu, 21 Feb 2008 11:23:21 -0500

As Friedrich pointed out, Stata does import the columnar data quite well and deleting the first lines of the 'header' region will work - to a point. I do have a single line of data that I need from the header area. I would be happy to send the ascii text file as well as the excel spreadsheet that has the needed variables highlighted if anyone is interested in getting a better look.
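One way to keep a header value while still using the import-and-drop approach quoted further down is to copy it into a local macro before the header rows are deleted. This is only a sketch: the row number (12), the source variable (v2), and the new variable name patient_info are hypothetical placeholders, because the thread does not say where in the header the needed line sits. It also assumes the file is delimited in a way -insheet- can read and has at least 129 lines.

```stata
* Sketch only: row 12, v2 and patient_info are hypothetical placeholders.
insheet using "c:/data/gait/SBS00001_20050607_1.nrm", nonames clear

* Copy the header cell into a local macro before the header rows are dropped.
local hdr = v2[12]

* Keep the data block (lines 30 to 129, as in Friedrich's example).
keep in 30/129

* Store the saved header value as a constant string variable.
generate str80 patient_info = "`hdr'"
```

Because the macro is filled before any rows are dropped, the header value survives the deletion and ends up repeated on every remaining observation.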

Sergiy Radyakin wrote:

Stata does not "guess". Stata determines the variable types using a
"scientific method of looking". Stata looks not "at the early bit of
the file" - it looks at the whole file. This is called "the first
pass". After the variable types are determined - the file can be read
in - that is called "the second pass". Users of StatTransfer will be
familiar with this technique - StatTransfer will do two passes over
your data and is very explicit at showing its progress.

It remains unclear, however, what the third pass in Stata's
-insheet- procedure is for. It could be a simple inefficiency of code,
or it could be something else, which I don't see at the moment, which
necessitates the third pass (and this is more probable, since even the
most recent version does so).

The fact is, however, that Stata will fully read the file 3 times when
importing from text format. If the file is already in dta format, one
pass is enough, and here Stata is very fast.

Best regards,

Sergiy Radyakin

On 2/21/08, Nick Cox <n.j.cox@durham.ac.uk> wrote:

That's an instructive example.

As I understand it, -insheet- peeks at the early bit of the file, makes

a guess at the number and type of variables, and assigns accordingly.

Whether guessing will also reliably give a workable answer with Joseph

Wagner's files, I can't say.

Nick

n.j.cox@durham.ac.uk

Friedrich Huebler

Assume we have a file "test.txt" that contains the following text

(without the Start and End lines). We are only interested in the

numbers.

=== Start of file ===
I am not clear how that this will help, as the header text and
the remainder of the file will give -insheet- quite different
ideas about what variables there are.
mpg trunk turn
22 11 40
17 11 40
22 12 35
20 16 40
=== End of file ===

Let's import the data with -insheet-.

. insheet using test.txt, nonames delimiter(" ")
(14 vars, 8 obs)

. drop if _n < 5
(4 observations deleted)

. drop v4 - v14

. list

     +--------------+
     | v1   v2   v3 |
     |--------------|
  1. | 22   11   40 |
  2. | 17   11   40 |
  3. | 22   12   35 |
  4. | 20   16   40 |
     +--------------+

Friedrich

On Wed, Feb 20, 2008 at 6:35 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:

I am not clear how that this will help, as the header text and the

remainder of the file will give -insheet- quite different ideas about

what variables there are.

Nick

n.j.cox@durham.ac.uk

Friedrich Huebler

You wrote that -insheet- with subsequent deletion of unwanted data is

"sloppy". That approach might still be the easiest if all files have

the same structure and your data always appear in the same columns.

. insheet using filename, nonames

. drop if _n < 30 | _n > 129

. drop v1 - v20 v25 - v30
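Since there are about 300 files with the same layout, the import-and-drop commands above could be wrapped in a loop that stacks the results. The following is a minimal sketch, not a tested solution: the folder, the "*.nrm" pattern, the variable name source, and the row and column ranges (taken from the example above) are assumptions. It also relies on the -dir- extended macro function, which may not be available in Stata 8.2.

```stata
* Sketch only: folder, file pattern, row range and column range are assumptions.
local folder "c:/data/gait"
local files : dir "`folder'" files "*.nrm"

tempfile combined
local n 0
foreach f of local files {
    * Read the whole file, then keep only the wanted rows and columns.
    insheet using "`folder'/`f'", nonames clear
    keep in 30/129
    keep v21-v24

    * Header text usually forces string columns; convert them back to numbers.
    destring v21-v24, replace

    * Record which file each observation came from.
    generate str40 source = "`f'"

    * Stack this file onto the files already processed.
    if `n' > 0 {
        append using "`combined'"
    }
    save "`combined'", replace
    local n = `n' + 1
}
```

Forward slashes are used in the paths because a backslash directly before a macro reference keeps Stata from expanding the macro. After the loop finishes, the combined data are in memory and also saved in the temporary file.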

On Feb 18, 2008 9:26 AM, Joseph Wagner <joseph.wagner@wright.edu> wrote:

> I have data I wish to input a portion of into STATA. Data is collected
> on patients by a machine that measures their gait as they walk. A text
> file is output for each patient with columns representing variables
> (each about 130 lines long) but the multiple observation data doesn't
> start until line 29. The first 28 lines are taken up with short lines
> of data describing the patient. Unfortunately, I also need a couple of
> those lines in 'header' area. The 29th line has the variables names but
> they do not line up directly with the columns of data so I figured I
> could just label the data later. The data I need starts 30 lines down
> at column 115 and includes the next 4 columns and goes down 100 lines.
>
> I realize there are easier ways to do this but I have data on about 300
> patients (and so one file for each person) and wanted to automate this
> input (followed by successive merging of files to get my final
> dataset).
>
> I wanted to use the -infix- command but have never used this command
> before and my attempts so far have failed. I also tried using -infile-
> with the _first(30) option and the _line(30) option but those didn't
> seem to work either.

>

> Here is a dictionary I attempted with just one of the variables:

>

> dictionary using "c:\data\gait\SBS00001_20050607_1.nrm" {

> _line(30)

> _column(115) r_grf_vrt_frc %5f

> }

>

> infile using SBS00001_20050607_1.dct

>

> unexpected end of file

> (5 observations read)

>

> The other problem is that it didn't seem to pull the data corresponding
> to that column. I thought perhaps there was a problem with the data not
> being in a fixed format but if I try -insheet- all the data imports and
> the correct data lines up in the individual columns. Of course I could
> write some programming whereby I delete the unneeded variables and
> lines but that's kind of sloppy.

>

>

>

> I am using STATA ver. 8.2
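A possible reading of the "unexpected end of file (5 observations read)" message: in an -infile- dictionary, _line(30) refers to line 30 within each observation, not line 30 of the file. Each observation is then treated as spanning 30 lines, so a file of about 130 lines would give four complete observations and a fifth that runs off the end. Skipping the header is the job of _first(). The dictionary below is a sketch under the assumption that the data really are in fixed columns starting on line 30. The fact that -insheet- reads the file cleanly suggests it may instead be delimited, in which case a fixed-column dictionary would not pick up the intended values.

```stata
dictionary using "c:\data\gait\SBS00001_20050607_1.nrm" {
    * Sketch only: assumes fixed-column data starting on line 30 of the file.
    * _first(30) skips the 29 header lines; _lines(1) means one line per observation.
    _first(30)
    _lines(1)
    _column(115) float r_grf_vrt_frc %5f
}
```

With these contents saved as SBS00001_20050607_1.dct, the -infile using- command from the original post would be run unchanged.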

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/


Joseph H. Wagner, M.P.H.

Lifespan Health Research Center

Wright State University Boonshoft School of Medicine

3171 Research Blvd.

Kettering, OH 45420-4014

(937) 775-1494 (LHRC office)

(937) 775-1456 (fax)

joseph.wagner@wright.edu

Visit the Lifespan Health Research Center Home Page at:

http://www.med.wright.edu/lhrc

