Statalist



RE: st: reading txt-file without end-of-line delimiter and uneven record length


From   "Steichen, Thomas J." <SteichT@RJRT.com>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: reading txt-file without end-of-line delimiter and uneven record length
Date   Thu, 20 Nov 2008 11:43:53 -0500

Here is a "fancier" bit of code that stores the data in a Stata dataset:

*---start of code ------------------------------------------------------
capture program drop read_longfile
program read_longfile
  version 10.1
* strip the leading comma of the call (see "call via" below)
  gettoken mname 0 : 0
  syntax using/

* open input file
  tempname hdl
  file open `hdl' using `"`using'"', read binary

* set up empty variables (v00, v0, v1-v151) with 100 obs each
* or add another 100 obs to a non-empty dataset
  local nb = _N
  local n = `nb' + 100
  cap set obs `n'

  cap qui gen double v00 = .
  format v00 %11.0f
  cap qui gen v0 = .
  format v0 %6.0f
  forvalues j = 1(1)151 {
      cap qui gen double v`j' = .
  }
  format v1-v151 %13.5f

* read the 11-byte value at the start of the first (header) record as a
* string, then skip to the end of that 1975-byte record
  file read `hdl' %11s val_11
  file seek `hdl' 1975
  local n1 = `nb' + 1
  qui replace v00 = real("`val_11'") in `n1'/`n'

* loop over 100 records
  forvalues i = 1(1)100 {
    local ir = `nb' + `i'

*   get 6-byte var (as string) and store in v0
    file read `hdl' %6s val_6
    qui replace v0 = real("`val_6'") in `ir'

*   loop over 151 13-byte vars (as strings) and store in v1-v151
    forvalues j = 1(1)151 {
      file read `hdl' %13s val_135
      qui replace v`j' = real("`val_135'") in `ir'
    }
  }

* close input file
  file close `hdl'

end

* call via: read_longfile, using "filename"
*---end of code --------------------------------------------------------



-----------------------------------
Thomas J. Steichen
steicht@rjrt.com
-----------------------------------

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Even Bergseng
Sent: Wednesday, November 19, 2008 3:38 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: reading txt-file without end-of-line delimiter and uneven record length

Option 1) is a no-go, as I am looping this over many files: I first write a file with control parameters for an external program, then run the external program, and finally read back the results. I will eventually do this for hundreds or thousands of files at a time, so doing it by hand is not an option.

The file structure is fixed and known. All records are of equal length (1 %-6f and 151 %13.5f variables over 1969 bytes) except for the first, which holds one %11f variable and is otherwise empty for the remaining 1964 bytes.
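Because the layout is fixed, each file can be sanity-checked before it is read: a regular record is 6 + 151*13 = 1969 bytes, so a file with a 1975-byte header record and 100 data records should be 1975 + 100*1969 = 198875 bytes long. Counting from 0, as -file seek- does, field j of data record i starts at byte 1975 + (i-1)*1969 + 6 + (j-1)*13. The program below is a minimal sketch of the size check only; the name -check_size- and its options are hypothetical, and it assumes the files contain nothing beyond the records described (no trailing end-of-line character).

*---start of code ------------------------------------------------------
* check_size (hypothetical helper): compare a file's size in bytes with
* header + nrec*lrecl and exit with error 459 if they differ
capture program drop check_size
program define check_size, rclass
  version 10.1
  syntax using/ [, HEADer(integer 1975) LRECL(integer 1969) NREC(integer 100)]

  tempname fh
  file open `fh' using `"`using'"', read binary
* jump to the end of the file; the position there is the size in bytes
  file seek `fh' eof
  file seek `fh' query
  local size = r(loc)
  file close `fh'

  local expected = `header' + `nrec'*`lrecl'
  if `size' != `expected' {
    display as error `"`using': `size' bytes, expected `expected'"'
    exit 459
  }
  return scalar size = `size'
end

* call via: check_size using "filename"
*---end of code --------------------------------------------------------

Run inside the loop over output files, a -capture- around the call lets you log and skip any file whose size does not match instead of reading misaligned records.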

Could you give a short example of option 2? My knowledge of binary files and also the -file- command is limited, to say the least.

thanks,
Even


________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Sergiy Radyakin [serjradyakin@gmail.com]
Sent: 19 November 2008 21:17
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: reading txt-file without end-of-line delimiter and uneven record length

1) open the file in a text editor, remove the extra 6 bytes, import
with constant record length, restore the removed 6 bytes manually

2) use binary read/write commands and read byte by byte in a double loop
(the outer loop over records 1..101, the inner over characters 1..record_length)
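Options 1) and 2) can also be combined so that nothing is done by hand: binary -file- commands copy every byte after the header record into a temporary file, which is then read with -infile- and a dictionary that uses _lrecl(). Copying one unsigned byte at a time (%1bu) never puts the long record into a macro, so the "too few quotes" problem from the original post should not arise. This is a minimal sketch assuming the layout described in this thread (a 1975-byte header record whose first 11 bytes hold one number, then records of one 6-byte field and 151 13-byte fields); -read_fixedlen- and its options are hypothetical names, and the byte-by-byte loop is simple rather than fast.

*---start of code ------------------------------------------------------
* read_fixedlen (hypothetical): drop the header record, then -infile-
* the remaining fixed-length records (no end-of-line characters)
capture program drop read_fixedlen
program define read_fixedlen
  version 10.1
  syntax using/ [, HEADer(integer 1975) LRECL(integer 1969) NVars(integer 151)]
  tempname in out d ch
  tempfile raw dct

* keep the 11-byte value at the start of the header record
  file open `in' using `"`using'"', read binary
  file read `in' %11s hdr

* copy every byte after the header record, one unsigned byte at a time
  file open `out' using `"`raw'"', write binary replace
  file seek `in' `header'
  file read `in' %1bu `ch'
  while r(eof) == 0 {
    file write `out' %1bu (`ch')
    file read `in' %1bu `ch'
  }
  file close `in'
  file close `out'

* write a dictionary: one 6-byte field, then nvars 13-byte fields
  file open `d' using `"`dct'"', write text replace
  file write `d' `"dictionary using "`raw'" {"' _n
  file write `d' "_lrecl(`lrecl')" _n
  file write `d' "double v0 %6f" _n
  forvalues j = 1/`nvars' {
    file write `d' "double v`j' %13f" _n
  }
  file write `d' "}" _n
  file close `d'

* read the records, then add the header value as v00
  infile using `"`dct'"', clear
  gen double v00 = real(`"`hdr'"')
  format v00 %11.0f
end

* call via: read_fixedlen using "filename"
*---end of code --------------------------------------------------------

Unlike the program that appends 100 observations per call, this sketch replaces the data in memory, so a loop over many output files would need to -save- or -append- after each call.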

How do you know that it is the first record that is longer, not, say, 25th?

Regards,
   Sergiy Radyakin


On Wed, Nov 19, 2008 at 2:59 PM, Even Bergseng <even.bergseng@umb.no> wrote:
> Dear listers!
>
> I have a txt-file without end-of-line delimiter and uneven record length that I want to read into Stata. The lack of end-of-line delimiters, which puts all observations on one line, suggests using the _lrecl option of the -infile2- command. The uneven record length suggests otherwise.
>
> There is only one uneven record (1975 bytes), which occurs at the beginning of the file. All other records are 1969 bytes. There are 100 records excluding the first.
>
> I have tried to use -file read- and then -file write- to drop the first, uneven record so that I can use -infile-. However, because all the data are on one very long line, the macro filled by -file read- makes Stata report "too few quotes" when I try to write it back with -file write-.
>
> My code for the -file- command is as follows:
>
> ****
> tempname OUT1
> file open `OUT1' using "$sima\OUT1.dat", read write text
> file seek `OUT1' 1975
> file read `OUT1' line
> file close `OUT1'
> tempname RESULT
> file open `RESULT' using "$sima\RESULT.dat", read write text
> file write `RESULT' (`"`macval(line)'"')
> file close `RESULT'
> ***
>
> Any hints on how to read the txt-file?
>
> best regards,
> Even Bergseng
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


© Copyright 1996–2014 StataCorp LP