
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Append multiple files from .txt file with "file read"


From   Nicole Boyle <[email protected]>
To   [email protected]
Subject   Re: st: Append multiple files from .txt file with "file read"
Date   Fri, 6 Dec 2013 14:47:37 -0800

Thanks to David Radwin, Sergiy Radyakin, David Kantor, and Matt Vivier
for your very helpful replies!
I think I've identified one area where I went wrong. When I was
initially attempting to run my original code yesterday, I was trying
to run the first few lines "line-by-line" (since I'm not yet confident
in programming, I wanted to make sure what I wanted to happen was
_actually_ happening). However, it seems that the error that I
originally noted below:
...
    ! ls *.dta >filelist.txt
    file open myfile using "filelist.txt", read
    file read myfile line
    use `line'  /* ERROR HERE */
...

didn't occur today when executing the code as a single block.

The lesson I'd LIKE to take away from this is that local macros can only
be used within the same block of code in which they're created.
However, I'm not sure this is truly the case, since something as simple
as this:

    local x="whatever"
    display "`x'"

CAN, in fact, be run successfully line-by-line.
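
One possible explanation, assuming the lines were run by highlighting
them in the Do-file Editor: each executed selection is treated as its
own temporary do-file, and a local macro disappears when the do-file
that defined it ends. An undefined local expands to an empty string, so
-use- would then receive no filename at all, which is consistent with
"invalid file specification". Commands typed in the Command window share
one interactive scope, so a local survives from one line to the next
there. A minimal sketch (the macro names x and y are illustrative only):

```stata
* Run these lines together, as one selection or one do-file.
local x "whatever"
display "`x'"          // x is defined within this run

* Run from the Do-file Editor as a separate selection, the -display-
* line above would find x undefined and print an empty line.

* A global macro persists for the rest of the session (illustrative name y).
global y "whatever"
display "$y"
```

Globals are shown only for contrast; running the whole block at once, as
described above, keeps everything in locals.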


Apart from this enigma, I played around with the code each of you
kindly posted, and it was extremely helpful. It seems that there are
multiple ways of accomplishing the same goal, which is great to know.
I ended up using David Kantor's code and replaced -append- with
-merge- along with options -nogenerate- and -update-.

...
    ! ls *.dta >filelist.txt
    local jj = 0
        file open myfile using "filelist.txt", read
        file read myfile line
        while ~r(eof) {
                if `"`line'"' ~= "" {
                        disp `"`line'"'
                        if ~`jj++'  {
                                use  `"`line'"'
                        }
                        else {
                                merge 1:1 id using `"`line'"', nogenerate update
                        }
                }
                 file read myfile line
        }
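
To illustrate what -update- and -nogenerate- change, here is a small
self-contained sketch on made-up data. The values and the tempfile names
master and extra are invented for illustration; only the key variable id
comes from the code above. With -update-, a missing value in the master
data is filled from the using data, while a nonmissing master value is
kept; -nogenerate- suppresses the _merge variable, so nothing needs to
be dropped before the next -merge-.

```stata
* Made-up master data: id 2 has a missing x.
clear
input id x
1 10
2 .
3 30
end
tempfile master
save `master'

* Made-up using data: a conflicting x for id 1, a value for id 2, a new id 4.
clear
input id x y
1 99 1
2 20 2
4 40 4
end
tempfile extra
save `extra'

* Merge with the same options as in the loop above.
use `master', clear
merge 1:1 id using `extra', update nogenerate
sort id
list
```

After the merge, x for id 1 keeps the master value, x for id 2 is filled
in from the using data, and id 4 is added as a new observation.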


Thanks so much for your help and patience!

Best,
Nicole

On Thu, Dec 5, 2013 at 5:12 PM, Matt Vivier <[email protected]> wrote:
> Hi Nicole,
>
> If -merge- is what you're trying to do, then you were on the right track
> with your initial attempt to use loops. This is something I find myself
> doing more often than I'd like, typically with a structure like this:
>
> drop _all
> local filelist : dir . files "*.dta"
> foreach file of local filelist {
>     if _N==0{
>         use `file'
>     }
>     else{
>         merge 1:1 ID using `file'
>         drop _merge
>     }
> }
>
> Three things to look out for:
> 1. Make sure you -drop- _merge each time; otherwise the next -merge- fails
> because _merge already exists. I forget this a little too often.
> 2. After 25 of these, your screen will become a mess. Once you're
> comfortable with it working correctly you might think about using -qui- to
> suppress the output, and maybe just show a count of rows that didn't match.
> 3. If you have variables with the same name (but different values) in the
> datasets you may find yourself with some unexpected results. You would want
> to go through and rename the variables in each file if they matter to your
> end result.
>
> Best,
> Matt Vivier
> Data Analyst
> (203) 541-4665
> Remedy Partners, Inc
>
>
> On Thu, Dec 5, 2013 at 7:49 PM, David Kantor <[email protected]> wrote:
>> Hello Nicole,
>>
>> You may want to display `line' to see what you are getting.
>> Put in...
>>         disp "`line'"
>> just before
>>         use `line'
>>
>> How many words does it comprise?
>> You could be failing because there is nothing there, or because there are
>> multiple words.
>> If there are multiple words, and the file name is all of `line' (there are
>> embedded spaces), then you need quotation marks:
>>         use "`line'"
>>
>> If there are embedded quotation marks, then use compound quotation marks
>>         use `"`line'"'
>> -- and that is the safest way, in general.
>>
>> But if only the first word is the desired filename, then you need to select
>> that:
>>         use "`=word("`line'",1)'"
>>
>> (Compound quotes may be safer:
>>         use `"`=word(`"`line'"',1)'"'
>> )
>>
>> Possibly this is an important consideration; you construct the file using -!
>> ls-. Does that write information other than the names?
>> (You are presumably on Unix; I don't recall exactly what you get from -ls-.)
>>
>>
>> If there are blank lines in the file, you may want a filter to skip them:
>>
>>         file open myfile using "filelist.txt", read
>>         file read myfile line
>>         while ~r(eof) &  `"`line'"' == "" {
>>                 file read myfile line
>>         }
>>         if `"`line'"' ~= "" {
>>                 disp `"`line'"'
>>                 use  `"`line'"'
>>                 file read myfile line
>>         }
>>         while ~r(eof) {
>>
>>                 append using `"`line'"'
>>                 file read myfile line
>>         }
>>
>> I might write it a bit differently; this may be simpler:
>>
>>         local jj = 0
>>
>>         file open myfile using "filelist.txt", read
>>         file read myfile line
>>         while ~r(eof) {
>>                 if `"`line'"' ~= "" {
>>                         disp `"`line'"'
>>                         if ~`jj++'  {
>>                                 use  `"`line'"'
>>                         }
>>                         else {
>>
>>                                 append using `"`line'"'
>>                         }
>>                 }
>>                 file read myfile line
>>         }
>>
>> That is, the -use- or -append- both appear inside the loop; -use- occurs on
>> the first pass, -append- on all subsequent passes.
>>
>> Again, pay attention to what is in `line'; you may want only part of it. The
>> code above presumes you want all of `line' as the filename; you will need to
>> modify it if you need only part.
>>
>> As for why your test loop displays the second but not the first line, I
>> cannot say. (I've heard of failing to get the final line, but you don't seem
>> to have that problem.)
>>
>> Note that your first -save master_data- is unnecessary.
>> HTH
>> --David
>>
>>
>>
>> At 06:30 PM 12/5/2013, you wrote:
>>>
>>> Hello all,
>>>
>>> First and foremost, I have yet to fully understand how to use macros,
>>> so please forgive me if the solution to this problem is painfully
>>> obvious. I actually hope it's painfully obvious.
>>>
>>> I'm trying to combine multiple .dta files (1:1 horizontally appended)
>>> by calling several .dta filenames stored in a .txt file. However, in
>>> the process of doing this, whenever I try to run:
>>>
>>> .    use `line'
>>>
>>> Stata returns the error:
>>>
>>> .    invalid file specification
>>>
>>>
>>> Here's the code I'm trying to execute (sourced from here*). To start,
>>> I'm trying to execute this code on a .txt file containing just two
>>> lines (aka: two .dta filenames), but the final file will have 25
>>> lines:
>>>
>>>    pwd
>>>    cd ~/Desktop/merge
>>>    ! ls *.dta >filelist.txt
>>>    file open myfile using "filelist.txt", read
>>>    file read myfile line
>>>    use `line'  /* ERROR HERE */
>>>    save master_data, replace
>>>    file read myfile line
>>>    while r(eof)==0 {
>>>    append using `line'
>>>    file read myfile line
>>>    }
>>>    file close myfile
>>>    save master_data, replace
>>>
>>>
>>> I first thought the problem was that "filelist.txt" wasn't being read.
>>> However, I believe it IS being read, since running the following:
>>>
>>>    ! ls *.dta >filelist.txt
>>>    file open myfile using "filelist.txt", read
>>>    file read myfile line
>>>    while r(eof)==0 {
>>>    display "`=word("`line'",1)'"
>>>     file read myfile line
>>>     }
>>>
>>> only displays the second (but not the first) line of the two-line .txt
>>> file.
>>>
>>> Perhaps my issue has something to do with Stata overlooking the first
>>> line of the .txt file? Or perhaps my general macro-incompetence (more
>>> likely)?
>>>
>>> Any help will be greatly appreciated. Thanks so much for your
>>> consideration.
>>>
>>> Nicole
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>

