
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Append multiple files from .txt file with "file read"


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Append multiple files from .txt file with "file read"
Date   Fri, 6 Dec 2013 18:18:00 -0500

Nicole,

the 'enigma' you mentioned is really very simple. Locals are only
visible within the same 'context' - e.g. within the same program or
do-file. What happens is that when you execute your example from the
do-editor line-by-line, Stata creates temporary do-files each time,
and hence a new context. You can probably see these files' names in
the output window:
. do "C:\Users\nboyle\AppData\Local\Temp\STD1c000000.tmp"
(or something similar)
Hence the second line can't see the results of the first line. When
you execute two lines together, they are in the same context, and
hence the result of the first line is available to the second.
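The same rule applies to programs: each program, like each do-file, gets its own set of locals. A minimal sketch to see this for yourself (the program name show_scope is made up for illustration):

```stata
* A local defined at the top level of a do-file belongs to that context
local greeting "hello"

capture program drop show_scope
program define show_scope
    * Inside the program, `greeting' refers to the program's own local,
    * which was never defined here, so it expands to an empty string
    display `"inside program: [`greeting']"'
end

show_scope
display `"back in the do-file: [`greeting']"'
```

Run as one block, the first -display- shows [] and the second shows [hello]: the program cannot see the do-file's local.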

Hope this helps,
Best Sergiy Radyakin

PS: There is, however, an undocumented command, c_local, that allows one
to jump across this boundary (see
http://www.stata.com/statalist/archive/2003-12/msg00385.html), but its
use is discouraged.

On Fri, Dec 6, 2013 at 5:47 PM, Nicole Boyle <nicboyle@gmail.com> wrote:
> Thanks to David Radwin, Sergiy Radyakin, David Kantor, and Matt Vivier
> for your very helpful replies!
> I think I've identified one area where I went wrong. When I was
> initially attempting to run my original code yesterday, I was trying
> to run the first few lines "line-by-line" (since I'm not yet confident
> in programming, I wanted to make sure what I wanted to happen was
> _actually_ happening). However, it seems that the error that I
> originally noted below:
> ...
>     ! ls *.dta >filelist.txt
>     file open myfile using "filelist.txt", read
>     file read myfile line
>     use `line'  /* ERROR HERE */
> ...
>
> didn't occur today when executing the code as a single block.
>
> The lesson I'd LIKE to take away from this is that local macros can only
> be used within the same block of code in which they're created.
> However, I'm not sure this is truly the case, since something as simple
> as this:
>
>     local x="whatever"
>     display "`x'"
>
> CAN, in fact, be run successfully line-by-line.
>
>
> Apart from this enigma, I played around with the codes each of you
> kindly posted and it was extremely helpful. It seems that there are
> multiple ways of accomplishing the same goal, which is great to know.
> I ended up using David Kantor's code and replaced -append- with
> -merge- along with options -nogenerate- and -update-.
>
> ...
>     ! ls *.dta >filelist.txt
>     local jj = 0
>     file open myfile using "filelist.txt", read
>     file read myfile line
>     while ~r(eof) {
>             if `"`line'"' ~= "" {
>                     disp `"`line'"'
>                     if ~`jj++'  {
>                             use  `"`line'"'
>                     }
>                     else {
>                             merge 1:1 id using `"`line'"', nogenerate update
>                     }
>             }
>             file read myfile line
>     }
>     file close myfile
>
>
> Thanks so much for your help and patience!
>
> Best,
> Nicole
>
> On Thu, Dec 5, 2013 at 5:12 PM, Matt Vivier <mvivier@remedypartners.com> wrote:
>> Hi Nicole,
>>
>> If -merge- is what you're trying to do, then you were on the right track
>> with your initial attempt to use loops. This is something I find myself
>> doing more often than I'd like, and typically using a structure like this:
>>
>> drop _all
>> local filelist : dir . files "*.dta"
>> foreach file of local filelist {
>>     if _N==0 {
>>         use `"`file'"'
>>     }
>>     else {
>>         merge 1:1 ID using `"`file'"'
>>         drop _merge
>>     }
>> }
>>
>> Three things to look out for:
>> 1. Make sure you -drop- _merge each time; otherwise the next -merge- stops
>> with an error because _merge already exists. I'm guilty of this a little too often.
>> 2. After 25 of these, your screen will become a mess. Once you're
>> comfortable with it working correctly you might think about using -qui- to
>> suppress the output, and maybe just show a count of rows that didn't match.
>> 3. If you have variables with the same name (but different values) in the
>> datasets you may find yourself with some unexpected results. You would want
>> to go through and rename the variables in each file if they matter to your
>> end result.
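Tips 1 and 2 might be combined along these lines. This is a sketch, not Matt's posted code; it assumes, as his loop does, that every file carries the key variable ID and that the working directory holds only the .dta files to be combined.

```stata
drop _all
local filelist : dir . files "*.dta"
foreach file of local filelist {
    if _N == 0 {
        quietly use `"`file'"'
    }
    else {
        quietly merge 1:1 ID using `"`file'"'
        * _merge == 3 marks matched observations; count everything else
        quietly count if _merge != 3
        display as text `"`file': "' as result r(N) as text " unmatched"
        * Drop _merge so the next -merge- does not fail
        drop _merge
    }
}
```

Each file after the first then produces a single line in the Results window instead of a full -merge- table.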
>>
>> Best,
>> Matt Vivier
>> Data Analyst
>> (203) 541-4665
>> Remedy Partners, Inc
>>
>>
>> On Thu, Dec 5, 2013 at 7:49 PM, David Kantor <kantor.d@att.net> wrote:
>>> Hello Nicole,
>>>
>>> You may want to display `line' to see what you are getting.
>>> Put in...
>>>         disp "`line'"
>>> just before
>>>         use `line'
>>>
>>> How many words does it comprise?
>>> You could be failing because there is nothing there, or because there are
>>> multiple words.
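A quick way to answer the word-count question, sketched with a made-up value of `line' (in practice it would come from -file read-):

```stata
* Hypothetical value standing in for what -file read- stored in `line'
local line "my data.dta"

* Count the blank-separated words in `line'
local nwords : word count `line'
display `"line = [`line'], words = `nwords'"'

if `nwords' == 0 display "blank line: nothing to -use-"
else if `nwords' > 1 display "several words: quote the name, or pick one with word()"
```

With this value the sketch reports 2 words, the embedded-space case described next.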
>>> If there are multiple words, and the file name is all of `line' (there are
>>> embedded spaces), then you need quotation marks:
>>>         use "`line'"
>>>
>>> If there are embedded quotation marks, then use compound quotation marks
>>>         use `"`line'"'
>>> -- and that is the safest way, in general.
>>>
>>> But if only the first word is the desired filename, then you need to select
>>> that:
>>>         use "`=word("`line'",1)'"
>>>
>>> (Compound quotes may be safer:
>>>         use `"`=word(`"`line'"',1)'"'
>>> )
>>>
>>> Possibly this is an important consideration: you construct the file using
>>> -! ls-. Does that write information other than the names?
>>> (You are presumably on Unix; I don't recall exactly what you get from -ls-.)
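On most Unix systems -ls- writes one name per line when its output is redirected to a file, but if there is any doubt, the list can be built inside Stata instead with the -dir- extended macro function. A sketch, assuming the .dta files sit in the working directory:

```stata
* Write the names of all .dta files in the working directory to
* filelist.txt, one per line, without shelling out to -ls-
local files : dir . files "*.dta"
tempname fh
file open `fh' using "filelist.txt", write text replace
foreach f of local files {
    file write `fh' `"`f'"' _n
}
file close `fh'

* Echo the list back to check what the read loop will see
type filelist.txt
```

This also works on Windows, where -! ls- is not available.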
>>>
>>>
>>> If there are blank lines in the file, you may want a filter to skip them:
>>>
>>>         file open myfile using "filelist.txt", read
>>>         file read myfile line
>>>         while ~r(eof) &  `"`line'"' == "" {
>>>                 file read myfile line
>>>         }
>>>         if `"`line'"' ~= "" {
>>>                 disp `"`line'"'
>>>                 use  `"`line'"'
>>>                 file read myfile line
>>>         }
>>>         while ~r(eof) {
>>>
>>>                 append using `"`line'"'
>>>                 file read myfile line
>>>         }
>>>
>>> I might write it a bit differently; this may be simpler:
>>>
>>>         local jj = 0
>>>
>>>         file open myfile using "filelist.txt", read
>>>         file read myfile line
>>>         while ~r(eof) {
>>>                 if `"`line'"' ~= "" {
>>>                         disp `"`line'"'
>>>                         if ~`jj++'  {
>>>                                 use  `"`line'"'
>>>                         }
>>>                         else {
>>>
>>>                                 append using `"`line'"'
>>>                         }
>>>                 }
>>>                 file read myfile line
>>>         }
>>>
>>> That is, the -use- or -append- both appear inside the loop; -use- occurs on
>>> the first pass, -append- on all subsequent passes.
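Because the loop keeps its state in locals (`line' and `jj'), it has to run in a single context. One option is to wrap it in a program. The sketch below does that; the program name combine_from_list is hypothetical, and the demo splits the auto dataset shipped with Stata into two hypothetical files, part1.dta and part2.dta. It also closes the file handle and adds -clear- to -use-, neither of which is in the code above.

```stata
capture program drop combine_from_list
program define combine_from_list
    * Hypothetical helper: -use- the first non-blank file named in the
    * list file, then -append- every later one. All locals live in this
    * program's context, so the loop behaves the same however it is run.
    args listfile
    tempname fh
    local jj = 0
    file open `fh' using `"`listfile'"', read text
    file read `fh' line
    while r(eof) == 0 {
        if `"`line'"' != "" {
            if `jj++' == 0 {
                use `"`line'"', clear
            }
            else {
                append using `"`line'"'
            }
        }
        file read `fh' line
    }
    file close `fh'
end

* Demo: two 10-observation pieces of the auto dataset
sysuse auto, clear
keep in 1/10
save part1, replace
sysuse auto, clear
keep in 11/20
save part2, replace

tempname out
file open `out' using "filelist.txt", write text replace
file write `out' "part1.dta" _n "part2.dta" _n
file close `out'

combine_from_list "filelist.txt"
count
```

After the demo, -count- reports 20 observations. For Nicole's 1:1 case, the -append- line would become her -merge- line.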
>>>
>>> Again, pay attention to what is in `line'; you may want only part of it. The
>>> code above presumes you want all of `line' as the filename; you will need to
>>> modify it if you need only part.
>>>
>>> As for why your test loop displays the second but not the first line, I
>>> cannot say. (I've heard of failing to get the final line, but you don't seem
>>> to have that problem.)
>>>
>>> Note that your first -save master_data- is unnecessary.
>>> HTH
>>> --David
>>>
>>>
>>>
>>> At 06:30 PM 12/5/2013, you wrote:
>>>>
>>>> Hello all,
>>>>
>>>> First and foremost, I have yet to fully understand how to use macros,
>>>> so please forgive me if the solution to this problem is painfully
>>>> obvious. I actually hope it's painfully obvious.
>>>>
>>>> I'm trying to combine multiple .dta files (1:1 horizontally appended)
>>>> by calling several .dta filenames stored in a .txt file. However, in
>>>> the process of doing this, whenever I try to run:
>>>>
>>>> .    use `line'
>>>>
>>>> Stata returns the error:
>>>>
>>>> .    invalid file specification
>>>>
>>>>
>>>> Here's the code I'm trying to execute (sourced from here*). To start,
>>>> I'm trying to execute this code on a .txt file containing just two
>>>> lines (aka: two .dta filenames), but the final file will have 25
>>>> lines:
>>>>
>>>>    pwd
>>>>    cd ~/Desktop/merge
>>>>    ! ls *.dta >filelist.txt
>>>>    file open myfile using "filelist.txt", read
>>>>    file read myfile line
>>>>    use `line'  /* ERROR HERE */
>>>>    save master_data, replace
>>>>    file read myfile line
>>>>    while r(eof)==0 {
>>>>    append using `line'
>>>>    file read myfile line
>>>>    }
>>>>    file close myfile
>>>>    save master_data, replace
>>>>
>>>>
>>>> I first thought the problem was that "filelist.txt" wasn't being read.
>>>> However, I believe it IS being read, since running the following:
>>>>
>>>>    ! ls *.dta >filelist.txt
>>>>    file open myfile using "filelist.txt", read
>>>>    file read myfile line
>>>>    while r(eof)==0 {
>>>>    display "`=word("`line'",1)'"
>>>>     file read myfile line
>>>>     }
>>>>
>>>> only displays the second (but not the first) line of the two-line .txt
>>>> file.
>>>>
>>>> Perhaps my issue has something to do with Stata overlooking the first
>>>> line of the .txt file? Or perhaps my general macro-incompetence (more
>>>> likely)?
>>>>
>>>> Any help will be greatly appreciated. Thanks so much for your
>>>> consideration.
>>>>
>>>> Nicole
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>>


© Copyright 1996–2018 StataCorp LLC