Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Re: loop: how can I download CSV files and append it


From   "Joseph Coveney" <[email protected]>
To   <[email protected]>
Subject   st: Re: loop: how can I download CSV files and append it
Date   Sun, 20 Mar 2011 11:59:37 +0900

Daniel Marcelino wrote:

I am wondering if someone could point me to a way to get each .CSV file
from a list of websites stored in a column, say "sites", and to
aggregate them in a master file like "big.data".
I performed this task using R, but I would prefer to do the same work in Stata.

So, basically I need to download files whose web addresses are in each
row of the column "sites". After a file is read, it needs to be appended to
the previous one until the web addresses are exhausted, so the output
big.data.dta must contain all the raw data found at the websites.

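big.data <- NULL

for (i in sites) {
  base <- NULL  # reset so a failed download does not re-append the previous file
  try(base <- read.csv2(i, header = TRUE, as.is = TRUE), silent = TRUE)
  if (!is.null(base)) big.data <- rbind(big.data, base)
}

# Write the combined data once, after the loop
write.table(big.data, file = "ap.csv",
            sep = ";", row.names = FALSE, na = "")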

--------------------------------------------------------------------------------

There are a few ways to do this. The one shown below is probably the easiest to
follow.  The illustration assumes that your list of websites is in an
ASCII text file called "sites.txt" with one website per row, and with the first
row containing the column name, "sites".  Change the lines below to fit the way
your website list is stored.

After reading in the list of websites, the program stores each website address
in its own local macro.  The local macros are named with numbers: 1, 2,
3, . . .  This is the purpose of the first loop in the illustration below.

Later, in the CSV-file read-in loop (the second loop below), you can refer to
each website address by its number, `1', `2', `3', . . .  Because this loop's
index variable is also a local macro, you de-reference twice (that is, ``i''):
the inner pair of quotes expands the loop index to its number, and the outer pair
expands that numbered local macro to the website address.  I used the word "site"
as the name of the loop's index variable, so the de-referencing is ``site'' in
the illustration below.  The double-quote marks (that is, "``site''") are there
in case a website address has a space in it.
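The double de-referencing can be tried on its own before running it against real
files.  The following is only a minimal sketch: the two addresses are made-up
placeholders, typed in by hand rather than read from "sites.txt".

```stata
// Hypothetical addresses, for illustration only
local 1 "http://www.example.com/first.csv"
local 2 "http://www.example.com/second.csv"

forvalues site = 1/2 {
    // `site' expands to the number; ``site'' expands to the address stored under that number
    display "`site' -> ``site''"
}
```

Each pass through the loop should display the loop index followed by the address
stored in the local macro of that number.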

Joseph Coveney

version 11.1

clear *
set more off

// Read-in your list of Web addresses
insheet using sites.txt, names clear

// Put each website address into its own local macro `1', `2', . . .
local site_total `=_N'
forvalues site = 1/`site_total' {
    local `site' = sites[`site']
}

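// Load CSV files and append, using a temporary file for storage.
// The loop counts down from the last site: -append- adds the stored data
// below the file just read, so the final dataset ends up in list order.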
tempfile append
quietly forvalues site = `site_total'(-1)1 {
    insheet using "``site''", names comma clear
    if `site' == `site_total' {
        save "`append'"
    }
    else {
        append using "`append'"
        save "`append'", replace
    }
}

// Save the final result
outsheet using ap.csv, comma names quote
exit
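
The R code in the question wraps each read in try(), so that an address that
cannot be read is skipped.  If you want similar behavior, one possibility is to
put -capture- in front of -insheet- and check the return code.  What follows is
only a sketch of a replacement for the second loop above; it assumes that the
local macros `1', `2', . . . and `site_total' have already been defined by the
first loop, and the local macro "have_data" is a name introduced here for
illustration.

```stata
// Sketch: skip addresses that cannot be read, similar to try() in the R code.
// Assumes `1', `2', . . . and `site_total' were defined by the first loop.
tempfile append
local have_data 0    // hypothetical flag: has any file been stored yet?
quietly forvalues site = `site_total'(-1)1 {
    capture insheet using "``site''", names comma clear
    if _rc {
        // Nonzero return code: report the address and move on to the next one
        noisily display as text "Skipping ``site'' (return code " _rc ")"
        continue
    }
    if `have_data' {
        append using "`append'"
    }
    save "`append'", replace
    local have_data 1
}
```

The -outsheet- line then follows as before.  If none of the addresses can be
read, there is nothing to write out, so you might check `have_data' first.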

