From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: loop: how can I download CSV files and append it
Date: Sun, 20 Mar 2011 11:59:37 +0900

Daniel Marcelino wrote:

I am wondering if someone could point me the way to get each .CSV file from a list of websites stored in a column, say "sites", and aggregate them in a master file like "big.data". I performed this task using R, but I would prefer to do the same work in Stata.

So, basically, I need to download the files whose web addresses are in each row of the column "sites". After a file is read, it needs to be appended to the previous one until the web addresses run out, so the output big.data.dta must hold all the raw data found at the websites.

    big.data <- NULL
    base <- NULL
    for (i in sites) {
      try(base <- read.csv2(i, header=T, as.is=T), TRUE)
      if(!is.null(base)) big.data <- rbind(big.data, base)
      write.table(big.data, file ="ap.csv", sep=";", row.names=F, na="")
    }

--------------------------------------------------------------------------------

There are a few ways to do this. The one shown below is probably the easiest to follow.

The illustration assumes that your list of websites is in an ASCII text file called "sites.txt" with one website per row, and with the first row containing the column name, "sites". Change the lines below to fit the way your website list is stored.

After reading in the list of websites, the program stores each website address in its own local macro. The local macros are named as numbers: 1, 2, 3, . . . This is the purpose of the first loop in the illustration below.

Later, in the CSV-file read-in loop (the second loop below), you can refer to each website address by its number, `1', `2', `3', . . . Because this loop's index variable is also a local macro, you de-reference twice (that is, ``i''): first for the loop index, and then for the local macro that stores the website address. I used the word "site" as the name of the loop's index variable, so the de-referencing is ``site'' in the illustration below. The double-quote marks (that is, "``site''") are there in case a website address has a space in it.
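The double de-referencing described above can be seen in isolation with a short sketch. The address here is hypothetical and only serves to show the macro mechanics; it is not one of the poster's sites.

```stata
// Hypothetical address, used only to illustrate nested macro expansion
local 1 "http://www.example.com/first file.csv"
local site 1

// One level of de-referencing gives the loop index
display "`site'"

// Two levels give the contents of the numbered local macro
display "``site''"
```

The first display shows 1, and the second shows the stored address, because Stata expands the inner macro first and then treats the result as the name of the outer one.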
Joseph Coveney

    version 11.1

    clear *
    set more off

    // Read-in your list of Web addresses
    insheet using sites.txt, names clear

    // Put each website address into its own local macro `1', `2', . . .
    local site_total `=_N'
    forvalues site = 1/`site_total' {
        local `site' = sites[`site']
    }

    // Load CSV files and append using a temporary file for storing
    tempfile append
    quietly forvalues site = `site_total'(-1)1 {
        insheet using "``site''", names comma clear
        if `site' == `site_total' {
            save "`append'"
        }
        else {
            append using "`append'"
            save "`append'", replace
        }
    }

    // Save the final result
    outsheet using ap.csv, comma names quote

    exit
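The R code in the question wraps each read in try(), so that one unreachable address does not stop the whole run. The illustration above does not do that. A rough Stata counterpart, offered only as a sketch and not as part of the original answer, could prefix the read with -capture- and skip any address that returns an error. The local macro "first" is a hypothetical flag introduced here; the file names are those assumed in the illustration above.

```stata
version 11.1
clear *
set more off

// Read the address list and store each address in a numbered local macro
insheet using sites.txt, names clear
local site_total `=_N'
forvalues site = 1/`site_total' {
    local `site' = sites[`site']
}

tempfile append
local first 1    // hypothetical flag: 1 until the first file has been saved
forvalues site = 1/`site_total' {
    // capture traps a failed read so that the loop keeps going
    capture insheet using "``site''", names comma clear
    if _rc != 0 {
        display as text "skipped: ``site''"
        continue
    }
    if `first' {
        quietly save "`append'"
        local first 0
    }
    else {
        append using "`append'"
        quietly save "`append'", replace
    }
}

// Reload the accumulated data, because a failed final read may have
// left memory empty, and write the result only if something was read
if !`first' {
    use "`append'", clear
    outsheet using ap.csv, comma names quote
}
```

Because a failed read can clear the data in memory, the accumulated result is always taken from the temporary file rather than from whatever happens to be in memory at the end of the loop.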