Statalist The Stata Listserver


st: collecting raw data from the web via browser automation

From   Kit Baum <>
Subject   st: collecting raw data from the web via browser automation
Date   Mon, 22 May 2006 18:52:00 -0400

Austin said

The trouble is this: the link to bibliographic data is not a static
page; it is generated on the fly, so Stata cannot -copy- to a local
file to -infile- the info. I will need a browser to browse to that
location, and then save the results. Does anyone have a freeware
solution to this problem? I have access to several varieties of
Windows and Unix/Linux, but no Mac OS options. What I am thinking is
that if there is a command line browser with the option to save the
page to disk, I can just invoke the page and save it with a single
line of code that begins with the -shell- command, and then infile it
with another that begins -infile-.

One thing to remember: anything you can do in Unix/Linux you can also do in Mac OS X, which is, after all, Unix with a Mac face.

On Mac OS X (and equally on Unix/Linux), either wget or curl will do what you want. For example:

curl statalist.0605/Date/article-780.html > austin.html
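A slightly more defensive version of that one-liner, offered as a minimal sketch: the helper name `fetch_page` is hypothetical, and it assumes a POSIX shell with curl or wget on the PATH. It saves whatever the server returns for the URL, so a page the server generates on the fly comes back just like a static one (as long as no JavaScript has to run in a browser to produce it).

```shell
#!/bin/sh
# fetch_page URL OUTFILE  (hypothetical helper name)
# Save whatever the server returns for URL into OUTFILE.
# Uses curl if it is installed, otherwise falls back to wget.
fetch_page() {
    url=$1
    out=$2
    if command -v curl >/dev/null 2>&1; then
        # -s: no progress meter; -f: fail on HTTP errors; -L: follow redirects
        curl -s -f -L -o "$out" "$url"
    elif command -v wget >/dev/null 2>&1; then
        # -q: quiet; -O: write the downloaded document to the named file
        wget -q -O "$out" "$url"
    else
        echo "fetch_page: neither curl nor wget found" >&2
        return 127
    fi
}
```

From within Stata the same download can be run with -shell- (or its synonym `!`), e.g. `shell curl -s -f -L -o austin.html URL`, where URL is a placeholder, after which austin.html can be read with -infile- as Austin proposed.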

Perl is an excellent tool for grabbing web pages and turning them into text files (perhaps after stripping the HTML tags). For examples, see several of the scripts I have written, listed in RePEc under Software -> RePEc team (one, for instance, snarfs the AEA's XML data for the A.E.R. and turns it into RePEc templates).
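The suggestion here is Perl. As a rough stand-in that needs nothing beyond a standard Unix toolset, the tag-stripping step can be sketched with sed instead. The helper name `strip_tags` is hypothetical, and the approach is deliberately crude: it only removes tags that open and close on the same line and leaves entities such as `&amp;` untouched, which is why a proper HTML parser (in Perl or elsewhere) is the sturdier choice.

```shell
#!/bin/sh
# strip_tags INFILE OUTFILE  (hypothetical helper name)
# Crude sketch: delete anything between < and > on each line, then drop
# lines that are empty or whitespace-only. Tags spanning several lines
# and HTML entities are NOT handled.
strip_tags() {
    sed -e 's/<[^>]*>//g' "$1" | grep -v '^[[:space:]]*$' > "$2"
}
```

Run after the download, e.g. `strip_tags austin.html austin.txt`, this gives a plainer text file to hand to -infile-.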

Kit Baum, Boston College Economics


© Copyright 1996–2017 StataCorp LLC