st: Re: collecting raw data from the web via browser automation
As a later post indicates, you can use Perl's LWP module for this or, as Phil suggests, Python. But when it comes down to it, Michael's suggestion below is far more useful:
capt program drop _all
program define goograb
    syntax , name(string)
    local name : subinstr local name " " "+", all
    local url "http://scholar.google.com/scholar?q=`name'"
    copy "`url'" test.html, text replace
end
-- cut here--

goograb, name(blasnik michael)
This returns test.html (hardcoded out of laziness; one could use a tempfile and then use -file- commands to snarf it and work with the contents). Give -goograb- any other name and it will look for their stuff in Google Scholar.
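For those inclined toward Phil's route, here is a rough Python equivalent of the same fetch. The "q=" query parameter is my assumption (the URL in the post is truncated after "scholar?"), so treat this as a sketch rather than the exact request -goograb- makes:

```python
import urllib.parse
import urllib.request

def scholar_url(name):
    # Spaces become "+", as in the -subinstr- line of -goograb-.
    # The "q=" parameter name is an assumption; the URL in the
    # original post is cut off after "scholar?".
    return ("http://scholar.google.com/scholar?q="
            + urllib.parse.quote_plus(name))

def goograb(name, outfile="test.html"):
    # Like Stata's -copy-: fetch the page and save the raw HTML.
    urllib.request.urlretrieve(scholar_url(name), outfile)

# goograb("blasnik michael")   # writes test.html (makes a network call)
```

The saved file can then be read back and parsed, just as the -file- commands would do on the Stata side.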
Kit Baum, Boston College Economics
On May 23, 2006, at 2:33 AM, Michael wrote:
I'm not sure if any of these tools can actually solve the problem at hand. The example Kit gives shows accessing a static web page -- a page that already exists "as is" and one you could also simply copy to your hard drive using Stata itself (copy http://.../...) and then parse it as needed. It's easy to download that data directly to Stata, and I don't think browser automation is needed for that.
I think what the original post asked for (and what I would be interested in as well) is a way to access web pages that are only created when an action is taken or a selection is made on a different web page, so there is no specific web address that holds the data you want. I have thought about trying to use AutoIt or another scripting language to launch a browser, make selections on a web page, and then capture the data that's spawned, typically in a new window.
Do any of the tools mentioned by Kit or Phil actually do this?
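One observation worth adding: many such "pages with no address" are produced by an ordinary HTML form, and the request behind the form can often be replayed directly, without driving a browser at all. A minimal Python sketch, assuming the form submits via HTTP POST (the URL and field names below are hypothetical, for illustration only); pages built by client-side JavaScript would still need a driven browser, as in the AutoIt approach:

```python
import urllib.parse
import urllib.request

def fetch_form_result(url, fields):
    # Encode the form fields as a POST body; passing a data argument
    # makes urlopen issue a POST rather than a GET, which is what an
    # HTML form submission does.
    data = urllib.parse.urlencode(fields).encode("ascii")
    with urllib.request.urlopen(url, data) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Hypothetical usage -- this URL and these field names are made up:
# html = fetch_form_result("http://example.com/search.cgi",
#                          {"author": "blasnik michael"})
```

The returned HTML is whatever page the server would have "spawned" for that selection, and can be saved and parsed like any static page.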