Statalist The Stata Listserver



st: Re: collecting raw data from the web via browser automation


From   Kit Baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: collecting raw data from the web via browser automation
Date   Tue, 23 May 2006 07:04:43 -0400

As a later post indicates, you can use Perl's LWP module for this or, as Phil suggests, Python. But when it comes down to it, Michael's suggestion below is far more useful:

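--cut here--
capture program drop goograb
program goograb, rclass
    * name(): the author to search for, e.g. name(blasnik michael)
    syntax , Name(string)
    * Google expects spaces in the query string to be encoded as "+"
    local name : subinstr local name " " "+", all
    local url "http://scholar.google.com/scholar?q=`name'&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search"
    * Save the results page in the working directory, overwriting any earlier copy
    copy "`url'" test.html, text replace
end
--cut here--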

goograb, name(blasnik michael)

saves the results page as test.html in the current working directory (the filename is hardcoded out of laziness; one could instead use a tempfile and then use the -file- commands to snarf it and work with the contents).
Give -goograb- any other name and it will look for that author's work in Google Scholar.
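Since Python came up as an alternative, here is a minimal Python 3 sketch of the same idea using only the standard library. It assumes Google Scholar still accepts the query parameters that -goograb- sends (q, ie, oe, hl, btnG), which may no longer hold; the function names build_scholar_url and goograb are hypothetical. urlencode() does the space-to-"+" step that -subinstr- does in the Stata version and also escapes other special characters.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL and parameters copied from the Stata -goograb- example (assumed still valid)
SCHOLAR = "http://scholar.google.com/scholar"


def build_scholar_url(name):
    """Build the search URL; urlencode() turns spaces into '+' like -subinstr-."""
    params = {"q": name, "ie": "UTF-8", "oe": "UTF-8", "hl": "en", "btnG": "Search"}
    return SCHOLAR + "?" + urlencode(params)


def goograb(name, path="test.html"):
    """Hypothetical Python counterpart of -goograb-: save the results page to path."""
    with urlopen(build_scholar_url(name)) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return path


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here
    print(build_scholar_url("blasnik michael"))
```

Calling goograb("blasnik michael") would then fetch the page and write it to test.html, as the Stata version does.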

Kit Baum, Boston College Economics
http://ideas.repec.org/e/pba1.html


On May 23, 2006, at 2:33 AM, Michael wrote:


I'm not sure if any of these tools can actually solve the problem originally
posted.

The example Kit gives shows accessing a static web page -- a page that
already exists "as is" and one you could also simply copy to your local
drive using Stata itself (copy http:/.../...) and then parse as needed.
It's easy to download that data directly into Stata, and I don't think that
is the problem.
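For the static case, "parse it as needed" could look like the following Python sketch, which uses the standard library's html.parser to pull out every link target together with its visible text. Extracting links is only an illustrative choice; what to extract depends on the actual markup of the page, which is not described here. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> element in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside a link that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, link text) pairs found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Reading a saved file such as test.html into a string and passing it to extract_links() would list the links in that page.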

I think what the original post asked for (and what I would be interested in
as well) is a way to access web pages that are only created when an action
is taken or a selection is made on a different web page, so there is no
specific web address that holds the data you want. I have thought about
trying to use AutoIt or another scripting language to launch a browser,
make selections on a web page, and then capture the data that is typically
spawned in a new window.

Do any of the tools mentioned by Kit or Phil actually do this?
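When the generated page comes from an ordinary HTML form, the browser only sends the form's fields to the URL in the form's action attribute, either as a GET query string or as a POST body. A script can send the same request without a browser. The Python sketch below builds such a POST request with the standard library. The URL http://example.com/query and the field names state and year are hypothetical placeholders that would have to be read off the real form. Pages assembled by JavaScript in the browser would still need genuine browser automation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_form_request(action_url, fields):
    """Build a POST request that mimics submitting an HTML form.

    action_url: the form's action attribute (hypothetical in the example below)
    fields: dict mapping input names to the values a user would select
    """
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,  # supplying data makes urllib send a POST instead of a GET
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def submit_form(action_url, fields, path="result.html"):
    """Send the request and save the generated page to disk, as -copy- would."""
    with urlopen(build_form_request(action_url, fields)) as response:
        data = response.read()
    with open(path, "wb") as out:
        out.write(data)
    return path


if __name__ == "__main__":
    # Hypothetical form; this only builds the request and sends nothing
    req = build_form_request("http://example.com/query", {"state": "MA", "year": "2005"})
    print(req.get_method(), req.full_url, req.data)
```

The saved result.html could then be read into Stata or parsed like any static page.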



© Copyright 1996–2014 StataCorp LP