st: Re: collecting raw data from the web via browser automation
As a later post indicates, you can use Perl's LWP module for this, or,
as Phil suggests, Python. But when it comes down to it, Michael's
suggestion below is far more useful:
--cut here--
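* clear previously defined programs so goograb can be redefined
capt program drop _all
program goograb, rclass
    syntax , Name(string)
    * turn spaces in the name into "+" for the query string
    local name : subinstr local name " " "+", all
    * build the search URL (it must stay on a single line)
    local url "http://scholar.google.com/scholar?q=`name'&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search"
    * save the results page as plain text in the working directory
    copy "`url'" test.html, text replace
end
--cut here--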
goograb, name(blasnik michael)
saves the results page as test.html (the filename is hardcoded out of
laziness; one could use a tempfile and then use the -file- commands to
snarf it and work with the contents). Give -goograb- any other name and
it will look up that person's work in Google Scholar.
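For what it's worth, here is a minimal sketch of the tempfile-plus-file-commands variant mentioned above. The program name -goograb2- and the returned r(nlines) are made up for illustration and are not part of the original suggestion; the sketch only counts the lines of the downloaded page, and any real parsing would go inside the loop.

```stata
capture program drop goograb2
program define goograb2, rclass
    syntax , Name(string)
    * same query-string construction as in goograb
    local name : subinstr local name " " "+", all
    local url "http://scholar.google.com/scholar?q=`name'&ie=UTF-8&oe=UTF-8&hl=en&btnG=Search"
    * the tempfile is erased automatically when the program ends
    tempfile page
    tempname fh
    copy "`url'" "`page'", text replace
    * read the saved page one line at a time and count the lines
    local nlines 0
    file open `fh' using "`page'", read text
    file read `fh' line
    while r(eof) == 0 {
        local nlines = `nlines' + 1
        file read `fh' line
    }
    file close `fh'
    * hypothetical return value: number of lines in the page
    return scalar nlines = `nlines'
end

goograb2, name(blasnik michael)
return list
```

Each pass through the loop leaves the current line in the local macro line, so that is where one would test for or extract whatever is of interest. Web pages tend to be full of quotes, so some care is needed when expanding that macro.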
Kit Baum, Boston College Economics
http://ideas.repec.org/e/pba1.html
On May 23, 2006, at 2:33 AM, Michael wrote:
I'm not sure if any of these tools can actually solve the problem
originally posted.

The example Kit gives shows accessing a static web page -- a page that
already exists "as is" and one you could also simply copy to your local
drive using Stata itself (copy http:/.../...) and then parse it as
needed. It's easy to download that data directly to Stata and I don't
think that is the problem.

I think what the original post asked for (and what I would be
interested in as well) is a way to access web pages that are only
created when an action is taken or a selection is made on a different
web page, so there is no specific web address that holds the data you
want. I have thought about trying to use AutoIt or another scripting
language to launch a browser, make selections on a web page and then
capture the data that's spawned, typically in a new window.

Do any of the tools mentioned by Kit or Phil actually do this?