Re: st: Download and parse html files (and regex trouble)

From   "Gabi Huiber"
Subject   Re: st: Download and parse html files (and regex trouble)
Date   Thu, 3 Apr 2008 03:06:28 -0400

In an earlier response to this post I blamed the ... (dot-dot-dot)
special character for breaking my file read code. That was not the
reason.

The command file read `fh' line chokes on do-file lines where a
comment is inserted before the end of the line with the double forward
slash (//) syntax. I have not found a way to make that go away. I tried
enclosing my file read/file write routine within this if-condition:

if !regexm("macval(`line')","[[a-zA-Z0-9][:punct:]]*\/\/") {
    read line in this file
    write line in that file
}

But that had no effect.
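
One possible workaround, offered as an untested sketch and not a confirmed fix: test for the comment marker with strpos() instead of regexm(), and pass the line inside compound double quotes with macval() so that its contents are not macro-expanded. The file names in.do and out.do and the handle names fin and fout are hypothetical, and lines with unbalanced quotes may still cause trouble.

```stata
* Sketch: copy a do-file line by line, skipping lines that contain a
* double forward slash. File names here are placeholders.
tempname fin fout
file open `fin' using "in.do", read text
file open `fout' using "out.do", write text replace

file read `fin' line
while r(eof) == 0 {
    * macval() prevents macros inside the line from being expanded;
    * compound quotes protect any double quotes the line contains
    if strpos(`"`macval(line)'"', "//") == 0 {
        file write `fout' `"`macval(line)'"' _n
    }
    file read `fin' line
}

file close `fin'
file close `fout'
```

This drops every line containing //, including lines where the slashes are part of a URL or a string, so the condition may need tightening for real do-files.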


On Thu, Apr 3, 2008 at 12:20 AM, Sebastian Bauhoff wrote:
> Dear Statalisters,
> I need to download a large number of html files from the internet and parse
> their content.  The structure of the html pages is always the same, and I
> need to extract only a small part that is identified within the html code.
> I would like to use Stata to download the files, extract the information I
> want, and save the result in a dataset.  Any suggestions or pointers much
> appreciated.
> Thanks,
> Sebastian
