
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: import html , what is the proper way?
Date: Wed, 5 Feb 2014 13:07:34 +0000

I don't think there can be a single proper way to import HTML files, as HTML is a mark-up language, not a file format defining a Stata-compatible data file.

In the example you give there is just a list of projects. Is that the data? If it is, copy-and-paste from what you see in the browser into Stata's editor gives a good start, after which you just -drop- unwanted lines. I don't see that you want to import the mark-up at all.

Nick
njcoxstata@gmail.com

On 5 February 2014 12:38, Lucas Ferreira Mation <lucasmation@gmail.com> wrote:
> Hello,
>
> I'm trying to import data from the web page. From previous posts, I saw
> there are two ways to import from html, "insheet" or "infile"
> (sometimes preceded by "copy" > "filefilter" to filter breaks and
> unwanted html tags). I tried both ways:
>
> . version 12.1 // stata12.1 running on a windows 7 machine
> . global url http://www.ipea.gov.br/portal/index.php?option=com_content&view=article&id=16643&catid=117&Itemid=5
> . insheet using "$url", clear
> . infile str244 text using "$url", clear
>
> Neither really works:
>
> infile : the imported file is all corrupt; it seems that every space is
> interpreted as a line break. Can I solve this with filefilter?
>
> insheet: line breaks seem to be fairly ok (although not perfect in all
> cases), but some rows were split into different columns (I suppose
> the lines that had a "," in them). Is there a "never occurring
> delimiter" that I could use so the variables are never split?
>
> More generally, is there a way to import from HTML so that the
> imported file looks just like the source code I see in the
> browser?
>
> tks
> Lucas
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
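For readers who would still rather pull the page into Stata than copy and paste, here is a minimal sketch of one possible route, not something proposed in the thread. It assumes that -copy- can fetch the URL from the question, that a local file name such as page.html (hypothetical) is acceptable, and that truncating each line at 244 characters (the string limit in Stata 12) does no harm. Reading with -infix- avoids both problems reported above: it does not split on spaces as -infile- does, nor on commas as -insheet- does.

```stata
* Sketch only: the local file name page.html is an assumption.
* The URL is the one given in the original question.
local url "http://www.ipea.gov.br/portal/index.php?option=com_content&view=article&id=16643&catid=117&Itemid=5"

* Save the raw HTML to a local file first.
copy "`url'" page.html, replace

* Read each line of the file into one string variable by column position,
* so that neither spaces nor commas act as delimiters.
* Anything beyond column 244 on a line is not read.
infix str line 1-244 using page.html, clear

* Discard empty lines; other unwanted lines can then be removed with -drop-.
drop if trim(line) == ""
count
```

What remains is still mark-up plus text, so the advice above applies: inspect the result and -drop- whatever is not data.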

**Follow-Ups**: **Re: st: import html , what is the proper way?** *From:* Lucas Ferreira Mation <lucasmation@gmail.com>

**References**: **st: import html , what is the proper way?** *From:* Lucas Ferreira Mation <lucasmation@gmail.com>
