Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

From: Lucas Ferreira Mation <lucasmation@gmail.com>
To: statalist <statalist@hsphsun2.harvard.edu>
Subject: st: import html , what is the proper way?
Date: Wed, 5 Feb 2014 10:38:27 -0200

Hello,

I'm trying to import data from a web page. From previous posts, I saw that there are two ways to import from HTML, "insheet" or "infile" (sometimes preceded by "copy" and then "filefilter" to filter line breaks and unwanted HTML tags). I tried both ways:

. version 12.1 // Stata 12.1 running on a Windows 7 machine
. global url http://www.ipea.gov.br/portal/index.php?option=com_content&view=article&id=16643&catid=117&Itemid=5
. insheet using "$url", clear
. infile str244 text using "$url", clear

Neither really works:

infile: the imported file is all corrupted; it seems that every space is interpreted as a line break. Can I solve this with filefilter?

insheet: line breaks seem to be fairly OK (although not perfect in all cases), but some rows were split into different columns (I suppose the lines that had a "," in them). Is there a "never occurring delimiter" that I could use so that the variables are never split?

More generally, is there a way to import from HTML so that the imported file looks just like the source code I see in the browser?

Thanks,
Lucas
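
One approach that may get closer to the last request is sketched below. It is not a tested or definitive solution. It assumes that saving the page locally with "copy" and then reading it as fixed-column text with "infix" is acceptable; the file name page.html and the variable name html_line are hypothetical. Because "infix" reads by column position rather than by delimiter, spaces and commas should not split a line into several values.

```stata
* Sketch only: "page.html" and html_line are hypothetical names.
version 12.1
global url http://www.ipea.gov.br/portal/index.php?option=com_content&view=article&id=16643&catid=117&Itemid=5

* Save the page to a local file first, so it can be inspected or filtered.
copy "$url" "page.html", replace

* Fixed-column read: each line of the file becomes one observation,
* taken from columns 1-244, with no splitting on spaces or commas.
infix str html_line 1-244 using "page.html", clear

* Check how many lines were read.
count
```

In Stata 12 a string variable holds at most 244 characters, so any line of the page longer than that would be truncated by this sketch.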

**Follow-Ups**:
- **Re: st: import html , what is the proper way?** *From:* Nick Cox <njcoxstata@gmail.com>
