st: reading HTML source in Chinese but get a messy code


From   "Li Chuntao (Tony)" <leechtcn@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: reading HTML source in Chinese but get a messy code
Date   Thu, 6 Jun 2013 21:36:21 +0800

Dear Listers,

I want to import the following HTML source file:

        http://qq.ico.la/qq459322466.html

The source file contains some information in Chinese, which is
located on lines 32 to 73.

I tried to import the information using the following code:

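clear all
set obs 500
copy "http://qq.ico.la/qq459322466.html" d:\qq.txt, replace

mata:
        // open the downloaded page for reading
        fh = fopen("d:\qq.txt", "r")
        // skip the first 34 lines
        for(i=1; i<=34; i++) {
                junk=fget(fh)
        }
        // read and display the next 20 lines
        for(i=1; i<=20; i++) {
                junk=fget(fh)
                junk
        }
        fclose(fh)
end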

but the output I get is only garbled characters.

Similar code has worked for another web page, thanks to Prof. Kit
Baum, as can be seen below:

clear all
set obs 500
local stkcd="000002"
gen str20 date="2012.12.31"
copy "http://stockdata.stock.hexun.com/2008/lr.aspx?stockid=`stkcd'&accountdate=2012.12.31" d:\date.txt, replace
mata:
        fh = fopen("d:\date.txt", "r")
        // skip the first 444 lines
        for(i=1; i<=444; i++) {
                junk=fget(fh)
        }
        fclose(fh)
end

Can someone familiar with Chinese encoding give me some hints?

Best

Chuntao