Statalist



Re: st: using data off the web


From   Alan Riley <ariley@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: using data off the web
Date   Mon, 14 Apr 2008 14:28:25 -0500

David Airey (david.airey@Vanderbilt.Edu) wished that -insheet- could
read from files over the web, just like -use-:
> I know that "use" allows using Stata .dta files from the web, but if a  
> site has "insheet"-like data (and nothing else), it would be nice if I  
> could insheet from the http site, like you can "use" from an http site.

Some discussion ensued with various workarounds such as first -copy-ing
the file from the web to the local disk.

However, none of the workarounds are necessary.  -insheet- already can
read files directly from the web:

  . insheet using "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"
  (4 vars, 500 obs)

  . describe

  Contains data
    obs:           500
   vars:             4
   size:        17,000 (99.8% of memory free)
  ----------------------------------------------------------------------
                storage  display     value
  variable name   type   format      label      variable label
  ----------------------------------------------------------------------
  preprobesetid   str11  %11s                   <pre>ProbesetID
  correlation     float  %9.0g                  Correlation
  strains         byte   %8.0g                  #Strains
  pvalue          str14  %14s                   p-value
  ----------------------------------------------------------------------
  Sorted by:
       Note:  dataset has changed since last saved

I suspect that David tried a command similar to the one above without
quotes around the filename, which may have resulted in an error message.
With a simple filename, quotes would not be required, but the filename
above contains several characters (such as '=') that could trip up
Stata's parser.

By the way, the URL above does NOT return a plain text file.  In the
output of -describe- above, you will notice the HTML tag "<pre>" in
the first variable label.  And, if you -list- the data, you will see
the last value of the variable 'pvalue' contains a closing HTML tag
"</pre>" on it.
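
Stray tags like these can also be cleaned up after the data are in
memory.  A minimal sketch, assuming the data were read with the
-insheet- command above and that "</pre>" is the only non-numeric text
in 'pvalue' (the new name 'probesetid' is just an illustrative choice):

  . * remove the closing tag from any value of pvalue that carries it
  . replace pvalue = trim(subinstr(pvalue, "</pre>", "", .))
  . * convert pvalue to numeric now that it should hold only numbers
  . destring pvalue, replace
  . * tidy the first variable's name and label
  . rename preprobesetid probesetid
  . label variable probesetid "ProbesetID"

If other non-numeric text remains in 'pvalue', -destring- will leave
the variable as a string and say so.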

It is important when reading data directly from the web to remember
that Stata will see exactly what is sent to it by the remote
webserver.  This may not be the same as what your eye sees in your
browser.  It is a good idea to use a browser to do a "view source" on
the page of interest to make sure it contains no extraneous HTML
tags.  One possibility would be to use Stata's -filefilter- command
to strip out such tags.
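
A sketch of the -filefilter- approach, assuming the local file names
raw.txt, step1.txt, and clean.txt (all hypothetical) and that "<pre>"
and "</pre>" are the only tags that need to be removed.  -filefilter-
reads one file and writes another, so each tag gets its own pass:

  . * save exactly what the webserver sends to a local file
  . copy "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500" raw.txt, replace
  . * pass 1: delete the opening tag
  . filefilter raw.txt step1.txt, from("<pre>") to("") replace
  . * pass 2: delete the closing tag
  . filefilter step1.txt clean.txt, from("</pre>") to("") replace
  . * read the cleaned file
  . insheet using clean.txt, clear

With the tags gone before -insheet- sees the file, the first variable
label and the last value of 'pvalue' should no longer carry them.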


 

--Alan Riley
(ariley@stata.com)


