Statalist



Re: st: using data off the web


From   David Airey <david.airey@vanderbilt.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: using data off the web
Date   Mon, 14 Apr 2008 15:41:04 -0500

Thank you for pointing this out. I did not try it because I did not find this documented. I may have missed it. But this is great and I will be careful about what is in the file.

Dave


On Apr 14, 2008, at 2:28 PM, Alan Riley <ariley@stata.com> wrote:


David Airey (david.airey@Vanderbilt.Edu) wished that -insheet- could
read from files over the web, just like -use-:

> I know that "use" allows using Stata .dta files from the web, but if a
> site has "insheet"-like data (and nothing else), it would be nice if I
> could insheet from the http site, like you can "use" from an http site.

Some discussion ensued with various workarounds, such as first -copy-ing
the file from the web to the local disk.

However, none of the workarounds are necessary. -insheet- already can
read files directly from the web:

. insheet using "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"
(4 vars, 500 obs)

. describe

Contains data
  obs:           500
 vars:             4
 size:        17,000 (99.8% of memory free)
----------------------------------------------------------------------
               storage  display    value
variable name   type    format     label     variable label
----------------------------------------------------------------------
preprobesetid   str11   %11s                 <pre>ProbesetID
correlation     float   %9.0g                Correlation
strains         byte    %8.0g                #Strains
pvalue          str14   %14s                 p-value
----------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

I suspect that David tried a command similar to the one above without
quotes around the filename, which may have resulted in an error message.
With a simple filename, quotes would not be required, but the filename
above is complicated with several characters in it (such as '=' which
could trip up Stata's parser).

By the way, the URL above does NOT return a plain text file. In the
output of -describe- above, you will notice the HTML tag "<pre>" in
the first variable label. And, if you -list- the data, you will see
that the last value of the variable 'pvalue' has a closing HTML tag
"</pre>" appended to it.
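
If those two tags are the only contamination, one option is to repair
the dataset after reading it. The lines below are a minimal sketch under
that assumption. The new variable name probesetid is only an
illustrative choice, and -destring- will convert pvalue only if every
remaining value turns out to be numeric.

```stata
* Assumes the dataset read by -insheet- above is still in memory
rename preprobesetid probesetid
label variable probesetid "ProbesetID"

* Remove the closing tag wherever it appears in pvalue
replace pvalue = subinstr(pvalue, "</pre>", "", .)

* Convert pvalue to numeric if nothing nonnumeric remains
destring pvalue, replace
```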

It is important when reading data directly from the web to remember
that Stata will see exactly what is sent to it by the remote
webserver. This may not be the same as what your eye sees in your
browser. It is a good idea to use a browser to do a "view source" on
the page of interest to make sure it contains no extraneous HTML
tags. One possibility would be to use Stata's -filefilter- command
to strip out such tags.
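
As a rough sketch of the -filefilter- approach, the lines below first
-copy- the page to disk, strip the two tags seen above, and then
-insheet- the cleaned file. The local filenames raw.txt, step1.txt, and
clean.txt are hypothetical, and the sketch assumes "<pre>" and "</pre>"
are the only stray tags in the page; a "view source" may reveal others.

```stata
* Same URL as in the -insheet- example above
local url "http://www.genenetwork.org/cgi-bin/WebQTL.py?cmd=cor&probeset=rs13481111&db=BXDGeno&searchdb=bra12-03MAS5&return=500"

* Save exactly what the webserver sends (raw.txt is a hypothetical name)
copy "`url'" raw.txt, replace

* -filefilter- writes to a different file than it reads and takes one
* pattern per call, so strip one tag per pass
filefilter raw.txt step1.txt, from("<pre>") to("") replace
filefilter step1.txt clean.txt, from("</pre>") to("") replace

* Read the cleaned file and check the result
insheet using clean.txt, clear
describe
```

After this, the first variable label and the last value of pvalue
should no longer carry the tags, provided the assumption above holds.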




--Alan Riley
(ariley@stata.com)
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*


