
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Using the -copy- command to download google ngram data


From   "Madsen,Paul" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Using the -copy- command to download google ngram data
Date   Thu, 15 Dec 2011 15:33:48 +0000

It appears that this is a problem best solved using tools other than Stata. Thanks, Nick Cox, Austin Nichols, and Muhammad Anees! You likely saved me many hours of frustrating and unproductive experimentation with this.

Paul Madsen 


On 12/15/2011 Muhammad Anees wrote:

Although it is neither simple nor efficient, I would suggest downloading the zip file manually, extracting the csv files, and importing them into Stata with -insheet-. It worked for me, at least.
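
[Editor's sketch] The manual steps suggested above (download, unzip, inspect before importing) could also be scripted outside Stata. The following is a minimal Python sketch, not something posted in the thread. The function names download, extract_tables, and preview are hypothetical, and the tab delimiter is an assumption about the ngram files that should be checked against the actual data. Python's urllib follows HTTP redirects by default, which may help with the "temporarily redirected" response quoted below, although that has not been verified against Google's servers.

```python
import csv
import shutil
import urllib.request
import zipfile


def download(url, dest):
    """Save url to the local file dest and return dest.

    urllib.request follows HTTP redirects by default.
    """
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)
    return dest


def extract_tables(zip_path, out_dir):
    """Unpack every .csv member of zip_path into out_dir; return their paths."""
    with zipfile.ZipFile(zip_path) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        return [zf.extract(n, out_dir) for n in names]


def preview(path, n=5, delimiter="\t"):
    """Return the first n rows of a delimited text file as lists of strings.

    The tab delimiter is an assumption; pass delimiter="," if the file
    turns out to be comma-separated. Quote characters are treated as data.
    """
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter=delimiter, quoting=csv.QUOTE_NONE)
        return [row for _, row in zip(range(n), reader)]
```

A call such as download(url, "download.zip") followed by extract_tables("download.zip", ".") would leave the csv files in the working directory, where they can be read into Stata with -insheet- as described above.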

On Wed, Dec 14, 2011 at 8:53 PM, Madsen,Paul <[email protected]> wrote:
> Dear Statalist,
>
> I would like to download Google's ngram data using Stata's -copy- command. The data are located here: http://books.google.com/ngrams/datasets.
>
> I'm running Stata/SE 11.2 for Windows (64-bit).
>
> Here's the relevant line of Stata code, which is intended to copy the zip file to a local directory and name it download.zip:
>
> copy http://commondatastorage.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip download.zip
>
> The web address in the code was taken from the Google ngram website (by right-clicking the link to the file and pasting it into Stata).
>
> When I run this code, I get the error:
>
> file http://commondatastorage.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip not found
> server says file temporarily redirected to http://v5.lscache6.c.bigcache.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip
>
> This looks like an issue on Google's end. If I copy the new file location from the error text and run the Stata code:
>
> copy http://v5.lscache6.c.bigcache.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip download.zip
>
> I get the error message "unexpected end of file." This problem is not isolated to the specific Google ngram file in the example code; I've tried several of them and hit the same problem each time. I have also tested the code on a zip file from a different website, and there it works well.
>
> It is hard for me to believe that Google's files would have some fundamental flaw that makes downloading them directly with Stata impossible. Can something be done in Stata to deal with such a problem (maybe using the shell command)?
>
> Thanks!
>
> Paul E. Madsen
> University of Florida
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



-- 

Regards
---------------------------
Muhammad Anees
Assistant Professor
COMSATS Institute of Information Technology
Attock 43600, Pakistan
www.aneconomist.com


