
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: Using the -copy- command to download google ngram data


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: Using the -copy- command to download google ngram data
Date   Wed, 14 Dec 2011 17:49:55 +0000

I don't have a solution, and I too would be interested to hear one, but unfortunately I don't find this hard to believe. 

That Google would have a complicated structure that makes it hard to access their files this way doesn't seem surprising for all sorts of reasons, not least security and their wishing to keep track in their own way of who accesses what. 

That said, information to solve this seems more likely to come from Google people than from Stata experts. I don't think using the -shell- command will help at all. 

Nick 
[email protected] 

Madsen, Paul

I would like to download Google's ngram data using Stata's -copy- command. The data are located here: http://books.google.com/ngrams/datasets.

I'm running Stata/SE 11.2 for 64-bit Windows.

Here's the relevant line of Stata code, which is intended to copy the zip file to a local directory and name it download.zip: 

copy http://commondatastorage.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip download.zip

The web address in the code was taken from the Google ngram website (by right-clicking the link to the file, copying its address, and pasting it into Stata).

When I run this code, I get the error: 

file http://commondatastorage.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip not found
server says file temporarily redirected to http://v5.lscache6.c.bigcache.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip

This looks like an issue on Google's end. If I copy the new file location from the error text and run the Stata code:

copy http://v5.lscache6.c.bigcache.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip download.zip

I get the error message "unexpected end of file". The problem is not specific to the ngram file in the example code: I have tried several of them with the same result. The same code works well on a zip file from a different website.

It is hard for me to believe that Google's files would have some fundamental flaw that makes downloading them directly into Stata impossible. Can something be done in Stata to deal with such a problem (maybe using the -shell- command)?
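
For concreteness, the -shell- idea raised above might look something like the following. This is an untested sketch, not a confirmed fix: it assumes the external program curl is installed and on the system path (curl is not part of Stata and may need to be installed separately on Windows), and it assumes the server will deliver the complete file once the redirect is followed. The local macro name url is purely illustrative.

```stata
* Sketch only: assumes curl is installed and on the system PATH.
* curl's -L option follows HTTP redirects; -o names the local output file.
local url "http://commondatastorage.googleapis.com/books/ngrams/books/googlebooks-eng-us-all-1gram-20090715-0.csv.zip"
shell curl -L -o download.zip "`url'"

* Stop with an error here if nothing was downloaded.
confirm file download.zip

* Unpack the archive into the current working directory.
unzipfile download.zip, replace
```

Run the lines together from a do-file so that the local macro is still defined when -shell- is called. If curl also reports a truncated transfer, the problem lies with the server rather than with -copy-.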


