
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: How to reference results from a big dataset within a program


From   Richard Williams <[email protected]>
To   [email protected], <[email protected]>
Subject   Re: st: How to reference results from a big dataset within a program
Date   Wed, 28 Aug 2013 08:26:16 -0500

At 06:06 AM 8/28/2013, Phil Schumm wrote:
On Aug 27, 2013, at 4:25 PM, "Chen,Minxing" <[email protected]> wrote:
> Basically, in the program I submitted, I had to reference results from a big pre-simulated dataset (four variables, but around 400,000 observations). In my previous submission, I simply submitted the pre-simulated dataset with my program, and within the program I loaded that simulated dataset using code such as "use c:\ado\personal\simudata". I was hoping that when people downloaded the program from SSC, the pre-simulated dataset would also be downloaded to the directory "c:\ado\personal\".
>
> Now my reviewer has pointed out that I can't expect users to do that. I can't even tell users to place the file there, because they may not be able to create such a directory (e.g., they might not have a C: drive). The reviewer suggested that I find some other way to get at the information in my pre-simulated dataset, such as incorporating the data into the program.
>
> I tried to copy the simulated data into my program using syntax such as "input x y z k". However, because there are so many observations (a little more than 400,000) and there is a system limit on the number of lines of syntax within a program (around 3,500), I was not able to do it this way. The reviewer also mentioned that I might use a Mata library function, but I am pretty new to Mata. Can anyone help with this issue?


Basically you have two options. The first would be to deliver the dataset (i.e., the .dta file) automatically along with the package. See -help usersite- or [R] net for the complete details, but essentially you'll want to use "F mydata.dta" rather than "f mydata.dta" in the package (.pkg) file, to force the dataset to be installed in the system directories rather than the user's current working directory. You then call the dataset with

    sysuse mydata

This way, everything will "just work" regardless of the user's local setup, and users don't need to know (or worry about) where the file is located. This also makes it easy for you to update the file later, if necessary.
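A minimal sketch of the program side, assuming the package installs a file named myprog_data.dta (via an F line) that contains variables x, y, z, and k. The command name myprog and the returned scalar xbar are hypothetical. -preserve- protects whatever data the user has in memory while the program works with the simulated data.

```stata
* Hypothetical command -myprog- that needs the pre-simulated data.
program define myprog, rclass
    version 13
    * Save the user's data; it is restored automatically when the
    * program ends, whether it finishes normally or with an error.
    preserve
    * -sysuse- searches the ado-path, so it finds myprog_data.dta
    * wherever -net install- placed it.
    sysuse myprog_data, clear
    * Illustrative use of the simulated data: return the mean of x.
    quietly summarize x, meanonly
    return scalar xbar = r(mean)
end
```

After -myprog- runs, the user's own dataset is back in memory and only the r() results remain.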

The alternative would be to place the dataset on the web somewhere, and access it from within your code using the URL. The downside to this is that your command won't work unless the user has an internet connection, which would be annoying.
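If you do go the web route, the command should at least fail with a clear message when the download doesn't work. A minimal sketch, where the command name myprog_web and the URL are hypothetical placeholders (the URL is not a real location of any file):

```stata
* Hypothetical command that reads the simulated data from the web.
program define myprog_web
    version 13
    preserve
    * -use- accepts http:// addresses; -capture- traps the error if
    * there is no connection or the file cannot be read as a .dta.
    capture use "http://www.example.com/myprog_data.dta", clear
    if _rc {
        local rc = _rc
        display as error "myprog_web: could not load myprog_data.dta from the web"
        display as error "(an internet connection is required)"
        exit `rc'
    }
    * ... work with the simulated data here ...
end
```

Because of -preserve-, the user's data is restored even when the command exits with an error.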

You learn something new every day. I would add that you should (a) give the dataset a name that is somewhat esoteric and unlikely to be used otherwise, and (b) give it a name that associates it with the program, e.g. myprog_data, so that people don't wonder where it came from. Of course, I would give the same advice for all the files that will be installed.
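Putting both suggestions together, a package description (.pkg) file might look like the sketch below. All file names are hypothetical; each shares the myprog prefix, and the dataset uses an F line. As described in [R] net, an f line would treat the .dta file as an ancillary file, which -net get- copies into the current directory. The F line instead forces it into the system directories, where -sysuse- can find it.

```
v 3
d 'MYPROG': hypothetical command that uses a pre-simulated dataset
d Distribution-Date: 20130828
f myprog.ado
f myprog.sthlp
F myprog_data.dta
```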

