Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: dynamic line execution in mata


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: dynamic line execution in mata
Date   Mon, 10 Feb 2014 19:59:56 +0000

Not to deny you the chance of being the first user to do this, but a
simple extra thought is why hasn't this (apparently) been done before
by a user? (If that's untrue, this implodes.)

Saving the data means, in full generality, saving not only variables,
but also dataset, variable and value labels and characteristics. If
you want to say "I only care about variable values" that's a point of
view, but best to be explicit about what you want to do and what
you're ignoring.

Nick
[email protected]

On 10 February 2014 19:39, Nick Cox <[email protected]> wrote:
> Phil Schumm pointed to associative arrays in a later answer.
>
> I wrote -savesome- (SSC) to do what you want to do. It's not fast. Its
> original version from 2001 long predates Mata. It's a (lousy)
> benchmark for you.
>
> I wouldn't call Mata line by line but that in itself is probably trivial.
>
> Nick
> [email protected]
>
>
> On 10 February 2014 19:22, Andrew Maurer <[email protected]> wrote:
>> Thanks for the response, Nick. I looked into pointers and have been able to make use of them. I'll give the background of the problem. I would be very interested to hear if anyone has thoughts on the efficiency of the code I have so far (see bottom of post).
>>
>> I am writing a Stata program, saveif, that will save a subset of observations of a dataset to a file. One method to accomplish this would be to do something like:
>> preserve
>> keep if...
>> save...
>> restore
>>
>> However, for large datasets (e.g., 20 GB) with few observations to be saved (e.g., a few MB of outliers), I expect that the preserve/restore method is grossly inefficient, since it involves writing the entire dataset from memory to hard disk, then reading it back from hard disk to memory.
>>
>> An alternative method to accomplish the task would be to somewhat manually "file write" the individual observations to a file, without having to clear and load back the dataset from memory. I have a nearly complete example here, where there is one part that has been hard-coded to the specific example of gnp96.dta. The code is still somewhat rough.
>>
>> Thank you,
>> Andrew Maurer
>>
>>
>>
>> * Want to write a program that will save a set of observations into a dataset
>> mata: mata clear
>> clear all
>>
>> cap program drop saveif
>> program define saveif
>>         syntax varlist [if] [in] using/, [replace]
>>         putmata `varlist' `if' `in', view
>>
>>         * put a row vector called varnames to mata
>>         forval i = 1/`: word count `varlist'' {
>>                 if `i' == 1 {
>>                         mata: varnames = "`: word `i' of `varlist''"
>>                         mata: vartypes = "`: type `: word `i' of `varlist'''"
>>                         mata: varpointers = &`: word `i' of `varlist'' // pointers
>>                 }
>>                 else {
>>                         mata: varnames = varnames,"`: word `i' of `varlist''"
>>                         mata: vartypes = vartypes,"`: type `: word `i' of `varlist'''"
>>                         mata: varpointers = varpointers,&`: word `i' of `varlist'' // pointers
>>                 }
>>         }
>>         * save vector of varnames to file
>>         cap confirm new file "`using'"
>>         if _rc != 0 {
>>                 di as error "`using' exists. replacing"
>>                 rm "`using'"
>>         }
>>         mata: fh = fopen("`using'", "w")
>>         mata: fputmatrix(fh, varnames)
>>         mata: fputmatrix(fh, vartypes)
>>         mata: fputmatrix(fh, varpointers)
>>
>>         * write observations of each variable to file
>>         forval i = 1/`: word count `varlist'' {
>>                 mata: fputmatrix(fh, `: word `i' of `varlist'')
>>         }
>>
>>         mata: fclose(fh)
>> end
>>
>>
>> capture mata mata drop recover_from_saveif()
>> mata:
>> void recover_from_saveif(string fileloc)
>> {
>>
>>         fh = fopen(fileloc, "r")
>>         varnames = fgetmatrix(fh)
>>         vartypes = fgetmatrix(fh)
>>         varpointers = fgetmatrix(fh)
>>         // ----- hard coded part!! try to get this into loop
>>         date = fgetmatrix(fh)
>>         gnp96 = fgetmatrix(fh)
>>         // -------------------------------------------------
>>         fclose(fh)
>>         varcount = cols(varnames)
>>
>>         // ------- this loop not working yet. need to figure out syntax
>>         // foreach var of varnames, read var from file to mata
>>         for (i=1; i<=varcount;i++) {
>>           // varnames[1,i] = fgetmatrix(fh)
>>         }
>>         // -------------------------------------------------
>>
>>         // foreach var of varnames, load var into stata with correct variable type
>>         for (i=1; i<=varcount;i++) {
>>                 thisvarname = varnames[1,i] // eg contains "date"
>>                 thisvartype = vartypes[1,i] // eg contains "int"
>>                 thisvar = varpointers[1,i] // eg pointer to date vector
>>                 if (i == 1) st_addobs(rows(*thisvar))
>>                 st_store(., st_addvar(thisvartype,thisvarname),*thisvar)
>>         }
>>
>> }
>> end
>>
>> cap program drop recover_from_saveif
>> program define recover_from_saveif
>>         syntax using/, [replace]
>>
>>         mata: recover_from_saveif("`using'")
>>
>> end
>>
>>
>> sysuse gnp96.dta, clear
>>
>> saveif * in 1/5 using test5.txt
>>
>> clear
>> recover_from_saveif using test5.txt
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
>> Sent: Monday, February 10, 2014 11:59 AM
>> To: [email protected]
>> Subject: Re: st: dynamic line execution in mata
>>
>> You are presuming that such a thing exists.
>>
>> In essence, Mata has no direct equivalent of macro substitution.
>>
>> Sometimes, the way to solve (similar) problems is by direct manipulation of strings. That is the theme of
>>
>> SJ-11-2 pr0052  . . . . Stata tip 100: Mata and the case of the missing macros
>>         . . . . . . . . . . . . . . . . . . . . . . . . W. Gould and N. J. Cox
>>         Q2/11   SJ 11(2):323--324                                (no commands)
>>         tip showing how to do the equivalent of Stata's macro
>>         substitution in Mata
>>
>> Sometimes, using pointers is the answer.
>>
>> In this case, I'd guess that you want the Mata equivalent of some Stata operation and that there's a Mata way of doing it, but I would rather hear whether that is so than try to guess what the underlying problem is.
>>
>> Nick
>> [email protected]
>>
>>
>> On 10 February 2014 17:38, Andrew Maurer <[email protected]> wrote:
>>> Hi Statalist,
>>>
>>> I am trying to find Mata's equivalent of Stata's macro expansion functionality. In the below example, I first define an object thisvar as the string "date" and I define the object date as the column vector 1 \ 2 \ 3 \ 4 \ 5. How can I return the contents of the "date" object by only referencing "thisvar"?
>>>
>>> In the line, rows( thisvar ), thisvar is simply the 1x1 matrix containing the string "date", so rows( thisvar ) returns: 1. What I am looking for is something like rows( `=thisvar' ), so as to return 5 rather than 1.
>>>
>>> ********* begin example *********
>>>
>>> mata
>>>
>>> i = 1
>>> date = 1 \ 2 \ 3 \ 4 \ 5
>>> varnames = "date", "price"
>>> thisvar = varnames[1,i]
>>> rows( thisvar ) // output: 1
>>> rows( date ) // output: 5
>>>
>>> end
>>>
>>> ********* end example ***********
>>>
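
The two approaches named in the thread, pointers and associative arrays, can be sketched as follows. This is a minimal illustration, not code from any of the posters: the vector price, the array A, and the function read_k() are hypothetical names introduced here, and read_k() assumes a file written with one fputmatrix() call per variable, as in saveif above.

********* begin example *********

mata

i = 1
date = 1 \ 2 \ 3 \ 4 \ 5
price = 10 \ 20 \ 30            // hypothetical second vector
varnames = "date", "price"

// Pointers: hold addresses in a vector parallel to varnames
varpointers = &date, &price
rows( *(varpointers[i]) )       // output: 5

// Associative array: look a vector up by the name held in a string
A = asarray_create()            // string scalar keys by default
asarray(A, "date", date)
asarray(A, "price", price)
thisvar = varnames[1,i]
rows( asarray(A, thisvar) )     // output: 5

// Read k matrices in a loop instead of one hard-coded line per variable
pointer rowvector read_k(real scalar fh, real scalar k)
{
        real scalar i
        pointer rowvector p

        p = J(1, k, NULL)
        for (i=1; i<=k; i++) {
                p[i] = &(fgetmatrix(fh))    // pointer to the matrix just read
        }
        return(p)
}

end

********* end example ***********

With p = read_k(fh, cols(varnames)), the vector for varnames[i] is *(p[i]), so no variable name needs to appear in the code. Pointers refer to the existing vectors rather than to copies, which may matter for large data.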
