Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: dynamic line execution in mata


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: dynamic line execution in mata
Date   Tue, 11 Feb 2014 18:18:38 +0000

This kind of thing is legal in Mata. For it to work in Stata,
-varname- and -lblname- need to be appropriate string scalars.

stata("label val " + varname + " " + lblname)

This is a central idea in the Gould and Cox Tip mentioned earlier in
the thread.
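
As a minimal sketch of that idea, the lines below build Stata commands as Mata strings and pass them to stata(). The variable and label names are illustrative choices taken from auto.dta, not names from the original program. Building the double quote with char(34) is one way around the unmatched-quote problem raised in question 1; st_varlabel() and st_varvaluelabel() are shown as an alternative that avoids building a command string at all.

```stata
sysuse auto, clear

mata:
// Illustrative names only: any string scalars holding a variable name,
// a value-label name, and label text would work the same way.
varname      = "foreign"
lblname      = "origin"        // value label that ships with auto.dta
thisvarlabel = "Car origin"

// 1. Build the command as a string and hand it to Stata.
stata("label values " + varname + " " + lblname)

// 2. char(34) is the double-quote character, so the quotes that Stata
//    needs never have to be typed inside a Mata string literal.
dq = char(34)
stata("label variable " + varname + " " + dq + thisvarlabel + dq)

// 3. Alternatively, set labels directly, with no command string.
st_varlabel("rep78", "Repair record 1978")
st_varvaluelabel(varname, lblname)
end

describe foreign rep78
```

The string-building route works for any Stata command; the st_*() route is usually simpler when a dedicated Mata function exists for the task.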

Nick
[email protected]


On 11 February 2014 18:07, Andrew Maurer <[email protected]> wrote:
> That would be great if someone has done this before, but I haven't found any user-written programs that do this. I have at least the barebones working using pointers (see updated code below with example execution using auto.dta). However, does anyone have advice on a few additional issues I'm having with mata:
>
> 1) How can I label a Stata variable using Mata objects for the variable name and label? E.g., in recover_from_saveif() I have the variable name stored as a string in thisvarname and the label string stored in thisvarlabel. Does Mata have syntax such as the following to build up the line piece by piece? (I'm not sure how to deal with the unmatched quotation marks to be sent to Stata.)
>
> execute( `"stata(`"label var " "' + thisvarname + `"""' + thisvarlabel `"")"' )
>
> 2) Are there issues with using st_store() for pointers to string variables? The recover_from_saveif() program works for numeric variables, but not string variables. The issue is in the line st_store(., st_addvar(thisvartype,thisvarname),*thisvar), which returns "nonreal found where real required" only when *thisvar points to string data rather than numeric data.
>
> 3) Is there a way to view the source code for commands like "save" that do not have a corresponding save.ado or save.mata file in the ado/base directory? Responding to the issue you raised, Nick, of not having value labels and dataset characteristics, is there a way to list and loop through them in stata/mata? Eg, "char dir" lists all characteristics associated with the dataset, but doesn't post to rclass results. How can I access them mid-program?
>
> P.S. As a rough benchmark on some sample data, savesome took 7.90s on a 1 GB dataset, while saveif took 0.44s.
>
> Thank you,
> Andrew Maurer
>
>
> * Want to write a program that will save a set of observations into a dataset
> mata: mata clear
> clear all
>
> cap program drop saveif
> program define saveif
>         syntax varlist [if] [in] using/, [replace]
>
>         * send varlist to mata
>         putmata `varlist' `if' `in', view
>
>         * create mata objects for 1) variable names 2) storage types 3) labels 4) the data itself
>         forval i = 1/`: word count `varlist'' {
>                 if `i' == 1 {
>                         mata: varnames = "`: word `i' of `varlist''"
>                         mata: vartypes = "`: type `: word `i' of `varlist'''"
>                         mata: varlabels = "`: variable label `: word `i' of `varlist'''"
>                         mata: varpointers = &`: word `i' of `varlist'' // pointers
>                 }
>                 else {
>                         mata: varnames = varnames,"`: word `i' of `varlist''"
>                         mata: vartypes = vartypes,"`: type `: word `i' of `varlist'''"
>                         mata: varlabels = varlabels,"`: variable label `: word `i' of `varlist'''"
>                         mata: varpointers = varpointers,&`: word `i' of `varlist'' // pointers
>                 }
>         }
>
>         * save the created objects to a file
>         cap confirm new file "`using'"
>         if _rc != 0 {
>                 if "`replace'" == "replace" rm "`using'"
>                 else {
>                         di as error "file `using' already exists"
>                         error 1
>                 }
>         }
>         mata: fh = fopen("`using'", "w")
>         mata: fputmatrix(fh, varnames)
>         mata: fputmatrix(fh, vartypes)
>         mata: fputmatrix(fh, varlabels)
>         mata: fputmatrix(fh, varpointers)
>
>         mata: fclose(fh)
> end
>
>
> capture mata mata drop recover_from_saveif()
> mata:
> void recover_from_saveif(string fileloc)
> {
>
>         fh = fopen(fileloc, "r")
>
>         varnames = fgetmatrix(fh)
>         vartypes = fgetmatrix(fh)
>         varlabels = fgetmatrix(fh)
>         varpointers = fgetmatrix(fh)
>         varcount = cols(varnames)
>
>         fclose(fh)
>
>         // foreach var of varnames, load var into stata with correct variable type
>
>         for (i=1; i<=varcount;i++) {
>                 thisvarname = varnames[1,i] // eg contains "date"
>                 thisvartype = vartypes[1,i] // eg contains "int"
>                 thisvarlabel = varlabels[1,i] // eg contains the variable label
>                 thisvar = varpointers[1,i] // eg pointer to date vector
>                 if (i == 1) st_addobs(rows(*thisvar))
>                 st_store(., st_addvar(thisvartype,thisvarname),*thisvar)
>         }
>
> }
> end
>
> cap program drop recover_from_saveif
> program define recover_from_saveif
>         syntax using/, [replace]
>
>         mata: recover_from_saveif("`using'")
>
> end
>
> * Sample execution using built-in dataset
> sysuse auto.dta, clear
> saveif price mpg rep78 headroom trunk weight length turn displacement gear_ratio foreign in 1/5 using test6.txt, replace
>
> clear
> recover_from_saveif using test6.txt
> exit
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Monday, February 10, 2014 2:00 PM
> To: [email protected]
> Subject: Re: st: dynamic line execution in mata
>
> Not to deny you the chance of being the first user to do this, but a simple extra thought is why hasn't this (apparently) been done before by a user? (If that's untrue, this implodes.)
>
> Saving the data means, in full generality, saving not only variables, but also dataset, variable and value labels and characteristics. If you want to say "I only care about variable values" that's a point of view, but best to be explicit about what you want to do and what you're ignoring.
>
> Nick
> [email protected]
>
> On 10 February 2014 19:39, Nick Cox <[email protected]> wrote:
>> Phil Schumm pointed to associative arrays in a later answer.
>>
>> I wrote -savesome- (SSC) to do what you want to do. It's not fast. Its
>> original version from 2001 long predates Mata. It's a (lousy)
>> benchmark for you.
>>
>> I wouldn't call Mata line by line but that in itself is probably trivial.
>>
>> Nick
>> [email protected]
>>
>>
>> On 10 February 2014 19:22, Andrew Maurer <[email protected]> wrote:
>>> Thanks for the response, Nick. I looked into pointers and have been able to make use of them. I'll give the background of the problem. I would be very interested to hear if anyone has thoughts on the efficiency of the code I have so far (see bottom of post).
>>>
>>> I am writing a Stata program, saveif, that will save a subset of observations of a dataset to a file. One method to accomplish this would be to do something like:
>>> preserve
>>> keep if...
>>> save...
>>> restore
>>>
>>> However, for large datasets (eg 20gb) and few observations to be saved (eg - a few mb of outliers), I expect that the preserve/restore method is grossly inefficient, since it involves writing the entire dataset from memory to hard-disk, then reading it back from hard-disk to memory.
>>>
>>> An alternative method to accomplish the task would be to somewhat manually "file write" the individual observations to a file, without having to clear and load back the dataset from memory. I have a nearly complete example here, where there is one part that has been hard-coded to the specific example of gnp96.dta. The code is still somewhat rough.
>>>
>>> Thank you,
>>> Andrew Maurer
>>>
>>>
>>>
>>> * Want to write a program that will save a set of observations into a dataset
>>> mata: mata clear
>>> clear all
>>>
>>> cap program drop saveif
>>> program define saveif
>>>         syntax varlist [if] [in] using/, [replace]
>>>         putmata `varlist' `if' `in', view
>>>
>>>         * put a row vector called varnames to mata
>>>         forval i = 1/`: word count `varlist'' {
>>>                 if `i' == 1 {
>>>                         mata: varnames = "`: word `i' of `varlist''"
>>>                         mata: vartypes = "`: type `: word `i' of `varlist'''"
>>>                         mata: varpointers = &`: word `i' of `varlist'' // pointers
>>>                 }
>>>                 else {
>>>                         mata: varnames = varnames,"`: word `i' of `varlist''"
>>>                         mata: vartypes = vartypes,"`: type `: word `i' of `varlist'''"
>>>                         mata: varpointers = varpointers,&`: word `i' of `varlist'' // pointers
>>>                 }
>>>         }
>>>         * save vector of varnames to file
>>>         cap confirm new file "`using'"
>>>         if _rc != 0 {
>>>                 di as error "`using' exists. replacing"
>>>                 rm "`using'"
>>>         }
>>>         mata: fh = fopen("`using'", "w")
>>>         mata: fputmatrix(fh, varnames)
>>>         mata: fputmatrix(fh, vartypes)
>>>         mata: fputmatrix(fh, varpointers)
>>>
>>>         * write observations of each variable to file
>>>         forval i = 1/`: word count `varlist'' {
>>>                 mata: fputmatrix(fh, `: word `i' of `varlist'')
>>>         }
>>>
>>>         mata: fclose(fh)
>>> end
>>>
>>>
>>> capture mata mata drop recover_from_saveif()
>>> mata:
>>> void recover_from_saveif(string fileloc) {
>>>
>>>         fh = fopen(fileloc, "r")
>>>         varnames = fgetmatrix(fh)
>>>         vartypes = fgetmatrix(fh)
>>>         varpointers = fgetmatrix(fh)
>>>         // ----- hard coded part!! try to get this into loop
>>>         date = fgetmatrix(fh)
>>>         gnp96 = fgetmatrix(fh)
>>>         // -------------------------------------------------
>>>         fclose(fh)
>>>         varcount = cols(varnames)
>>>
>>>         // ------- this loop not working yet. need to figure out syntax
>>>         // foreach var of varnames, read var from file to mata
>>>         for (i=1; i<=varcount;i++) {
>>>           // varnames[1,i] = fgetmatrix(fh)
>>>         }
>>>         // -------------------------------------------------
>>>
>>>         // foreach var of varnames, load var into stata with correct variable type
>>>         for (i=1; i<=varcount;i++) {
>>>                 thisvarname = varnames[1,i] // eg contains "date"
>>>                 thisvartype = vartypes[1,i] // eg contains "int"
>>>                 thisvar = varpointers[1,i] // eg pointer to date vector
>>>                 if (i == 1) st_addobs(rows(*thisvar))
>>>                 st_store(., st_addvar(thisvartype,thisvarname),*thisvar)
>>>         }
>>>
>>> }
>>> end
>>>
>>> cap program drop recover_from_saveif
>>> program define recover_from_saveif
>>>         syntax using/, [replace]
>>>
>>>         mata: recover_from_saveif("`using'")
>>>
>>> end
>>>
>>>
>>> sysuse gnp96.dta, clear
>>>
>>> saveif * in 1/5 using test5.txt
>>>
>>> clear
>>> recover_from_saveif using test5.txt
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Nick Cox
>>> Sent: Monday, February 10, 2014 11:59 AM
>>> To: [email protected]
>>> Subject: Re: st: dynamic line execution in mata
>>>
>>> You are presuming that such a thing exists.
>>>
>>> In essence, Mata has no direct equivalent of macro substitution.
>>>
>>> Sometimes, the way to solve (similar) problems is by direct
>>> manipulation of strings. That is the theme of
>>>
>>> SJ-11-2 pr0052  . . . . Stata tip 100: Mata and the case of the missing macros
>>>         . . . . . . . . . . . . . . . . . . . . . . . . W. Gould and N. J. Cox
>>>         Q2/11   SJ 11(2):323--324                                (no commands)
>>>         tip showing how to do the equivalent of Stata's macro
>>>         substitution in Mata
>>>
>>> Sometimes, using pointers is the answer.
>>>
>>> In this case, I'd guess that you want the Mata equivalent of some Stata operation and that there's a Mata way of doing it, but I would rather hear whether that is so than try to guess what the underlying problem is.
>>>
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 10 February 2014 17:38, Andrew Maurer <[email protected]> wrote:
>>>> Hi Statalist,
>>>>
>>>> I am trying to find Mata's equivalent of Stata's macro expansion functionality. In the below example, I first define an object thisvar as the string "date" and I define the object date as the column vector 1 \ 2 \ 3 \ 4 \ 5. How can I return the contents of the "date" object by only referencing "thisvar"?
>>>>
>>>> In the line, rows( thisvar ), thisvar is simply the 1x1 matrix containing the string "date", so rows( thisvar ) returns: 1. What I am looking for is something like rows( `=thisvar' ), so as to return 5 rather than 1.
>>>>
>>>> ********* begin example *********
>>>>
>>>> mata
>>>>
>>>> i = 1
>>>> date = 1 \ 2 \ 3 \ 4 \ 5
>>>> varnames = "date", "price"
>>>> thisvar = varnames[1,i]
>>>> rows( thisvar ) // output: 1
>>>> rows( date ) // output: 5
>>>>
>>>> end
>>>>
>>>> ********* end example ***********
>>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC