
Re: st: Mata Data Structure or "variable" variable names for timeseries computations


From   Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Mata Data Structure or "variable" variable names for timeseries computations
Date   Wed, 1 Aug 2012 02:23:32 -0700

Matt,

You presumably have panel data spread across multiple files. My
solution would be to append the datasets and use Mata's panel
functions, starting with -panelsetup()-. Instead of submatrices based
on panel identifiers, you take submatrices based on the time variable,
so that each submatrix holds one year of data.
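
To make the idea concrete, here is a minimal sketch on a made-up
matrix (the file name and the names X, info, cur, prev and d are only
for illustration). The year column is handed to -panelsetup()- as if
it were the panel identifier, so -panelsubmatrix()- returns one year
at a time.

*------------------------------------------------- panelsetup_sketch.do
mata
// toy data: column 1 = year, column 2 = id, column 3 = value
X = (1996, 2, 26 \ 1995, 1, 10 \ 1996, 1, 13 \ 1995, 2, 20)
X = sort(X, (1, 2))                // sort by id within year
info = panelsetup(X, 1)            // one row per year: first and last row
cur  = panelsubmatrix(X, 2, info)  // rows for 1996
prev = panelsubmatrix(X, 1, info)  // rows for 1995
d = cur[., 3] - prev[., 3]         // difference by id; here d is (3 \ 6)
d
end
*------------------------------------------------- panelsetup_sketch.do

The subtraction only makes sense if every year has the same ids in the
same order, which is why the sort comes first.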

I have written a fairly general ado file that, with modifications, can
do almost anything close to your stated requirements. It takes the
start and end years and the variables (from the files it reads in)
whose differences you want, and saves each matrix of year-on-year
differences to a binary file that can be read back in later.

1) First, the ado file

*------------------------------------------------- matdiff.ado
cap program drop matdiff
program matdiff, rclass
	version 12
	syntax varlist(min=1 numeric), ///
		[STartyr(integer 1995) ENdyr(integer 2010)] ///
		ID(varname) ///
		Time(varname)
	
	// put the variable names together
	local allvars `id' `time' `varlist'
	
	// read the first file, then append the remaining years
	use "file_`startyr'", clear
	local nextyr = `startyr' + 1
	forvalues year = `nextyr'/`endyr' {
		append using "file_`year'"
	}

	// call the Mata function
	mata: a=fnCalcDiffMat(st_local("allvars"), ///
		strtoreal(st_local("startyr")), ///
		strtoreal(st_local("endyr")))
end

version 12
set matastrict off
mata

// define a structure to hold each difference matrix and its two years
struct stMattData {
	real matrix mDiff
	real scalar yearCurrent
	real scalar yearPrevious
}

// Mata function
struct stMattData colvector function ///
	fnCalcDiffMat(string scalar varlist, ///
	real scalar startyr, real scalar endyr) {

	// declare the vector of structures
	struct stMattData colvector stDiff
	
	real scalar fh  // the file handle
	string scalar sFileName  // filename string
	
	// view onto the data, then a sorted copy
	st_view(mV=., ., varlist, .)
	mV=sort(mV, (2, 1))  // sort by id within year; mV is now a copy, not a view

	// setup as panel
	st_subview(mV1=., mV, ., (3..length(tokens(varlist))))
			// pull variables into subview
	st_subview(mV2=., mV, ., 2)  // year variable is "panel"
	stInfo = panelsetup(mV2, 1)  // setup as panel

	// one structure per pair of consecutive years
	stDiff=  J(rows(stInfo)-1, 1, stMattData())
	
	// fill 'er up
	for(i=1; i<rows(stInfo); i++) {
		mCurrent = panelsubmatrix(mV1, i+1, stInfo)
		mPrevious = panelsubmatrix(mV1, i, stInfo)
		
		// do the computation
		stDiff[i].mDiff = mCurrent - mPrevious
		stDiff[i].yearCurrent = mV2[stInfo[i+1,1],1]
		stDiff[i].yearPrevious = mV2[stInfo[i,1],1]
		sFileName = invtokens(("file",
			strofreal(stDiff[i].yearCurrent) ,
			strofreal(stDiff[i].yearPrevious)),"_")
		// write the file to disk
		fh = fopen(sFileName, "rw")
		fputmatrix(fh, stDiff[i].mDiff)
		fclose(fh)

	}
	return(stDiff)  // no use currently, but handy
}
end
*------------------------------------------------- matdiff.ado


2) Next, a script that generates test data and calls the ado file

*---------------------------------- matdiff_script.do
clear*
// start year and endyear
local startyr 1995
local endyr 2010

// generate dummy data
forvalues year=`startyr'/`endyr' {
	drop _all
	set obs 10
	drawnorm myvar1-myvar5
	g year = `year'
	g id=_n
	save file_`year', replace
}

run matdiff.ado
// call the program
matdiff myvar1-myvar5, startyr(1996) ///
	endyr(2005) id(id) time(year)
*---------------------------------- matdiff_script.do
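
Once the script has run, a saved matrix can be read back with
-fgetmatrix()-. This is only a sketch: the file name below follows the
"file_<current year>_<previous year>" pattern built inside
fnCalcDiffMat(), and read_diff.do and mDiff are names chosen just for
this example.

*---------------------------------- read_diff.do
mata
fh = fopen("file_1997_1996", "r")  // differences 1997 minus 1996
mDiff = fgetmatrix(fh)             // read the matrix back
fclose(fh)
rows(mDiff), cols(mDiff)           // check the dimensions
end
*---------------------------------- read_diff.do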

Much simpler solutions are possible without invoking Mata, but this
program should be extensible.
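
For comparison, one possible Stata-only version (a sketch, assuming
the file and variable names from the dummy data above; the d_ prefix
is made up here) appends the files, declares the panel, and uses the
D. operator:

*---------------------------------- diff_stata_only.do
use file_1996, clear
forvalues year = 1997/2005 {
	append using file_`year'
}
xtset id year
foreach v of varlist myvar1-myvar5 {
	generate d_`v' = D.`v'   // value minus previous year's value, by id
}
*---------------------------------- diff_stata_only.do

This keeps the differences as variables in the dataset rather than as
separate matrices, so it suits the case where you do not need the
matrices themselves.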

T

On Tue, Jul 31, 2012 at 9:49 PM, Matthew McKay <matthew.mckay@anu.edu.au> wrote:
> Dear StataList,
>
> I am trying to compute differences between sets of matrices using Mata
> for a time series from 1995 to 2010.
> I am writing a Mata function to compute the differences for each year
> after I load in ALL my yearly data matrices.
> My issue in Step #2 is that I want to use something similar to local
> macros in do-files (which can't be done).
>
> How can I pass part of a variable name (like Year) into the function and
> then perform a set of computations over the data that is loaded in MATA?
> In MATA how can you cycle through different variables (substituting in
> different year append) and compute a new set of matrices?
> Should I be constructing a Struct that contains a Data Matrix and Time
> Series Indicator?
>
> I understand the MATA function is compiled and therefore local macro
> substitution can't be done ... but was wondering if anyone else has an
> elegant solution to reference different variables in MATA by changing a
> component of the variablename (knowing the variable is defined in memory).
>
> Many Thanks,
> Matthew
>
> Step #1:
> **
> ** Import ALL MCP Matrix
> **
> local Years = "1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010"
> clear all
> local END = "end"
> foreach year of local Years {
>     use  [file_`year'.dta], clear              // Load MCP Matrices into Mata Memory //
>     drop ReporterISO3C
>     mata
>     Mcp_`year' = st_data(., .)
>     `END'
> }
>
> Step#2:
> ** MATA FUNCTION **
>
> clear all
> mata
> void calcdiffvectors(real scalar StartYear, real scalar EndYear) {
>   for(year = StartYear; year < EndYear; year++) {
>          !!!!!!!!! local NextYear = `year' + 1  !!!!!!!!!!!!!
>          !!!!!!!!! Mcp_`year'_`NextYear' = Mcp_`year' :- Mcp_`NextYear' !!!!!!!!!!
>   }
> }
> end
>
> Step #3:
> mata: calcdiffvectors(1995, 2010)
> // I can then retrieve the difference vectors using get mata etc. //
>
>



-- 
Tirthankar Chakravarty
tchakravarty@ucsd.edu
tirthankar.chakravarty@gmail.com