Statalist The Stata Listserver


st: RE: avoid collapse and yet get matrix of unit specific means in panel data

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: avoid collapse and yet get matrix of unit specific means in panel data
Date   Thu, 14 Sep 2006 13:01:05 +0100

Thanks for the recommendation. The reference mentioned 
was a few issues back in the Stata Journal: 

SJ-5-4  pr0018  . . . . . . . . . . . . Suggestions on Stata programming style
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q4/05   SJ 5(4):560--566                                 (no commands)
        suggestions for good Stata programming style

You put your finger on a good point. Truth is, many Stata programmers use
-preserve- all the time, in situations where they know it is the best solution.
With modern machines and small to moderate datasets, the cost of -preserve-
and -restore- can be trivial. But when is it the best solution? The advice is
in that list because I have witnessed rather a lot of beginners' programs in
which -preserve- was unnecessary and indeed a nuisance. In your case, as you
are evidently using -collapse- several times and want to keep relating results
back to the data, the -collapse- and -restore- cycle is probably unnecessary.

So, are there other solutions? I am going to guess that the word "matrix" 
is a red herring here. You do want something that can be displayed as 
a matrix, but it doesn't sound as if you want to do anything with it 
qua matrix, as in invert it or get the eigenvalues. If that's wrong, do 
explain why. 

The most efficient way to get means is by calculating them directly:

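* your data are sorted appropriately, by virtue of -tsset-

foreach v of local varlist { 
	tempvar mean 
	* -gen double- rather than -clonevar-: a clone of an integer
	* variable keeps its integer type and would truncate the means
	* sum() treats missings as 0; sum(`v' < .) counts nonmissings
	by `pvar' : gen double `mean' = sum(`v') / sum(`v' < .) 
	local means "`means' `mean'" 
	* the last value in each panel is the panel mean
	by `pvar' : replace `mean' = `mean'[_N] 
}

tabdisp `pvar', c(`means')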

will then work nicely for a few variables (-tabdisp- accepts at most five
cell variables). More generally, 

tempvar tag 
by `pvar' : gen byte `tag' = _n == 1 
list `pvar' `means' if `tag' 

will give a basic tabulation that can be beautified. 
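If a genuine Stata matrix turns out to be needed after all, a minimal sketch is to combine the direct calculation with -mkmat-, which accepts an -if- qualifier, so only the tagged observation of each panel is copied and no -collapse-, -preserve- or -restore- is involved. The toy data and the names company, time, invest, mvalue and means_X are hypothetical stand-ins for the poster's panel.

```stata
* Sketch only: toy panel and all names here are hypothetical
clear
set obs 12
gen company = ceil(_n / 4)          // 3 panels of 4 periods each
gen time    = mod(_n - 1, 4) + 1
gen invest  = _n
gen mvalue  = 2 * _n
replace mvalue = . in 2             // one missing value, to show it is skipped
tsset company time
local pvar "`r(panelvar)'"
local varlist "invest mvalue"

local means
foreach v of local varlist {
    tempvar mean
    * running sum of nonmissing values over running count of nonmissings
    by `pvar': gen double `mean' = sum(`v') / sum(`v' < .)
    * spread each panel's final value (the panel mean) over the panel
    by `pvar': replace `mean' = `mean'[_N]
    local means "`means' `mean'"
}

* flag one observation per panel
tempvar tag
by `pvar': gen byte `tag' = _n == 1

* one row per panel, one column per variable; data are left untouched
mkmat `means' if `tag', matrix(means_X)
matrix colnames means_X = `varlist'
matrix list means_X
```

The user's data gain only temporary variables, which Stata drops automatically when the program or do-file ends.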

Many Stata programmers would use -egen- for convenience, despite
its inefficiency. Note that, in addition to -egen, mean()-, there is
-egen, tag()-, which can be used to display just one observation from
each group. You can also learn a lot by looking _inside_ the
-egen- functions, as they exemplify various basic devices. Note
that the code for -egen, foo()- is in _gfoo.ado and can be
viewed using 

. viewsource _gfoo.ado 
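A sketch of the -egen- route, which also covers the standard deviations mentioned in the question. It builds toy data on the spot; company, time, invest and the new variable names are hypothetical. -egen, tag()- flags one observation per company, so -list- shows one row per unit.

```stata
* Sketch only: toy panel and all names here are hypothetical
clear
set obs 12
gen company = ceil(_n / 4)          // 3 panels of 4 periods each
gen time    = mod(_n - 1, 4) + 1
gen invest  = _n
tsset company time                  // also sorts by company time

* panel means and standard deviations, repeated on every observation
by company: egen double invest_mean = mean(invest)
by company: egen double invest_sd   = sd(invest)

* flag exactly one observation per company for display
egen byte first = tag(company)
list company invest_mean invest_sd if first, noobs
```

Quantiles could be added in the same way with -egen, pctile()- and its p(#) option.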

A much longer email could be written comparing these and other
solutions to your problem, but I am throwing the baton in 
the air for others to catch. 

[email protected] 

Tom Boonen
> I am rather new to programming in Stata and just read Nick Cox's
> "Suggestions on Stata programming style" (which I can really
> recommend for newbies) in the last issue of the Stata Journal. He
> urges programmers to avoid "preserve" if possible. This suggestion
> makes sense to me, but I am struggling with how to avoid it.
> My program makes frequent use of the "collapse" command, which of
> course changes the user's data, so it has to be restored each time. I
> wonder whether there is an elegant way that obviates using collapse()
> (or similarly statsby(), which uses collapse()).
> Here is an example of my problem:
> use, clear
> tsset company time
> local pvar "`r(panelvar)'"
> Assume my "varlist" contains variables like invest, market and stock.
> The task is to create a matrix that contains for each panel unit (i.e.
> company) the means of the variables in `varlist' over the time
> periods. What my program does:
> qui collapse (mean) `varlist', by("`pvar'") fast
> qui mkmat `varlist', matrix(`X')
> That works well but I have to use "restore" now to go on. The natural
> thing to me was to look for a command like:
> bysort `pvar': summarize `varlist', meanonly
> but the return list does not allow me to grab the means by unit. So I
> thought how about:
> tabstat  `varlist',  by("`pvar'") statis(mean) save
> this gets me what I want on the screen, but the return list r()
> results only refer to each row of the summary table, not the table as
> a whole. I could loop through these individual rows and collect the
> matrix, of course, but that is slow and seems suboptimal (esp. when
> I have a lot of units).
> Rather, what I am looking for is something like the tabulate, matcell()
> option but for tabstat, i.e. a command that grabs the table displayed
> on the screen and puts it in a matrix.
> Any suggestions? One complication may be that, apart from getting
> the means over the time periods, in other parts of my program I need
> to apply several other aggregation functions, such as the sd,
> quantiles, etc. But I would be happy if I could for now just find an
> elegant solution to get the means.
