The Stata listserver

st: speeding up program


From   Patricia Sourdin <patricia.sourdin@adelaide.edu.au>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: speeding up program
Date   Wed, 27 Oct 2004 09:57:02 +0930

Hi Statalist,

A question on speeding up a program on a large dataset, if someone can help.

I have household data that is set up as the following example:


HHID  MEMID  MONTH SALARY

481     1     1     2000
481     1     2     2500
481     1     3     2000
481     2     2     4000
482     1     2     7400
482     1     3     3600
482     2     1     5000
482     2     2     5500
482     2     3     2000
483     3     1     7000
483     3     2     7500
483     3     3     8000


In other words, I have monthly salary data on each individual (memid) in each
household (hhid). My data set has about 60,000 observations.
I have written the following command to sum salary across months for each
individual:

by hhid memid, sort: egen quart_inc = sum(salary)

which works fine, except that I then have to get rid of all the duplicate totals
created for each individual. So I wrote

program define dupinc, byable(recall)
    syntax [varlist] [if] [in]
    marksample touse
    duplicates drop `varlist' if `touse', force
end

by hhid memid: dupinc quart_inc
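Since the by-group calls to -duplicates drop- appear to be the bottleneck, a possibly much faster alternative (just a sketch, not timed) would be to skip the dupinc program entirely and keep one observation per individual in a single pass, since after the -egen- step every observation in a hhid/memid group carries the same total:

```stata
* one pass: compute the per-individual total, then keep the
* first observation of each hhid/memid group
by hhid memid, sort: egen quart_inc = sum(salary)
by hhid memid: keep if _n == 1
```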


which also works fine, except that it takes forever - it ran for several hours
yesterday! I'm running it on a 1.7 GHz Centrino notebook with 512 MB of RAM.
Is there any way I can speed this up considerably? I also forgot to put -qui-
in front, which might have helped.
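Alternatively, if the other variables are not needed afterwards, -collapse- would build the per-individual totals in one step, replacing both the -egen- call and the duplicate-dropping (again an untested sketch):

```stata
* reduce the data to one row per hhid/memid, summing salary
collapse (sum) quart_inc = salary, by(hhid memid)
```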
Hope someone can help, please.
Patricia
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


