
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets


From   "Trelle Sven" <[email protected]>
To   <[email protected]>
Subject   RE: st: Splitting a dataset efficiently/run regression repeatedly in subsets
Date   Mon, 15 Nov 2010 17:28:27 +0100

> From: Sergiy Radyakin
> Sent: Monday, November 15, 2010 4:54 PM

> 50000 regressions on an 8-observation dataset of two variables 
> should take about 30 seconds (see below).

See below

> So don't generate the large dataset, but rather run the 
> regressions right away when you generate your simulated data.
> You don't need to save the 50000x8 observations you 
> generated, as [presumably] you are also doing it with Stata, 
> so the next time you simulate them with your do-file, they will 
> be the same (don't forget to set the random-number seed).

No, the simulations were not done in Stata.

> On the other hand, since you need only one coefficient from 
> this trivial regression, you may ask yourself if the 
> -regress- artillery is really necessary here, or a trivial 
> formula, such as the one here:
> http://en.wikipedia.org/wiki/Regression_analysis
> would suffice (and be faster).

Thanks, I will give it a try, although I am not sure whether the
regression is actually the problem (see my response below).
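For a regression of y on a single regressor x with a constant (here price on weight, n = 8 in the auto example), the "trivial formula" on that page is the usual OLS slope, with the intercept following from the means:

```latex
\hat\beta_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1\,\bar{x}
```

Only sums and means are involved, so the slope can be computed without calling -regress- at all.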
 
> In any case, don't forget to specify -quietly-. I am almost 
> sure you don't have any intention to review the output of the 
> 50,000 regressions, and that speeds up the program a lot.

Yes, I run the regressions quietly in my do-file but left -quietly- out of the example code.

> . do "R:\TEMP\STD04000000.tmp"
> . set rmsg on
> r; t=0.00 10:42:16
> . sysuse auto, clear
> (1978 Automobile Data)
> r; t=0.00 10:42:16
> . keep in 1/8
> (66 observations deleted)
> r; t=0.00 10:42:16
> .
> . forvalues i=1/50000 {
>   2.    qui regress price weight
>   3. }
> r; t=26.53 10:42:42
> .
> end of do-file
> r; t=26.53 10:42:42

My dataset is large (400,000 observations, not 8), and I need to analyse
a subset of it for each regression; that is probably the issue, not the
regression itself or the loop.
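A minimal sketch of two ways around the subsetting cost, under assumptions the post does not state: the 400,000 observations are taken to be 50,000 stacked samples of 8, identified by a hypothetical variable -sim-, with placeholder variables -y- and -x-. Fake data are generated only so the block runs on its own. Version (a) sorts once and loops over contiguous row blocks with -in-. An -if- condition is evaluated for every observation in memory each time a command runs, whereas -in- goes straight to the listed rows. Version (b) applies the slope formula to all samples at once with -by- and -egen-, so no loop is needed. Version (a) writes its results to slopes.dta in the current working directory.

```stata
* Hypothetical setup: 50,000 stacked samples of 8 obs, identified by -sim-
* (-sim-, -y-, -x- are placeholder names; fake data only for illustration)
clear
set seed 2010
set obs 400000
generate long sim = ceil(_n/8)
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal()

* Sort once; for every row, store the last row number of its sample
sort sim
generate long row = _n
by sim: generate long lastrow = row[_N]

* (a) Loop over contiguous blocks with -in- instead of -if sim==...-,
*     so each -regress- touches only its own rows
tempname h
postfile `h' long sim double b using slopes, replace
local i = 1
while `i' <= _N {
    local j = lastrow[`i']
    quietly regress y x in `i'/`j'
    post `h' (sim[`i']) (_b[x])
    local i = `j' + 1
}
postclose `h'

* (b) No loop: slope = sum((x-xbar)*(y-ybar)) / sum((x-xbar)^2),
*     computed for every sample at once with -by-
bysort sim: egen double xbar = mean(x)
bysort sim: egen double ybar = mean(y)
bysort sim: egen double sxy  = total((x - xbar)*(y - ybar))
bysort sim: egen double sxx  = total((x - xbar)^2)
generate double b_formula = sxy / sxx
bysort sim: keep if _n == 1
keep sim b_formula
```

If only the slope is needed, (b) is likely the faster option; (a) is the one to keep when other -regress- results (standard errors, R-squared) are also required, since each pass can -post- additional saved results.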

BW/Sven
 


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC