**From**: Federico Belotti <f.belotti@econometrics.it>

**To**: statalist@hsphsun2.harvard.edu

**Subject**: Re: st: Memory requirements for factor variables

**Date**: Mon, 3 May 2010 10:54:30 +0200

Partha,

I think there is no way to do that in Stata; an alternative could be Mata. Of course, you would then have to write your own ado-file for your econometric model. An example using OLS is below.

HTH
Federico

```stata
****** do *******
clear all
set mem 10m
set more off

set seed 123456

set obs 100000

mata
// OLS of y on x, the "any-of" indicators D and a constant.
// D[j,i] = 1 if any of d1-d4 equals i in observation j.
real matrix factor_reg(rows,cols,d1,d2,d3,d4,x,y)
{
    D = J(rows,cols,0)
    for(i=1;i<=cols;i++) {
        for(j=1;j<=rows;j++) {
            if (d1[j]==i | d2[j]==i | d3[j]==i | d4[j]==i) D[j,i]=1
        }
    }
    X = x,D,J(rows,1,1)          // x, indicators, constant
    Y = y
    beta = invsym(X'X)*(X'Y)     // OLS via the normal equations
    return(beta)
}
end

gen x = rnormal()
gen u = rnormal()
gen int d = int(_n/1000)
gen int d1 = int(_n/1100)
gen int d2 = int(_n/1200)
gen int d3 = int(_n/1300)
gen int d4 = int(_n/1400)
sum
gen y = x + u
describe,s
regress y x i.d

// tomata is user-written (findit tomata); here it makes the variables
// available in Mata under their own names, as factor_reg() expects
tomata
mata: factor_reg(100000,100,d1,d2,d3,d4,x,y)

sum d
forvalues i=1/`r(max)' {
    gen byte Id`i' = (d1==`i' | d2==`i' | d3==`i' | d4==`i')
}
describe,s
regress y x Id*
exit
```

--
Federico Belotti
Faculty of Economics
Department of Financial and Quantitative Economics
University of Rome Tor Vergata
tel: +39 06 7259 5624
e-mail: federico.belotti@uniroma2.it
url: http://www.econometrics.it

On 3 May 2010, at 00:29, Partha Deb wrote:

> Hi all,
>
> I'm working with a large dataset and am running into the limits of RAM on my machine (8 GB). I run into this problem when I try to create about 500 indicator variables from a set of categorical variables. If I had only one categorical variable from which to create the indicators, I would do this directly in my -regress- command:
>
> ```stata
> regress y x i.D
> ```
>
> The example below shows that using -i.varname- is considerably more memory-efficient than generating the indicators manually before -regress-, i.e., if one does
>
> ```stata
> forvalues i=1/100 {
>     gen byte ID`i' = (D==`i')
> }
> ```
>
> If I had only one categorical variable to deal with, I would obviously use -i.varname-. But I need to do something like
>
> ```stata
> forvalues i=1/100 {
>     gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
> }
> ```
>
> How can I achieve this in a more memory-efficient way? Thanks a lot. The example do-file and log are below.
>
> Partha
>
> ```stata
> ****** do *******
> clear all
> set mem 10m
> set more off
>
> set seed 123456
>
> set obs 100000
>
> gen x = rnormal()
> gen u = rnormal()
> gen int d = int(_n/1000)
>
> gen y = x + u
>
> describe,s
>
> qui regress y x i.d
>
> sum d
>
> forvalues i=1/`r(max)' {
>     gen byte Id`i' = (d==`i')
> }
>
> describe,s
>
> regress y x Id*
>
> exit
> ```
>
> ```
> ******* log **********
> . clear all
>
> . set mem 10m
>
> Current memory allocation
>
>                     current                                 memory usage
>     settable          value     description                 (1M = 1024k)
>     --------------------------------------------------------------------
>     set maxvar         5000     max. variables allowed           1.909M
>     set memory          10M     max. data space                 10.000M
>     set matsize         400     max. RHS vars in models          1.254M
>                                                             -----------
>                                                                 13.163M
>
> . set more off
>
> .
> . set seed 123456
>
> .
> . set obs 100000
> obs was 0, now 100000
>
> .
> . gen x = rnormal()
>
> . gen u = rnormal()
>
> . gen int d = int(_n/1000)
>
> .
> . gen y = x + u
>
> .
> . describe,s
>
> Contains data
>   obs:       100,000
>  vars:             4
>  size:     2,200,000 (82.8% of memory free)
> Sorted by:
>      Note:  dataset has changed since last saved
>
> .
> . qui regress y x i.d
>
> .
> . sum d
>
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>            d |    100000      49.501    28.86623          0        100
>
> .
> . forvalues i=1/`r(max)' {
>   2.         gen byte Id`i' = (d==`i')
>   3. }
> no room to add more variables because of width
> An attempt was made to add a variable that would have increased the memory required to store
> an observation beyond what is currently possible.  You have the following alternatives:
>
>  1.  Store existing variables more efficiently; see help compress.
>
>  2.  Drop some variables or observations; see help drop.  (Think of Stata's data area as the
>      area of a rectangle; Stata can trade off width and length.)
>
>  3.  Increase the amount of memory allocated to the data area using the set memory command;
>      see help memory.
> r(902);
> ```
>
> --
> Partha Deb
> Professor of Economics
> Hunter College
> ph: (212) 772-5435
> fax: (212) 772-5398
> http://urban.hunter.cuny.edu/~deb/
>
> Emancipate yourselves from mental slavery
> None but ourselves can free our minds.
> - Bob Marley
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
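Federico's `factor_reg()` still builds the whole indicator matrix `D` in Mata, where every element is an 8-byte double. For 100,000 observations and 100 categories that is 100,000 × 100 × 8 = 80,000,000 bytes (about 76 MB), against 10,000,000 bytes for the same 100 indicators stored as Stata `byte` variables, and `X = x,D,...` makes a second copy. Mata presumably helps here only because its matrices are not taken from the `set memory` data area. One variation that never holds all of `D` at once is to accumulate X'X and X'y over blocks of observations and solve the normal equations at the end. The sketch below is hypothetical: the function name `chunked_factor_reg()` and its arguments are not from the thread, and it assumes no missing values and category codes 1..K, with any other code (here 0) acting as the base category.

```stata
mata:
// Hypothetical sketch (not from the thread): OLS of y on x, K "any-of"
// indicators and a constant. X'X and X'y are accumulated over blocks of
// observations, so at most one block of D exists at a time.
// Assumes no missing values; codes outside 1..K act as the base category.
real colvector chunked_factor_reg(string scalar yvar, string scalar xvar,
                                  string rowvector dvars, real scalar K,
                                  real scalar chunk)
{
    real matrix    V, C, D, X, XX
    real colvector Xy
    real scalar    N, n, r0, r1, i

    st_view(V, ., (yvar, xvar, dvars))     // a view: the data are not copied
    N  = rows(V)
    XX = J(K + 2, K + 2, 0)                // columns: x, D1..DK, constant
    Xy = J(K + 2, 1, 0)

    for (r0 = 1; r0 <= N; r0 = r0 + chunk) {
        r1 = min((r0 + chunk - 1, N))
        n  = r1 - r0 + 1
        C  = V[|r0, 3 \ r1, cols(V)|]       // category codes for this block
        D  = J(n, K, 0)
        for (i = 1; i <= K; i++) {
            D[., i] = rowsum(C :== i) :> 0  // 1 if any code equals i
        }
        X  = V[|r0, 2 \ r1, 2|], D, J(n, 1, 1)
        XX = XX + cross(X, X)
        Xy = Xy + cross(X, V[|r0, 1 \ r1, 1|])
    }
    // normal equations; invsym() gives collinear or all-zero columns a 0 coefficient
    return(invsym(XX) * Xy)
}
end
```

With Federico's simulated data, something like `mata: b = chunked_factor_reg("y", "x", ("d1","d2","d3","d4"), 100, 10000)` should reproduce the coefficients of `factor_reg()` up to rounding, while the largest temporary matrix is 10,000 × 102 doubles instead of 100,000 × 102. The block size is the memory/speed trade-off, and the per-observation double loop is replaced by colon operators on each block.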

**Follow-Ups**: **Re: st: Memory requirements for factor variables**, *From:* Partha Deb <partha.deb@hunter.cuny.edu>

**References**: **st: Memory requirements for factor variables**, *From:* Partha Deb <partha.deb@hunter.cuny.edu>
