Re: st: Memory requirements for factor variables


From   Partha Deb <partha.deb@hunter.cuny.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Memory requirements for factor variables
Date   Mon, 03 May 2010 09:21:13 -0400

The 10m setting was just to illustrate the issue. -regress- using -i.D- works with 10m but generating the indicators requires more than 10m. My raw dataset is about 6G!

Partha


Reifschneider Harry III wrote:
Judging by the code below, it appears that 10m is just not enough. If you have 8G available on your machine, just increase -set mem- to a higher level. I use 750m permanently and haven't yet run into any shortages.

Cheers!

On May 2, 2010, at 3:29 PM, Partha Deb wrote:

Hi all,

I'm working with a large dataset and am running into the limits of RAM on my machine (8G). I run into this problem when I try to create about 500 indicator variables from a set of categorical variables. If I had only one categorical variable from which to create the indicators, I would do this directly in my -regress- command.

regress y x i.D

The example below shows that using -i.varname- is considerably more memory-efficient than generating the indicators manually before -regress-, i.e., if one does:

forvalues i=1/100 {
  gen byte ID`i' = (D==`i')
}

If I had only one categorical variable to deal with, I would obviously use -i.varname- . But I need to do something like

forvalues i=1/100 {
  gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
}

How can I achieve this in a more memory-efficient way? Thanks a lot. The example do-file and log are below.

Partha

******  do *******
clear all
set mem 10m
set more off

set seed 123456

set obs 100000

gen x = rnormal()
gen u = rnormal()
gen int d = int(_n/1000)

gen y = x + u

describe,s

qui regress y x i.d

sum d

forvalues i=1/`r(max)' {
  gen byte Id`i' = (d==`i')
}

describe,s

regress y x Id*

exit


******* log **********

. clear all

. set mem 10m

Current memory allocation

                  current                                 memory usage
  settable          value     description                 (1M = 1024k)
  --------------------------------------------------------------------
  set maxvar         5000     max. variables allowed           1.909M
  set memory           10M    max. data space                 10.000M
  set matsize         400     max. RHS vars in models          1.254M
                                                          -----------
                                                              13.163M

. set more off

.
. set seed 123456

.
. set obs 100000
obs was 0, now 100000

.
. gen x = rnormal()

. gen u = rnormal()

. gen int d = int(_n/1000)

.
. gen y = x + u

.
. describe,s

Contains data
  obs:       100,000
 vars:             4
 size:     2,200,000 (82.8% of memory free)
Sorted by:
     Note:  dataset has changed since last saved

.
. qui regress y x i.d

.
. sum d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           d |    100000      49.501    28.86623          0        100

.
. forvalues i=1/`r(max)' {
  2.         gen byte Id`i' = (d==`i')
  3. }
no room to add more variables because of width
An attempt was made to add a variable that would have increased the memory required to store an observation beyond what is currently possible. You have the following alternatives:

   1.  Store existing variables more efficiently; see help compress.

   2.  Drop some variables or observations; see help drop. (Think of Stata's data area as the
       area of a rectangle; Stata can trade off width and length.)

   3.  Increase the amount of memory allocated to the data area using the set memory command;
       see help memory.
r(902);
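
A rough back-of-the-envelope check of why the loop in the log above runs out of room, offered only as a sketch: it assumes each byte indicator costs one byte per observation and ignores whatever per-observation overhead Stata adds internally. The local macro names (N, k, alloc, need) are made up for illustration; the figures come from the do-file and log above.

```stata
* Rough estimate only; Stata's internal bookkeeping is not modeled here.
local N     = 100000        // observations, from -set obs 100000-
local k     = 100           // indicators requested, from r(max) after -sum d-
local alloc = 10*1024^2     // bytes allocated by -set mem 10m- (1M = 1024k)

* One byte per observation for each byte-type indicator variable.
local need  = `N'*`k'

display "indicators need " `need' " bytes"
display "share of the 10m allocation: " %4.1f 100*`need'/`alloc' "%"

* The -describe, short- output reports 82.8% of memory free before the loop,
* so the share computed above has to fit within that figure.
display "share reported free before the loop: 82.8%"
```

Under these assumptions the 100 indicators alone would take about 95% of the allocation, more than the 82.8% reported free, which is consistent with the r(902) error. The same arithmetic scales to the real problem: roughly 500 byte indicators add about 500 bytes to the width of every observation.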


--
Partha Deb
Professor of Economics
Hunter College
ph:  (212) 772-5435
fax: (212) 772-5398
http://urban.hunter.cuny.edu/~deb/

Emancipate yourselves from mental slavery
None but ourselves can free our minds.
    - Bob Marley

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
