Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Reifschneider Harry III <reifschneiderh3@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Memory requirements for factor variables
Date: Sun, 2 May 2010 21:59:51 -0700

Cheers!

On May 2, 2010, at 3:29 PM, Partha Deb wrote:

Hi all,

I'm working with a large dataset and am running into the limits of RAM on my machine (8G). I run into this problem when I try to create about 500 indicator variables from a set of categorical variables. If I had only one categorical variable from which to create the indicators, I would do this directly in my -regress- command.

    regress y x i.D

The example below shows that using -i.varname- is considerably more memory-efficient than generating the indicators manually before -regress-, i.e. if one does,

    forvalues i=1/100 {
        gen byte ID`i' = (D==`i')
    }

If I had only one categorical variable to deal with, I would obviously use -i.varname-. But I need to do something like

    forvalues i=1/100 {
        gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
    }

How can I achieve this in a more memory-efficient way? Thanks a lot. The example do and log are below.

Partha

****** do *******
clear all
set mem 10m
set more off

set seed 123456

set obs 100000

gen x = rnormal()
gen u = rnormal()
gen int d = int(_n/1000)

gen y = x + u

describe,s

qui regress y x i.d

sum d
forvalues i=1/`r(max)' {
    gen byte Id`i' = (d==`i')
}
describe,s
regress y x Id*

exit

******* log **********
. clear all

. set mem 10m

Current memory allocation

                    current                                 memory usage
    settable          value     description                 (1M = 1024k)
    --------------------------------------------------------------------
    set maxvar         5000     max. variables allowed           1.909M
    set memory          10M     max. data space                 10.000M
    set matsize         400     max. RHS vars in models          1.254M
                                                            -----------
                                                                13.163M

. set more off
.
. set seed 123456
.
. set obs 100000
obs was 0, now 100000
.
. gen x = rnormal()
. gen u = rnormal()
. gen int d = int(_n/1000)
.
. gen y = x + u
.
. describe,s

Contains data
  obs:       100,000
 vars:             4
 size:     2,200,000 (82.8% of memory free)
Sorted by:
     Note:  dataset has changed since last saved

.
. qui regress y x i.d
.
. sum d

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           d |    100000      49.501    28.86623          0        100

.
. forvalues i=1/`r(max)' {
  2.     gen byte Id`i' = (d==`i')
  3. }
no room to add more variables because of width
    An attempt was made to add a variable that would have increased
    the memory required to store an observation beyond what is
    currently possible.  You have the following alternatives:

     1.  Store existing variables more efficiently; see help compress.

     2.  Drop some variables or observations; see help drop.  (Think
         of Stata's data area as the area of a rectangle; Stata can
         trade off width and length.)

     3.  Increase the amount of memory allocated to the data area
         using the set memory command; see help memory.
r(902);

--
Partha Deb
Professor of Economics
Hunter College
ph: (212) 772-5435
fax: (212) 772-5398
http://urban.hunter.cuny.edu/~deb/

Emancipate yourselves from mental slavery
None but ourselves can free our minds.
    - Bob Marley

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
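The r(902) failure in the log above is consistent with simple byte counting. The sketch below is a rough illustration only: it assumes x, u, and y are stored as float (Stata's default for -generate-, since the post does not show -set type-), and the local macro names are hypothetical, not from the original post. It counts raw bytes and ignores any per-observation overhead, so it understates actual usage; the log itself reports a size of 2,200,000 for the four existing variables.

```stata
* Illustrative arithmetic only; the inputs are read off the do-file and log above.
local nobs       = 100000       // from -set obs 100000-
local nind       = 100          // the loop runs i = 1/100 because r(max) is 100
local bytes_each = 1            // storage type byte takes 1 byte per observation

* Raw width of the existing variables, assuming x, u, y are float (4 bytes each)
* and d is int (2 bytes), as declared with -gen int d-.
local width_now  = 3*4 + 2
local data_now   = `nobs' * `width_now'

* Additional raw storage the 100 manual indicators would require.
local data_extra = `nobs' * `nind' * `bytes_each'

* Bytes allocated by -set mem 10m- (the log states 1M = 1024k).
local alloc      = 10 * 1024^2

display "existing data (raw bytes): " %12.0fc `data_now'
display "indicators (raw bytes):    " %12.0fc `data_extra'
display "allocated (bytes):         " %12.0fc `alloc'
display "fits in allocation (1/0):  " (`data_now' + `data_extra' <= `alloc')
```

Under these assumptions the 100 byte indicators alone need 10,000,000 bytes against an allocation of 10,485,760, so the loop cannot complete once the existing variables are counted, whereas -i.d- adds no variables to the dataset at all.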


**Follow-Ups**: **Re: st: Memory requirements for factor variables** *From:* Partha Deb <partha.deb@hunter.cuny.edu>

**References**: **st: Memory requirements for factor variables** *From:* Partha Deb <partha.deb@hunter.cuny.edu>
