Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Paul Higgins <pahiggins@LRCA.com> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RE: regression r(103): too many variables |
Date | Wed, 24 Feb 2010 10:01:20 -0600 |
Hi all,

Thanks for all of your suggestions: they were a big help. My code contained an error that is probably a classic newbie misstep: misusing hyphens when making lists of variables.

The rhs of my regression contained thousands of interactions between sets of dummy variables (96 dummies representing quarter-hour time increments interacted with 22 date values of special import for the problem I was investigating, yielding a total of 2112 altogether just for that one pair of variables). To construct these, I used code of the following form:

/*****************************/
/* generate separate dummies */
/* for each event date       */
/*****************************/
#delimit ;
local eventdates "mdy(1,13,2009) mdy(2,20,2009) mdy(3,27,2009) mdy(4,10,2009)
                  mdy(4,17,2009) mdy(5,18,2009) mdy(5,23,2009) mdy(5,24,2009)
                  mdy(6,30,2009) mdy(7,1,2009) mdy(7,9,2009) mdy(8,14,2009)
                  mdy(8,15,2009) mdy(9,16,2009) mdy(9,18,2009) mdy(9,19,2009)
                  mdy(10,3,2009) mdy(11,2,2009) mdy(11,3,2009) mdy(12,7,2009)
                  mdy(12,8,2009) mdy(12,9,2009)";
#delimit cr
local c = 1
foreach x of local eventdates {
    gen byte dum_`c' = (dt==`x')
    local c = `c' + 1
}

/************************************/
/* interact each event date dummy w/*/
/* each quarter-hour interval dummy */
/************************************/
forvalues x = 1/96 {
    forvalues y = 1/22 {
        gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
    }
}

Due to the order I used to nest the two loops, the variables weren't created in the same sequence as that assumed by my hyphenated lists in my regress statement. I am a recent arrival in Stata-world (having been born in SAS-land, and having emigrated here via several other intermediate stops along the way), and in most other stats programs I've worked with, a single hyphen in a list of this type (i.e., dum_1_int_1-dum_1_int_96) would be expanded out in logical sequential fashion (i.e., dum_1_int_1 dum_1_int_2 ...).
But Stata expanded it out in the physical order in which the variables appeared in the data set (i.e., dum_1_int_1 dum_2_int_1 ...). Thus, my regressions contained far more than 2500 rhs variables -- mostly redundant ones! Once I replaced the hyphenated lists in the regress statement with wild-card versions (e.g., dum_1_int_*), all was well.

Thanks again for your assistance.

Paul H.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
Sent: Wednesday, February 24, 2010 1:59 AM
To: statalist@hsphsun2.harvard.edu
Subject: AW: st: RE: regression r(103): too many variables

<>

Andi may want to use

*************
des, short
*************

to prevent clutter on his screen.

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of sjsamuels@gmail.com
Sent: Wednesday, February 24, 2010 06:13
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: regression r(103): too many variables

Verify that you actually have 2500 variables, possibly by running -des- on the variable list.

Steve

--- Paul Higgins
> I am trying to use regress to run a linear regression. The
> specification has a lot of rhs variables (around 2500), the
> majority of which are binary (0/1) variables. <snip> I am
> getting r(103), "Too many variables specified".

On Tue, Feb 23, 2010 at 1:08 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
> This runs w/o a hitch in Stata 10.1 MP. Takes something like 2 minutes:
>
> *******
> clear*
> set mem 500m
> set obs 13700
>
> foreach var of newlist var1-var2500{
> gen byte `var'=runiform()<.3
> }
>
> gen y=rnormal()
> reg y var1-var2500
> *******
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Paul Higgins
> Sent: Tuesday, February 23, 2010 21:28
> To: 'statalist@hsphsun2.harvard.edu'
> Subject: st: regression r(103): too many variables
>
> Hi all,
>
> I am trying to use regress to run a linear regression. The specification
> has a lot of rhs variables (around 2500), the majority of which are binary
> (0/1) variables. The data set contains about 13700 observations. At the
> top of the .do file I set mem to 5 gigabytes, maxvar to 10000 and matsize to
> 10000. I'm using Stata / SE 10.1 for Windows, under Windows XP Professional
> x64 edition version 5.2, on a machine that has 8 gigabytes of physical
> memory on-board. I am getting r(103), "Too many variables specified". I've
> poked around the documentation, and I can see no mention of any internal
> limits to the regress command regarding number of variables. Thus, I have
> assumed that only the general limits for Stata SE apply: maximum of 32767
> variables, maximum matsize of 11000. But I appear to be wrong.
>
> Suggestions, please?
>
> PaulH
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774
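
For anyone who lands on this thread with the same r(103) puzzle, here is a minimal sketch of the behaviour Paul describes, scaled down to 2 event dummies and 3 interval dummies. It is an illustration, not Paul's actual data: the dummies are filled with zeros, and the macro names hyphen and star are made up for the example. It uses -unab- to show how each kind of varlist expands.

```stata
clear
set obs 5

* Placeholder dummies (all zeros); only the variable names and
* their creation order matter for this illustration
forvalues y = 1/2 {
    gen byte dum_`y' = 0
}
forvalues x = 1/3 {
    gen byte int_`x' = 0
}

* Same nesting as in the message: interval index in the OUTER loop,
* so the interactions are created as
* dum_1_int_1 dum_2_int_1 dum_1_int_2 dum_2_int_2 ...
forvalues x = 1/3 {
    forvalues y = 1/2 {
        gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
    }
}

* A hyphenated range picks up every variable stored between the two
* endpoints in dataset order, including the dum_2_* ones in between
unab hyphen : dum_1_int_1-dum_1_int_3
display "`hyphen'"

* A wildcard matches on the name pattern instead
unab star : dum_1_int_*
display "`star'"
```

With the variables created in this order, the hyphenated list should come back with five names (the dum_2_int_1 and dum_2_int_2 interactions are swept in), while the wildcard list should contain only the three dum_1_int_* variables. Swapping the nesting of the two forvalues loops, so that all interactions for one event dummy are created consecutively, would presumably make the hyphenated form behave as Paul originally expected.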