

From: sjsamuels@gmail.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: regression r(103): too many variables
Date: Wed, 24 Feb 2010 13:43:10 -0800

Now that you've figured out what caused the error message, perhaps you should reconsider your proposed analysis. You have too few observations to fit 2500 predictors. The rule of thumb, I believe, is that the ratio of observations to coefficients should be greater than 10:1.

Steve

On Wed, Feb 24, 2010 at 8:01 AM, Paul Higgins <pahiggins@lrca.com> wrote:
> Hi all,
>
> Thanks for all of your suggestions: they were a big help. My code contained an error that is probably a classic newbie misstep: misusing hyphens when making lists of variables. The rhs of my regression contained thousands of interactions between sets of dummy variables (96 dummies representing quarter-hour time increments interacted with 22 date values of special import for the problem I was investigating, yielding a total of 2112 altogether just for that one pair of variables). To construct these, I used code of the following form:
>
> /*****************************/
> /* generate separate dummies */
> /* for each event date       */
> /*****************************/
>
> #delimit ;
> local eventdates "mdy(1,13,2009) mdy(2,20,2009) mdy(3,27,2009)
>                   mdy(4,10,2009) mdy(4,17,2009) mdy(5,18,2009)
>                   mdy(5,23,2009) mdy(5,24,2009) mdy(6,30,2009)
>                   mdy(7,1,2009) mdy(7,9,2009) mdy(8,14,2009)
>                   mdy(8,15,2009) mdy(9,16,2009) mdy(9,18,2009)
>                   mdy(9,19,2009) mdy(10,3,2009) mdy(11,2,2009)
>                   mdy(11,3,2009) mdy(12,7,2009) mdy(12,8,2009)
>                   mdy(12,9,2009)";
> #delimit cr
> local c = 1
> foreach x of local eventdates {
>     gen byte dum_`c' = (dt==`x')
>     local c = `c' + 1
> }
>
> /************************************/
> /* interact each event date dummy w/*/
> /* each quarter-hour interval dummy */
> /************************************/
>
> forvalues x = 1/96 {
>     forvalues y = 1/22 {
>         gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
>     }
> }
>
> Due to the order I used to nest the two loops, the variables weren't created in the same sequence as that assumed by my hyphenated lists in my regress statement.
> I am a recent arrival in Stata-world (having been born in SAS-land, and having emigrated here via several other intermediate stops along the way), and in most other stats programs I've worked with, a single hyphen in a list of this type (i.e., dum_1_int_1-dum_1_int_96) would be expanded out in logical sequential fashion (i.e., dum_1_int_1 dum_1_int_2 ...). But Stata expanded it out in the physical order in which the variables appeared in the data set (i.e., dum_1_int_1 dum_2_int_1 ...). Thus, my regressions contained far more than 2500 rhs variables -- mostly redundant ones! Once I replaced the hyphenated lists in the regress statement with wild-card versions (e.g., dum_1_int_*), all was well.
>
> Thanks again for your assistance.
>
> Paul H.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Martin Weiss
> Sent: Wednesday, February 24, 2010 1:59 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: AW: st: RE: regression r(103): too many variables
>
> <>
>
> Andi may want to use
>
> *************
> des, short
> *************
>
> to prevent clutter on his screen.
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> sjsamuels@gmail.com
> Sent: Wednesday, 24 February 2010 06:13
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: regression r(103): too many variables
>
> Verify that you actually have 2500 variables, possibly by running
> -des- on the variable list.
>
> Steve
>
> --- Paul Higgins
>> I am trying to use regress to run a linear regression. The
>> specification has a lot of rhs variables (around 2500), the
>> majority of which are binary (0/1) variables. <snip> I am
>> getting r(103), "Too many variables specified".
>
> On Tue, Feb 23, 2010 at 1:08 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>>
>> <>
>>
>> This runs w/o a hitch in Stata 10.1 MP.
>> Takes something like 2 minutes:
>>
>> *******
>> clear*
>> set mem 500m
>> set obs 13700
>>
>> foreach var of newlist var1-var2500{
>>     gen byte `var'=runiform()<.3
>> }
>>
>> gen y=rnormal()
>> reg y var1-var2500
>> *******
>>
>> HTH
>> Martin
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Paul Higgins
>> Sent: Tuesday, 23 February 2010 21:28
>> To: 'statalist@hsphsun2.harvard.edu'
>> Subject: st: regression r(103): too many variables
>>
>> Hi all,
>>
>> I am trying to use regress to run a linear regression. The specification
>> has a lot of rhs variables (around 2500), the majority of which are binary
>> (0/1) variables. The data set contains about 13700 observations. At the
>> top of the .do file I set mem to 5 gigabytes, maxvar to 10000 and matsize
>> to 10000. I'm using Stata / SE 10.1 for Windows, under Windows XP
>> Professional x64 edition version 5.2, on a machine that has 8 gigabytes of
>> physical memory on-board. I am getting r(103), "Too many variables
>> specified". I've poked around the documentation, and I can see no mention
>> of any internal limits to the regress command regarding number of
>> variables. Thus, I have assumed that only the general limits for Stata SE
>> apply: maximum of 32767 variables, maximum matsize of 11000. But I appear
>> to be wrong.
>>
>> Suggestions, please?
>>
>> PaulH

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
845-246-0774

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
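
For anyone arriving here from a search, below is a small sketch of the varlist behaviour discussed in this thread. It is a note added to this copy of the message, not code that any of the posters ran. The sizes (2 event dummies, 3 interval dummies, 10 observations) and the macro names `hyphen` and `wild` are made up for illustration; only the variable naming pattern and the loop nesting follow Paul's message. It uses -unab- to show what a hyphenated range and a wildcard each expand to.

```stata
* Toy data set (hypothetical sizes: 2 event dummies, 3 interval dummies)
clear
set obs 10
forvalues y = 1/2 {
    gen byte dum_`y' = 0
}
forvalues x = 1/3 {
    gen byte int_`x' = 0
}

* Same nesting as in the thread: the interval index is in the OUTER loop,
* so the data set order is dum_1_int_1 dum_2_int_1 dum_1_int_2 ...
forvalues x = 1/3 {
    forvalues y = 1/2 {
        gen byte dum_`y'_int_`x' = dum_`y'*int_`x'
    }
}

* A hyphenated range is expanded in data set order, not in name order
unab hyphen : dum_1_int_1-dum_1_int_3
display "`hyphen'"

* A wildcard is matched against the variable names
unab wild : dum_1_int_*
display "`wild'"
```

With this nesting the hyphenated range should also pick up dum_2_int_1 and dum_2_int_2, because they sit between dum_1_int_1 and dum_1_int_3 in the data set, while the wildcard returns only the three dum_1_int_ variables. Swapping the two loops, so that the event-date index is in the outer loop, would be another way to make the hyphenated range contiguous.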

**Follow-Ups**:
- **RE: Re: st: RE: regression r(103): too many variables** *From:* Paul Higgins <pahiggins@LRCA.com>

**References**:
- **st: regression r(103): too many variables** *From:* Paul Higgins <pahiggins@LRCA.com>
- **Re: st: RE: regression r(103): too many variables** *From:* sjsamuels@gmail.com
- **RE: st: RE: regression r(103): too many variables** *From:* Paul Higgins <pahiggins@LRCA.com>
