Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Memory requirements for factor variables
From: Austin Nichols <[email protected]>
To: [email protected]
Subject: Re: st: Memory requirements for factor variables
Date: Mon, 3 May 2010 09:32:06 -0400
Partha--
I think you want to model your code on -fese- (ssc desc fese) or
-felsdvreg- or -felsdvregdm- (findit felsdvreg).  But can you give a
more germane example?  Do you really mean to create dummies based on
an OR condition over 4 categorical variables (testing whether any of
the four is a given level)?  Do you need estimates for your 500
dummies, or do you just want to partial them out of the regression?
The second is much easier than the first.
* one 0/1 indicator per level i: 1 if any of D1-D4 equals i
forvalues i=1/100 {
  gen byte ID`i' = (D1==`i' | D2==`i' | D3==`i' | D4==`i')
}
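Austin's point that partialling the dummies out is easier than estimating them can be illustrated with a single categorical variable. In the sketch below (simulated data; the variables g, x and y are assumptions, not Partha's data), the coefficient on x is the same whether the dummies enter the regression explicitly, are partialled out of y and x first (the Frisch-Waugh-Lovell theorem), or are absorbed with -areg-, which never creates them. With dummies defined by an OR over D1-D4 there is no single grouping variable to pass to absorb(), so this shows the principle rather than a drop-in solution. It uses factor-variable notation (i.g), so it needs Stata 11 or later.

```stata
* Sketch: coefficient on x with the dummies included, partialled out,
* or absorbed (simulated data; i.g requires Stata 11 or later)
clear
set seed 12345
set obs 10000
gen double x = rnormal()
gen int g = floor(50*runiform()) + 1    // one categorical, levels 1-50
gen double y = x + g/10 + rnormal()

* 1) dummies included explicitly
regress y x i.g
scalar b_dummies = _b[x]

* 2) Frisch-Waugh-Lovell: residualize y and x on the dummies,
*    then regress residual on residual
quietly regress y i.g
predict double ry, residuals
quietly regress x i.g
predict double rx, residuals
regress ry rx, noconstant
scalar b_fwl = _b[rx]

* 3) absorb g: the 50 dummies are never created as variables
areg y x, absorb(g)
scalar b_areg = _b[x]

scalar list b_dummies b_fwl b_areg
```

The three scalars agree up to rounding; only the standard errors from route 2 differ, because that regression does not know how many degrees of freedom the dummies used.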
On Mon, May 3, 2010 at 9:23 AM, Partha Deb <[email protected]> wrote:
> Federico - that is definitely a solution I hadn't thought of.  But, I do
> worry that the "simple" formula for the OLS estimate may not be optimal
> given the size of the dataset and potential scaling issues.  I'm still
> holding out for a slick answer from the Stata gurus, but I might end up
> using yours.  Thanks.
>
> Partha
>
>
> Federico Belotti wrote:
>>
>> Partha,
>>
>> I think there is no way to do that in Stata. An alternative could be Mata.
>> Of course, you would then have to write your own ado-file for your
>> econometric model. An example using OLS is below.
>>
>> HTH
>>
>> Federico
>>
>>
>> ******  do *******
>> clear all
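>> * the 100 byte dummies alone take about 10 MB on 100,000 obs, so 10m is
>> * too small; -set mem- is needed only in Stata 11 and earlier
>> set mem 50m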
>> set more off
>>
>> set seed 123456
>>
>> set obs 100000
>>
>> mata
>> // D[.,i] = 1 if any of d1-d4 equals i (i = 1..k); returns the OLS
>> // coefficients in the order: x, the k dummies, the constant
>> real colvector factor_reg(n, k, d1, d2, d3, d4, x, y)
>> {
>>        D = J(n, k, 0)
>>        for (i = 1; i <= k; i++) {
>>                D[., i] = (d1:==i) :| (d2:==i) :| (d3:==i) :| (d4:==i)
>>        }
>>        X = x, D, J(n, 1, 1)
>>        beta = invsym(X'X) * (X'y)
>>        return(beta)
>> }
>> end
>>
>> gen x = rnormal()
>> gen u = rnormal()
>> gen int d = int(_n/1000)
>> gen int d1 = int(_n/1100)
>> gen int d2 = int(_n/1200)
>> gen int d3 = int(_n/1300)
>> gen int d4 = int(_n/1400)
>>
>> sum
>>
>> gen y = x + u
>>
>> describe,s
>>
>> regress y x i.d
>>
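>> sum d
>> local maxd = r(max)    // store now, before later commands overwrite r()
>>
>> * -tomata- is user-written, not official Stata; it creates Mata views
>> * of the variables under the same names
>> tomata
>> mata: factor_reg(100000,100,d1,d2,d3,d4,x,y)
>>
>> forvalues i=1/`maxd' {
>>         gen byte Id`i' = (d1==`i' | d2==`i' | d3==`i' | d4==`i')
>> }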
>>
>> describe,s
>>
>> regress y x Id*
>>
>>
>> exit
>
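Federico's Mata route holds the whole design matrix in memory: with 100,000 observations, 100 dummies, x and a constant, X is 100,000 x 102 doubles, about 81.6 MB (100000 x 102 x 8 bytes), and with 500 dummies about 401.6 MB. One way to cap that, sketched below under stated assumptions, is to accumulate X'X and X'y over blocks of rows, so memory grows with the number of regressors rather than the number of observations. quadcross() does the accumulation in quad precision, which speaks partly to Partha's worry about the "simple" formula. The helper chunk_reg(), the block size of 3,000 and the smaller simulated dataset are illustrative assumptions, not code from the thread. If X does fit in memory, Mata's qrsolve(X, y) avoids forming X'X at all and is generally the more stable choice for badly scaled data.

```stata
* Sketch: OLS on x plus OR-defined dummies, accumulating X'X and X'y
* block by block (hypothetical helper, simulated data)
clear all
set seed 123456
set obs 10000
gen double x = rnormal()
gen double y = x + rnormal()
gen int d1 = int(_n/1100)
gen int d2 = int(_n/1200)
gen int d3 = int(_n/1300)
gen int d4 = int(_n/1400)

mata:
// Hypothetical helper (not from the thread): OLS of y on x, k OR-dummies
// and a constant. X'X and X'y are summed over blocks of rows with
// quadcross(), so the full n x (k+2) matrix X is never stored.
real colvector chunk_reg(real scalar k, real scalar chunk)
{
    real matrix    V, B, D, X, XX
    real colvector Xy
    real scalar    n, lo, hi, i

    st_view(V, ., "y x d1 d2 d3 d4")    // a view, not a copy, of the data
    n  = rows(V)
    XX = J(k + 2, k + 2, 0)
    Xy = J(k + 2, 1, 0)
    for (lo = 1; lo <= n; lo = lo + chunk) {
        hi = min((lo + chunk - 1, n))
        B  = V[|lo, 1 \ hi, 6|]            // rows lo..hi of (y, x, d1-d4)
        D  = J(rows(B), k, 0)
        for (i = 1; i <= k; i++) {
            // 1 if any of d1-d4 equals level i (row maximum of 0/1 flags)
            D[., i] = rowmax(B[., (3..6)] :== i)
        }
        X  = B[., 2], D, J(rows(B), 1, 1)   // x, dummies, constant
        XX = XX + quadcross(X, X)
        Xy = Xy + quadcross(X, B[., 1])
    }
    return(invsym(XX) * Xy)
}
end

quietly summarize d1
local k = r(max)
mata: b = chunk_reg(`k', 3000)
mata: b[1]                              // coefficient on x
```

Only one block of rows plus the (k+2) x (k+2) cross-product matrix is held at any time; the price is that the dummies are rebuilt for each block rather than stored.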