



st: Question about the threshold in subsample size


From   "Yigit Aydede" <[email protected]>
To   "statalist" <[email protected]>
Subject   st: Question about the threshold in subsample size
Date   Fri, 06 Sep 2013 19:48:08 +0000

Hello,

I apologize for asking what may seem a simple question. If anybody can help me with this, I would greatly appreciate it.

My dataset is too big to run clogit (fixed effects) in Stata. I have more than 800K observations over 282 regions (clusters). My dependent variable is 1 for movers across regions (4% of the total) and 0 for non-movers.

If I reduce (resample) the data size, I can run clogit on 282 regions.

Since the success rate is 4 percent, I would like to resample by 

sample 20 if moved==0, by(region)

where moved is the dependent variable.

Basically, I resample only the non-movers and keep all of the movers. My model tries to find the determinants of moving decisions, so I have a number of variables that control for individual characteristics. It seems to me that resampling only non-movers reduces the influence of non-movers on the estimates. Am I right?

Alternatively, I could do
sample 20, by(region)

I pick 20 here because only a 20% sample gives me a size that Stata can handle in clogit.
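
To make the difference between the two sampling schemes concrete, here is a rough back-of-the-envelope sketch in Python (not Stata). It assumes exactly 800,000 observations and exactly 4% movers, whereas the real counts are only approximately that, and it ignores rounding within each of the 282 regions. The function name and its arguments are made up for this illustration.

```python
def subsample_counts(n_total, mover_pct, keep_nonmover_pct, keep_mover_pct):
    """Expected group sizes after sampling movers and non-movers at given rates.

    All rates are whole percentages so the arithmetic stays in integers.
    """
    movers = n_total * mover_pct // 100
    nonmovers = n_total - movers
    kept_movers = movers * keep_mover_pct // 100
    kept_nonmovers = nonmovers * keep_nonmover_pct // 100
    total = kept_movers + kept_nonmovers
    return {
        "movers": kept_movers,
        "nonmovers": kept_nonmovers,
        "total": total,
        "mover_share": kept_movers / total,
    }


# Assumed round numbers, not the actual dataset counts.
N_TOTAL = 800_000
MOVER_PCT = 4

# Scheme 1: keep every mover, sample 20% of the non-movers.
only_nonmovers = subsample_counts(N_TOTAL, MOVER_PCT, 20, 100)

# Scheme 2: sample 20% of everyone.
everyone = subsample_counts(N_TOTAL, MOVER_PCT, 20, 20)

for label, result in [("non-movers only", only_nonmovers), ("everyone", everyone)]:
    print(label, result)
```

Under these assumed numbers, the first scheme keeps all 32,000 movers in a subsample of 185,600 (about 17% movers), while the second keeps 6,400 movers in a subsample of 160,000 (still 4% movers).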

Is there any "right" way to determine a threshold size for the subsample, instead of just using 20%?

Thank you for your time and help. Any advice is much appreciated.

Best,

Yigit Aydede
Saint Mary's University
Halifax, NS, B3H 3C3
Canada

