Steve Samuels <sjsamuels@gmail.com>

statalist@hsphsun2.harvard.edu

Re: AW: st: U.S. Census Data

Fri, 7 May 2010 09:13:56 -0400

Faced with a similar situation some years ago, I took a sample with
strata formed from combinations of key variables and over-sampled some
smaller groups of interest. I specified only probability weights for
the analyses. As there were millions of observations in the sample,
precision was not badly affected. If Nate were to take two mutually
exclusive samples, he could formulate his models in the first and
validate them in the second.

Steve

On Fri, May 7, 2010 at 8:44 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
> " OS (which is not 64-bit)"
>
> Only Nate can answer this, but, just to be sure, Stas, how do you know
> whether his XP is or is not 64-bit? Does this conclusion follow from his
> specifications?
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Stas Kolenikov
> Sent: Friday, 7 May 2010 14:29
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: AW: st: U.S. Census Data
>
> Your limitation is the combination of the OS (which is not 64-bit) and
> the hardware (which may or may not be 64-bit). Of course Stata 10 is
> not the newest version, but if it works for your analyses, you don't
> need to upgrade that.
>
> On Fri, May 7, 2010 at 5:27 AM, Nate Breznau
> <nbreznau@bigsss.uni-bremen.de> wrote:
>> Thank you for your responses. I am running the following specs:
>>
>> Stata 10.1
>> MS Win XP, SP3
>> On a 1.1 GHz, 1.93 GB RAM Processor
>>
>> I think my limitation may be the CPU... The most memory it will grant is
>> 1g, and it's not enough.
>>
>> Martin Weiss wrote:
>>>
>>> <>
>>>
>>> But careful with such examples: They do not say much about Nate's
>>> problem, as you are creating the default data type after -gen-, which
>>> is "float". It occupies 4 bytes, as in
>>> http://www.stata.com/support/faqs/data/howbig.html.
>>> Strings in particular could change the picture.
>>>
>>> HTH
>>> Martin
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of
>>> Abdel Rahmen El Lahga
>>> Sent: Thursday, 6 May 2010 16:49
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: U.S. Census Data
>>>
>>> This is basically a memory problem. Stata can handle bigger data sets.
>>> You say nothing about your OS or the maximum RAM of your computer.
>>> On my iMac with 4 GB RAM the following code works fine:
>>>
>>> . clear*
>>>
>>> . set mem 3g
>>> (3145728k)
>>>
>>> . set obs 30000000
>>> obs was 0, now 30000000
>>>
>>> . foreach i of numlist 1/15 {
>>>   2. gen x`i'=rnormal()
>>>   3. }
>>>
>>> .
>>> end of do-file
>>>
>>> Abdel
>>>
>>> 2010/5/6 Nate Breznau <nbreznau@bigsss.uni-bremen.de>:
>>>>
>>>> I am wanting to end my usage of SPSS, and in general have successfully
>>>> done so; however, in a project working with U.S. Census data I need to
>>>> use a datafile that has over 30 million cases and 15 variables. This is
>>>> the smallest version I can use for my purposes. Is there any way to
>>>> alter Stata to work with such a monster file? I've pushed it to its
>>>> maximum allowed memory and it's not enough.
>>>>
>>>> I thank anyone kindly for any advice, no matter how dismal.
>>>>
>>>> -Nate
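
A minimal sketch of the approach described at the top of this message (strata from combinations of key variables, over-sampling of a smaller group, probability weights only, and two mutually exclusive samples) might look like the following. This is an illustration under stated assumptions, not the code actually used: the data are synthetic stand-ins for the census file, and every variable name (region, agegrp, x, y, f, pw, half) and both sampling fractions are hypothetical.

```stata
* Sketch only: synthetic data stand in for the census extract.
* All variable names and sampling fractions below are hypothetical.
clear
set mem 50m                 // needed in Stata 10; later versions manage memory automatically
set seed 20100507
set obs 200000
gen byte region = 1 + floor(4*runiform())
gen byte agegrp = 1 + floor(5*runiform())
gen x = rnormal()
gen y = 1 + 0.5*x + rnormal()

* Strata formed from combinations of key variables
egen stratum = group(region agegrp)

* Over-sample one smaller group of interest: 20% there, 2% elsewhere
gen double f = cond(region == 4 & agegrp == 5, 0.20, 0.02)
keep if runiform() < f

* Probability weight = inverse of the selection probability
gen double pw = 1/f

* Two mutually exclusive random halves of the sample
gen byte half = 1 + (runiform() < 0.5)

* Formulate the model in the first half, validate it in the second
regress y x if half == 1 [pweight = pw]
regress y x if half == 2 [pweight = pw]
```

Only pweights are specified, as in the description above; the stratum variable is kept in the data so that the design could also be declared with -svyset- if desired.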

