



Re: st: Draw a random sample of my data...


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Draw a random sample of my data...
Date   Thu, 27 Sep 2012 15:37:14 +0100

First note that

sort countryID year

does nothing useful because you undo it by

by firmID, sort: gen firms = _n

Now focus on that last command. It will sort your data by -firmID- but
precisely which observation comes first within -firmID- is not
reproducible with that syntax.  So which observations are selected by

keep if firms == 1

may differ from one run to the next. Nothing that you do afterwards will
undo that indeterminacy. You can ensure consistency with, e.g., -sort, stable-.

Here is a demo:

. sysuse auto, clear

. bysort rep78 : gen which = _n == 1

. levelsof make if which
`"AMC Spirit"' `"Cad. Deville"' `"Dodge St. Regis"' `"Pont. Firebird"'
`"Subaru"' `"VW Rabbit"'

. sysuse auto, clear
(1978 Automobile Data)

. bysort rep78 : gen which = _n == 1

. levelsof make if which
`"Buick Century"' `"Chev. Monte Carlo"' `"Ford Fiesta"'
`"Honda Accord"' `"Pont. Firebird"' `"Pont. Phoenix"'

Different -make-s come first.

. sysuse auto, clear
(1978 Automobile Data)

. sort rep78, stable

. by rep78 : gen which = _n == 1

. levelsof make if which
`"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
`"Dodge Colt"' `"Olds Starfire"'

. sysuse auto, clear
(1978 Automobile Data)

. sort rep78, stable

. by rep78 : gen which = _n == 1

. levelsof make if which
`"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
`"Dodge Colt"' `"Olds Starfire"'
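Applied to the do-file quoted below, the same idea might look something like
this. It is only a sketch: it assumes -all_data8- contains -firmID-,
-countryID- and -year- as in the original, it has not been run against that
file, and it uses ordinary line endings rather than the -#delimit ;- style
the semicolons in the original imply. Note that which observations -sample-
keeps depends on the order of the data as well as on the seed, so the data
are put in a determinate order just before the draw.

```stata
* Sketch only: assumes all_data8.dta contains firmID, countryID and year
clear
use all_data8

* -stable- keeps the existing file order within each firmID,
* so the same observation per firm is kept on every run
sort firmID, stable
by firmID: gen firms = _n
keep if firms == 1

* one observation per firm now remains, so countryID firmID fixes the order
sort countryID firmID
by countryID: egen countryfirms = total(firms)

keep if countryID == 244

* determinate order immediately before drawing: -sample- depends on the
* current order of the data as well as on the seed
sort firmID
set seed 260581
sample 63

save usfirms_1, replace
```

Running this twice should then give the same 63% sample each time, because
both the order of the data and the seed are pinned down.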


Nick

Mehmet Altun

> I will code a subset of my data. I used the "sample"
> command. However, I would like to fix my random sample, so that I can
> generate the same sample again. For this I used the "set seed" command.
> However, if I rerun the do-file, I get a different random sample
> each time. Here is my do-file:
>
> clear;
> use all_data8;
> sort countryID year;
>
> by firmID, sort: gen firms = _n;
> keep if firms==1;
>
> by countryID, sort: egen countryfirms = total(firms);
>
> keep if countryID==244;
>
> set seed 260581;
>
> sample 63;
>
> save usfirms_1, replace;
>
>
>
> Is there a bug in Stata, or what is wrong? Please help.

