Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Draw a random sample of my data...

From	[email protected]
To	[email protected]
Subject	Re: st: Draw a random sample of my data...
Date	Thu, 4 Oct 2012 01:46:16 +0200 (CEST)

Thank you very much, Nick, for your answer. The "stable" option
solved my problem. However, a new question has emerged.

I use the commands "set seed" and "sample" to generate a new dataset,
but I still have problems integrating this random-sample dataset into
the original panel data. The reason for sampling is that US firms
account for more than 50% of the dataset, which affects the
cross-country results very strongly. With respect to worldwide industry
business volume, however, US firms account for only 29%. Therefore I
draw a random sample of the US firms so that they account for 29% of
the dataset.

I have panel data with countryID, firmID and years. After setting the
seed and drawing the random sample, I would like to merge the randomly
generated dataset of US firms (with random firmID and random years)
with my original panel data (with countryID, firmID and years). But how
can I merge the datasets so that only the randomly sampled US firms are
kept (including their additional years in the original panel dataset)
and the other US firms are dropped? How can I generate a variable that
marks only "the random" US firms as eligible within the original panel
dataset, for all years?

Please help. Thank you in advance.

Mehmet Altun

 My commands look like:

 use all_data8;
 by firmID, sort: gen firms = _n;
 keep if firms==1;
 keep if countryID==244; /* USA */
 sort firmID, stable;
 set seed 260581;
 sample 63;
 sort year;
 save usfirms_1, replace;
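
One possible way to do the merge step described above, offered only as a minimal sketch: reduce the US firms to one row per firmID, sample from that list, flag the sampled firms, and merge the flag back m:1 onto the full panel so that it applies to every year. The toy panel, the tempfile names panel and picked, the variable name sampled, and the 50 percent rate are all assumptions made for illustration; only countryID, firmID, year, the code 244 and the seed come from the post.

```stata
* Toy panel standing in for all_data8 (hypothetical: 10 firms x 5 years)
clear
set obs 10
gen firmID = _n
gen countryID = cond(firmID <= 6, 244, 100)   // 244 = USA, as in the post
expand 5
bysort firmID: gen year = 2000 + _n
tempfile panel
save `panel'

* One row per US firm. Only firmID is kept, so it does not matter
* which year of a firm happened to come first.
keep if countryID == 244
keep firmID
duplicates drop
sort firmID              // firmID is unique here, so the order is reproducible
set seed 260581
sample 50                // illustrative rate; the post uses 63
gen byte sampled = 1
tempfile picked
save `picked'

* Carry the flag back to every year of the sampled firms
use `panel', clear
merge m:1 firmID using `picked', keep(master match) nogenerate
replace sampled = 0 if missing(sampled)

* Keep all non-US firms and only the sampled US firms
drop if countryID == 244 & sampled == 0
```

With the real data, the toy block would be replaced by use all_data8 followed by saving a copy, and sample 50 by sample 63. The sketch uses the default end-of-line delimiter, so it should be run as a do-file without the semicolon delimiter, and the tempfile macros exist only while the do-file runs.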

> First note that
>
> sort countryID year
>
> does nothing useful because you undo it by
>
> by firmID, sort: gen firms = _n
>
> Now focus on that last command. It will sort your data by -firmID- but
> precisely which observation comes first within -firmID- is not
> reproducible with that syntax.  So which observations are selected by
>
> keep if firms == 1
>
> may differ. Nothing that you do afterwards will undo that
> indeterminacy. You can ensure consistency by e.g. -sort, stable-.
>
> Here is a demo:
>
> . sysuse auto, clear
>
> . bysort rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Spirit"' `"Cad. Deville"' `"Dodge St. Regis"' `"Pont. Firebird"'
> `"Subaru"' `"VW Rabbit"'
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . bysort rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"Buick Century"' `"Chev. Monte Carlo"' `"Ford Fiesta"' `"Honda
> Accord"' `"Pont. Firebird"' `"Pont. Phoenix"'
>
> Different -make-s come first.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . sort rep78, stable
>
> . by rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
> `"Dodge Colt"' `"Olds Starfire"'
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . sort rep78, stable
>
> . by rep78 : gen which = _n == 1
>
> . levelsof make if which
> `"AMC Concord"' `"AMC Spirit"' `"Buick Electra"' `"Cad. Eldorado"'
> `"Dodge Colt"' `"Olds Starfire"'
>
>
> Nick
>
> Mehmet Altun
>
>> I will code a subset of my data. I used the "sample" command.
>> However, I would like to fix my random sample so that I can
>> generate the same sample again. For this I used the "set seed" command.
>> However, if I rerun the do-file I get a different random sample
>> each time. Here is my do-file:
>>
>> clear;
>> use all_data8;
>> sort countryID year;
>>
>> by firmID, sort: gen firms = _n;
>> keep if firms==1;
>>
>> by countryID, sort: egen countryfirms = total(firms);
>>
>> keep if countryID==244;
>>
>> set seed 260581;
>>
>> sample 63;
>>
>> save usfirms_1, replace;
>>
>>
>>
>> Is there a bug in Stata, or what is wrong? Please help.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>

