RE: st: Reasons for varying OLS coefficients across different sub-samples.


From   "Gupta, Sumedha" <sugupta@iupui.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Reasons for varying OLS coefficients across different sub-samples.
Date   Mon, 18 Jul 2011 21:25:44 +0000

Hi Fernando,
Thank you for your prompt response. I think you are quite right that the samples might just be two subgroups. I want to split the sample because I want to take the predictions from one half of the sample and test them in the other half. I use the following code to create the samples:

gen sample = runiform()
gen byte sample1 = sample < .5     // 1 if in the first half, 0 otherwise
gen byte sample2 = sample >= .5    // complement, so every observation is in exactly one half

But your e-mail makes me wonder if this is the right way to create a random sample given the survey design of my data. The survey oversamples certain races, and I wonder whether my method of creating the random sample is inappropriate because it doesn't take the weights into account. Also, as you mentioned, I want to consider three different groups individually (Whites, Blacks and Hispanics). So I probably need to do stratified sampling along these lines. Would you be able to suggest some suitable code for that? I would really appreciate your help.

Thank you so much.
Sumedha.
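
[Editor's sketch] One way the stratified split asked about above could look in Stata, offered as an illustration rather than a tested recommendation for this survey. The variable name race, its coding, the seed and the toy data are all assumptions made so the example runs on its own; with real data the first block would be skipped.

```stata
* Toy data so the sketch runs on its own (hypothetical; skip with real data).
clear
set obs 1000
set seed 20110718                   // arbitrary seed, only for reproducibility
gen byte race = 1 + mod(_n, 3)      // assumed strata variable coded 1, 2, 3

* Stratified random split: within each race group, order the observations
* by a random number and assign the first half to sample1.
gen double u = runiform()
bysort race (u): gen byte sample1 = (_n <= _N/2)
gen byte sample2 = 1 - sample1

* Check that each group is split roughly in half.
tabulate race sample1
```

The split itself does not use the survey weights; they would still enter at the estimation stage through svy. With svy, estimating on one half is usually done with subpop() rather than an if qualifier, for example svy, subpop(sample1): regress y x, where y and x are placeholder variable names.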



________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Fernando Rios Avila [f.rios.a@gmail.com]
Sent: Monday, July 18, 2011 4:33 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Reasons for varying OLS coefficients across different sub-samples.

Hi Sumedha,
I believe that another reason could be that your subsamples are "too"
random. In other words, each sample is capturing too much of specific
subgroups, which might be reflected in your results. I would like to
know, however, why you would like to run your regression on a random
subsample instead of the full sample (except for memory issues). And
if you are doing so, perhaps you might choose to create random
stratified samples, to be sure your subsamples represent the
whole universe of your data.
Fernando

On Mon, Jul 18, 2011 at 4:21 PM, Gupta, Sumedha <sugupta@iupui.edu> wrote:
> Dear All,
>
> I am running some OLS regressions on survey data (using svy) and seem to get very different results across different random sub-samples I draw from the sample. I believe two common reasons for this could be heteroscedasticity and outliers. Svy does not allow robust option after reg. Is there another way to test for why I am getting such different results for different sub-samples and more importantly is there a way of getting more robust results?
>
> All help will be much appreciated. Many thanks.
> Sincerely,
> Sumedha.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>


© Copyright 1996–2014 StataCorp LP