Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


Re: st: mmregress question


From   bcoric@efst.hr
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: mmregress question
Date   Sat, 3 Dec 2011 11:16:57 +0100 (CET)

Thank you.

I figured out what the problem was.
Namely, I did not use the command -set seed- before -sregress- and -mmregress-.
When I fix the starting subsets with -set seed-, as you suggest, I get
identical results from run to run.
Moreover, when I exclude the observations that are detected as outliers (by
running -mmregress (varlist), outlier graph label (varname)-) from
the sample, or include dummy variables for these observations, then I get
very similar results for OLS (-regress-) and robust regression
(-mmregress-).
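
A minimal sketch of the sequence described above, written as a do-file
fragment. The variable names y, x1, x2 and id and the seed value are
placeholders, not taken from this thread; -sregress- and -mmregress- are
user-written and have to be installed first (-findit mmregress-).

```stata
* Sketch only: y, x1, x2 and id are placeholder variable names.

* Fix the seed so the randomly drawn p-subsets are the same on every run.
* The value 12345 is arbitrary.
set seed 12345
sregress y x1 x2

* Reset the seed so that repeating this command alone reproduces its results.
set seed 12345
mmregress y x1 x2, outlier graph label(id)

* OLS on the same model, for comparison with the robust fit.
regress y x1 x2
```

Rerunning the fragment unchanged should give the same coefficients each
time; changing the seed value may change them.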

Thank you very much, once again.
Bruno


> Bruno, perhaps you don't need -mmregress- or -msregress- and you can take
> another approach for your data.
>
> 1. Identify multivariate outliers in the predictor variables with the
> authors' -mcd- command ("findit").  It also depends on p-subsets but
> there's no harm trying.
>
> 2. If you find observations clearly separate from the main group, they are
> potential high leverage points.  Take them out.
>
> 3. Now run -rreg- and -qreg- or, better, -bsqreg- on the reduced data set.
> These do not depend on subset identification and, without likely
> high-leverage points, should have decent resistance properties.
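
Step 3 could look roughly like the following once the flagged observations
have been marked. The indicator highlev (1 for observations set aside in
step 2) and the variable names y, x1 and x2 are hypothetical; how -mcd-
reports its outliers is not shown here.

```stata
* Sketch only: y, x1, x2 and highlev are placeholder names.
* highlev == 1 marks the observations set aside in step 2.

rreg y x1 x2 if highlev == 0

qreg y x1 x2 if highlev == 0

* -bsqreg- bootstraps the standard errors, so fix the seed to make them
* reproducible. reps(200) is an arbitrary choice.
set seed 12345
bsqreg y x1 x2 if highlev == 0, reps(200)
```
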
>
> Without knowing more about your data, seeing your commands, and knowing
> exactly where iteration fails (in the estimation of the scale parameter or
> of the regression coefficients), it is difficult to say more.
>
> Good luck,
>
> Steve
>
>
>
>
>
>
> I should have pointed out, as you did not, that -mmregress- is a
> contributed package written by Verardi and Croux. It can be found by
> -findit-.
>
> I already stated how to get the identical results from run to run; reread
> my post.
>
> I also mentioned that dummy variables will give -mmregress- and
> -sregress- trouble and stated that -msregress-, another part of the
> package, must be used.  Since you don't mention trying -msregress-, I
> guess that you have no dummy variables.
>
>
> Other issues that could cause problems:
>
> * Too many predictors for your number of observations
>
> In that case, try a reduced model.
>
> * The model is misspecified
>
> I can think up data configurations where the wrong model, even for a
> single predictor, could cause -mmregress- and -sregress- to be unstable.
> So you might, for example, try a fractional polynomial, instead of linear
> terms; or consider interactions (I know this conflicts with the parsimony
> recommendation).
>
>
> Lower on the list of causes of your problems:
>
> * Using too many or too few subsets.  The -help- shows how to vary the
> number.
>
> * Not using the latest version
>
>
> I've run -mmregress- repeatedly on several data sets of the same size as
> yours and have never encountered the instability you've seen.  So I
> suspect the problem is unique to your set-up, not "how to select
> appropriate starting subsample".
>
>
> Steve
>
>
>
>
>
>
>
> On Dec 2, 2011, at 2:56 AM, bcoric@efst.hr wrote:
>
> Thank you, Steve.
>
> I was using sregress and mmregress, in that order, as stressed on p. 444.
> However, the results change significantly each time I repeat these two
> commands.
>
> You suggest that the reason is that the algorithm implemented in Stata
> starts by randomly picking N subsets of p observations.
> Could you please tell me how to set the same N subsets before running
> mmregress?
>
> Could you also tell me whether you have any recommendation on how to
> select appropriate results? Namely, a different starting subset will
> probably lead to different results again. If this is the case, then the
> question is how to select appropriate starting subsample?
>
> Bruno
>
>>
>> The FAQ asks that you give precise references; in this case you are
>> presumably referring to: Verardi, V., & Croux, C. (2009). Robust
>> regression in Stata. Stata Journal, 9(3), 439-453.
>>
>> The article states that -mmregress- starts with an initial S-estimator
>> and (page 444) that
>>
>> "The algorithm implemented in Stata for computing the S-estimator starts
>> by randomly picking N subsets of p observation (defined as p-subset)
>> where p is the number of regression parameters to estimate."
>>
>> In other words, unless the same seed is -set- before each instance of
>> -mmregress-,  one would expect the results to differ from run to run.
>>
>> Also, if there are indicator ("dummy") variables in your model,
>> -mmregress- will have problems (p 445). If you have dummies,  you should
>> instead run -msregress-, part of the same package, and list them only
>> with the dummy() option.
>>
>> If you still are having difficulties, I suggest that you contact the
>> package's authors.
>>
>> Steve
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Dec 1, 2011, at 12:59 PM, bcoric@efst.hr wrote:
>>
>> I would like to ask for help with the implementation of the Stata
>> command mmregress.
>> I am doing a simple cross-section analysis. Since I have just 48
>> observations, I am very much concerned about the possible influence of
>> outliers.
>> Previously I was using the Stata command rreg for such cases. However, I
>> came across the paper Robust regression in Stata (2009), which argues
>> that the rreg command does not have the expected robustness properties
>> and recommends mmregress instead.
>> However, I faced some problems in its implementation. Namely, running it
>> repeatedly on the same model leads to different values of the
>> regression coefficients. Moreover, the detected outliers also change.
>> I presume that the algorithm uses an iterative procedure, taking previous
>> estimates as starting values. However, the results are very different
>> with respect to the sign, size and significance of the coefficients, and
>> do not converge after subsequent applications.
>> Can anyone tell me whether there is some procedure that I should
>> follow to get robust (consistent) results?
>> Bruno Coric
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index