

RE: st: Multinomial logit model with selection

From	"T.Randazzo" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Multinomial logit model with selection
Date	Mon, 5 Mar 2012 13:43:11 +0000

Dear Stas,

Yes, that is correct! There is no selmlog in official Stata, but the package (selmlog13.ado) can be downloaded from http://www.parisschoolofeconomics.com/gurgand-marc/selmlog/selmlog13.html and it works!
In their Monte Carlo experiments, Bourguignon et al. (2007) find that the restriction on the correlation coefficients imposed in the original Dubin and McFadden (Econometrica, 1984) model can be waived to obtain more robust estimators.

1) I would like to compare both methods, DMF(0) and DMF(1): how can I test whether the assumption on the correlation coefficients (that they sum to zero) is correctly specified? (If the assumption is correct, I should prefer the original model; otherwise I should choose the more flexible one.)

Also, I would like to implement the model myself:
1. Run a multinomial logit model (HH_type is my dependent variable)
2. Calculate the inverse Mills ratios
3. Run an OLS regression where the dependent variable is expenditure on good i, and include the Mills ratios.
Because I have 4 outcomes, after running the mlogit I have to create the predicted probabilities for each outcome:

predict p1, outcome(1)
predict p2, outcome(2)
predict p3, outcome(3)
predict p4, outcome(4)

Following the advice given by Mushfiq Mobarak (http://www.stata.com/statalist/archive/2003-04/msg00465.html)

the way to calculate the Mills ratios and apply the Dubin and McFadden (1984) correction is the following:
gen trnsp1=(p1*ln(p1))/(1-p1)
gen trnsp2=(p2*ln(p2))/(1-p2)
gen trnsp3=(p3*ln(p3))/(1-p3)
gen trnsp4=(p4*ln(p4))/(1-p4)
  
gen mills2= 3* ln(p2)+ trnsp1 + trnsp3 + trnsp4
 
gen mills3= 3* ln(p3)+ trnsp1 + trnsp2 + trnsp4
 
gen mills4= 3* ln(p4)+ trnsp1 + trnsp2 + trnsp3
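
For reference, steps 1 to 3 above can be sketched end to end as follows. This is only an illustration under stated assumptions: the data are simulated, the names x1, z1, z2 and health are hypothetical placeholders for the real covariates and expenditure variable, and the correction term simply reproduces the formula quoted above (it has not been checked against what selmlog computes internally).

```stata
* Sketch only: simulated data; x1, z1, z2 and health are hypothetical names
clear
set seed 12345
set obs 4000
gen x1 = rnormal()
gen z1 = rnormal()
gen z2 = rnormal()
* Household type = alternative with the highest latent utility (Gumbel errors)
forvalues j = 1/4 {
    gen v`j' = -ln(-ln(runiform()))
}
replace v2 = v2 + 0.8*z1 + 0.5*x1
replace v3 = v3 - 0.8*z2
replace v4 = v4 + 0.6*z1 + 0.6*z2 - 0.5*x1
gen HH_type = 1
forvalues j = 2/4 {
    replace HH_type = `j' if v`j' == max(v1, v2, v3, v4)
}
gen health = 1 + 0.5*x1 + rnormal()

* Step 1: multinomial logit for household type
mlogit HH_type x1 z1 z2, baseoutcome(1)

* Step 2: predicted probabilities and the p*ln(p)/(1-p) terms
forvalues j = 1/4 {
    predict p`j', outcome(`j')
    gen trnsp`j' = (p`j'*ln(p`j'))/(1 - p`j')
}
gen mills2 = 3*ln(p2) + trnsp1 + trnsp3 + trnsp4

* Step 3: OLS for households of type 2, adding the correction term
regress health x1 mills2 if HH_type == 2
```

Building the terms in a loop keeps the index inside ln() tied to the index of the probability. Note that the standard errors from the final regress treat the predicted probabilities as known, which is presumably why selmlog offers a bootstrap() option; the hand-made version would need its own bootstrap around all three steps to be comparable.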

2) What happens when I decide to apply the more flexible version of the Dubin and McFadden model? I should calculate 4 Mills ratios. Is the following way correct?
 
gen mills1= 4* ln(p1)+ trnsp2 + trnsp3 + trnsp4
 
gen mills2= 4* ln(p2)+ trnsp1 + trnsp3 + trnsp4
 
gen mills3= 4* ln(p3)+ trnsp1 + trnsp2 + trnsp4
 
gen mills4= 4* ln(p4)+ trnsp1 + trnsp2 + trnsp3

3) Again, is there a way to test which model is more appropriate?
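
One way such a test might be set up is sketched below. It rests on an assumption that is not confirmed anywhere in this thread: that in DMF(1) the correction terms enter the outcome equation as separate regressors, -ln(p_j) for the household's own category and p_k*ln(p_k)/(1-p_k) for each other category, so that the DMF(0) restriction amounts to their coefficients summing to zero. The data are simulated and x1, z1, z2, health and m1-m4 are hypothetical names.

```stata
* Sketch only: simulated data; all variable names are hypothetical
clear
set seed 2012
set obs 4000
gen x1 = rnormal()
gen z1 = rnormal()
gen z2 = rnormal()
* Household type = alternative with the highest latent utility (Gumbel errors)
forvalues j = 1/4 {
    gen v`j' = -ln(-ln(runiform()))
}
replace v2 = v2 + 0.8*z1 + 0.5*x1
replace v3 = v3 - 0.8*z2
replace v4 = v4 + 0.6*z1 + 0.6*z2 - 0.5*x1
gen HH_type = 1
forvalues j = 2/4 {
    replace HH_type = `j' if v`j' == max(v1, v2, v3, v4)
}
gen health = 1 + 0.5*x1 + rnormal()

mlogit HH_type x1 z1 z2, baseoutcome(1)
forvalues j = 1/4 {
    predict p`j', outcome(`j')
}

* Assumed DMF(1)-style terms for households observed in category 1:
* own category enters as -ln(p1), the others as p*ln(p)/(1-p)
gen m1 = -ln(p1)
forvalues j = 2/4 {
    gen m`j' = (p`j'*ln(p`j'))/(1 - p`j')
}

regress health x1 m1 m2 m3 m4 if HH_type == 1

* Wald test of the restriction that the four coefficients sum to zero
test m1 + m2 + m3 + m4 = 0
```

Because the regression uses estimated probabilities, the p-value from test is only indicative unless the whole procedure is bootstrapped; a rejection would point towards the more flexible DMF(1) specification.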


________________________________________
From: [email protected] [[email protected]] on behalf of Stas Kolenikov [[email protected]]
Sent: 02 March 2012 18:08
To: [email protected]
Subject: Re: st: Multinomial logit model with selection

Teresa,

cleanup issues in your post:

1. there is no -selmlog- in Stata world, as we know it. -findit
selmlog- returns a reference to -svyselmlog- on SSC. If a package is
not downloadable, it is nearly as good as non-existent. Without
knowing what -selmlog- produces, it is impossible to say how to
interpret its output.

2. References to the papers would be helpful. Especially if coupled
with links to full text or to RePEc, at least.

I can answer your question 2: I don't think any of the interpretation
changes. You are doing corrections in a different way, that's all.
What you called DMF(1) is more flexible, although not so internally
consistent compared to DMF(0), but as far as I can recall
Bourguignon's paper, it worked in a greater variety of settings.

On Fri, Mar 2, 2012 at 11:45 AM, T.Randazzo <[email protected]> wrote:
> Dear Stata List,
> I am trying to analyze how receiving remittances can affect the household expenditure behaviour in Senegal.
> I have four types of household (HH_type)
> HH_type:
> 1.       HH who do not receive remittances
> 2.       HH who receive remittances from national migrants
> 3.       HH who receive remittances from international migrants
> 4.       HH who receive remittances both from national and international migrants
>
> I would like to investigate whether differences exist in specific expenditure categories (food, durable goods, education, health...)
>
> The model that I am trying to apply is a multinomial logit model with selection as presented by Dubin and McFadden (1984) and revisited by Bourguignon, Fournier and Gurgand (2007).
>
> The original DMF model [DMF(0)] is based on two assumptions: a linear relationship between the error term in the outcome equation and the error terms in the choice equation; and correlation coefficients between these error terms that sum to zero.
> The DMF model [DMF(1)] proposed by Bourguignon et al. (2007) relaxes the second assumption.
> I am using the selmlog command in Stata 10.
>
> When I consider DMF(0) I end up with 3 Mills ratios (M-1).
> When I apply DMF(1) I end up with 4 Mills ratios.
>
> 1)   How can I test whether the restriction on the correlation parameters is correct?
> 2)   Passing from 3 to 4 Mills ratios, how does the interpretation of the relevant coefficients change?
> Model DMF(1):
> gen health1 = health
> replace health1 = . if HH_type != 1
> selmlog health1 varlist, select(HH_type = varlist_m) dmf(1) bootstrap(100) gen(rh1_1)
>
> gen health2 = health
> replace health2 = . if HH_type != 2
> selmlog health2 varlist, select(HH_type = varlist_m) dmf(1) bootstrap(100) gen(rh1_1)
>
> Considering expenditure on health, I have found that for HH_type=1 rh1_1, rh1_2 and rh1_4 are insignificant while rh1_3 is significant. For HH_type=2 only rh2_2 is significant.
> 3)  How should I interpret those results?
>  I tried to compare the results obtained from the selmlog command with the following procedure:
> a)   run a mlogit where the dependent variable is HH_type
> b)  calculate the mills ratios
>
> predict p1, outcome(1)
> predict p2, outcome(2)
> predict p3, outcome(3)
> predict p4, outcome(4)
>
> gen trnsp1=(p1*ln(p1))/(1-p1)
> gen trnsp2=(p2*ln(p2))/(1-p2)
> gen trnsp3=(p3*ln(p3))/(1-p3)
> gen trnsp4=(p4*ln(p4))/(1-p4)
>
> gen mills1= 4* ln(p1)+ trnsp2 + trnsp3 + trnsp4
>
> gen mills2= 4* ln(p2)+ trnsp1 + trnsp3 + trnsp4
>
> gen mills3= 4* ln(p3)+ trnsp1 + trnsp2 + trnsp4
>
> gen mills4= 4* ln(p4)+ trnsp1 + trnsp2 + trnsp3
>
> c)  Add the Mills’ ratios to the second step equation (we are considering expenditure on health)
> reg health1 varlist mills1 mills2 mills3 mills4
> reg health2 varlist mills1 mills2 mills3 mills4
> reg health3 varlist mills1 mills2 mills3 mills4
> reg health4 varlist mills1 mills2 mills3 mills4
>
> 4)  Does this procedure correspond to the one performed by the selmlog command? If it does, why don’t I get the same results?
>
> Your help in understanding the model better would be much appreciated,
>
> Sincerely,
>
> Teresa Randazzo
> PhD candidate, University of Kent
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

