Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Multinomial logit model with selection

 From "T.Randazzo" To "statalist@hsphsun2.harvard.edu" Subject st: Multinomial logit model with selection Date Fri, 2 Mar 2012 16:45:08 +0000

```
Dear Stata List,
I am trying to analyze how receiving remittances affects household expenditure behaviour in Senegal.
I have four types of household (HH_type):
1. HH who do not receive remittances
2. HH who receive remittances from national migrants
3. HH who receive remittances from international migrants
4. HH who receive remittances from both national and international migrants

I would like to investigate whether expenditure on specific categories (food, durable goods, education, health...) differs across these household types.

The model I am trying to apply is a multinomial logit model with selection, as presented by Dubin and McFadden (1984) and revisited by Bourguignon, Fournier and Gurgand (2007).

The original DMF model [DMF(0)] rests on two assumptions: (i) the error term in the outcome equation is linearly related to the error terms in the choice equation; (ii) the correlation coefficients between these error terms sum to zero.
The DMF variant [DMF(1)] proposed by Bourguignon et al. (2007) relaxes the second assumption.
I am using the selmlog command in Stata 10.

With DMF(0) I end up with 3 Mills’ ratios (M-1).
With DMF(1) I end up with 4 Mills’ ratios.

1)   How can I test whether the restriction on the correlation parameters holds?
2)   Moving from 3 to 4 Mills’ ratios, how does the interpretation of the relevant coefficients change?
Model DMF(1):
* outcome equation for HH_type == 1: keep health only for that group
gen health1 = health
replace health1 = . if HH_type != 1
selmlog health1 varlist, select(HH_type = varlist_m) dmf(1) bootstrap(100) gen(rh1_1)

* same construction for HH_type == 2
gen health2 = health
replace health2 = . if HH_type != 2
selmlog health2 varlist, select(HH_type = varlist_m) dmf(1) bootstrap(100) gen(rh1_1)

Considering expenditure on health, I have found that for HH_type=1 rh1_1, rh1_2 and rh1_4 are insignificant while rh1_3 is significant. For HH_type=2 only rh2_2 is significant.
3)  How should I interpret those results?
I tried to compare the results obtained from selmlog with the following procedure:
a)   run mlogit with HH_type as the dependent variable
b)   calculate the Mills’ ratios

predict p1, outcome(1)
predict p2, outcome(2)
predict p3, outcome(3)
predict p4, outcome(4)

gen trnsp1=(p1*ln(p1))/(1-p1)
gen trnsp2=(p2*ln(p2))/(1-p2)
gen trnsp3=(p3*ln(p3))/(1-p3)
gen trnsp4=(p4*ln(p4))/(1-p4)

gen mills1= 4* ln(p1)+ trnsp2 + trnsp3 + trnsp4

gen mills2= 4* ln(p2)+ trnsp1 + trnsp3 + trnsp4

gen mills3= 4* ln(p3)+ trnsp1 + trnsp2 + trnsp4

gen mills4= 4* ln(p4)+ trnsp1 + trnsp2 + trnsp3

c)  Add the Mills’ ratios to the second step equation (we are considering expenditure on health)
reg health1 varlist mills1 mills2 mills3 mills4
reg health2 varlist mills1 mills2 mills3 mills4
reg health3 varlist mills1 mills2 mills3 mills4
reg health4 varlist mills1 mills2 mills3 mills4

4)  Does this procedure correspond to the one performed by the selmlog command? If so, why don’t I get the same results?

Your help in understanding the model better would be much appreciated.

Sincerely,

Teresa Randazzo
PhD candidate, University of Kent
```
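
The manual two-step in (a) to (c) can be written more compactly. What follows is a minimal sketch, not a description of what selmlog does internally. It assumes the DMF(1) correction terms take the form usually given following Bourguignon et al. (2007): `-ln(p_j)` for the chosen alternative `j`, and `p_k*ln(p_k)/(1-p_k)` for every other alternative `k`, each entered as a separate regressor. The variable names `m1_own` and `m1_alt2` to `m1_alt4` are hypothetical; `varlist` and `varlist_m` are the placeholders used in the post.

```stata
* Sketch only: hand-built DMF(1)-style terms for the HH_type == 1 equation.
* Assumes HH_type is coded 1..4, as in the post.

* First step: multinomial logit for the household type
mlogit HH_type varlist_m

* Predicted probabilities, one new variable per outcome
predict p1 p2 p3 p4, pr

* Term for the chosen alternative (HH_type == 1)
gen double m1_own = -ln(p1)

* Terms for the three other alternatives
forvalues k = 2/4 {
    gen double m1_alt`k' = p`k' * ln(p`k') / (1 - p`k')
}

* Second step: outcome equation on the selected subsample only
regress health varlist m1_own m1_alt2 m1_alt3 m1_alt4 if HH_type == 1
```

The standard errors from this second-step `regress` ignore the first-stage estimation, whereas the selmlog calls in the post use `bootstrap(100)`, so inference may differ even where the point estimates are close.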