
st: Nested logit with shares/grouped data


From   "Peter Wright" <Peter.Wright@nottingham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Nested logit with shares/grouped data
Date   Wed, 09 Nov 2005 10:07:19 +0000

In response to Nick's comment I have added a bit more explanation of what I have attempted below. To remind you of my problem: how do you estimate a nested logit model in Stata when the left-hand-side variable takes the form of a count (or a market share), i.e. when the dataset records how many sales of each product are made in each time period?

The Stata website offers advice for the multinomial logit model:

http://www.stata.com/support/faqs/stat/grouped.html 

This advice suggests first putting your data in "long" form and then using frequency weights (fweights) with the mlogit command. The question is, is such a procedure suitable in the case of a nested logit model?
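
For reference, the reasoning behind the frequency-weight approach can be sketched as follows. The notation is introduced here for illustration and is not taken from the FAQ: n_{jt} is the number of sales of product j in period t, P_{jt}(\theta) is the model's choice probability, and k(j) is the nest containing j. The grouped-data log likelihood is

\[
\ln L(\theta) \;=\; \sum_{t=1}^{T} \sum_{j=1}^{J} n_{jt} \, \ln P_{jt}(\theta),
\qquad
P_{jt}(\theta) \;=\; P_{j \mid k(j),\,t}(\theta) \; P_{k(j),\,t}(\theta).
\]

This is the same sum that an individual-level log likelihood produces when each choice set has one chosen alternative and carries frequency weight n_{jt}, provided every decision maker in period t faces the same covariates. Whether a given nested logit routine applies the weights in exactly this way is an assumption that should be checked.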

To implement the model, I ran the following code:

*****************************************************************
* As an example I use the Stata restaurant data.
* However, I collapse it to make it look like a dataset of shares/grouped data
* The collapsed dataset has information on the choices made by 20 households
* regarding 7 restaurants. 15 yearly samples are taken
*****************************************************************

clear
use restaurant.dta

* assign families 1-20 to year 1, 21-40 to year 2, ..., 281-300 to year 15
gen year=ceil(family_id/20)

collapse (sum) chosen (mean) income kids cost rating distance, by(restaurant year)
rename chosen sales

sort year
by year: egen total_sales=sum(sales)
gen market_share=sales/total_sales
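
* Sanity check (a sketch added for illustration, not part of the original run;
* share_check is a hypothetical helper name). It assumes each household picks
* exactly one restaurant, so every year should have 20 sales in total and
* market shares that sum to 1.
assert total_sales==20
by year: egen share_check=sum(market_share)
assert abs(share_check-1)<1e-5
drop share_check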

**************************************************************************************
* the dataset has information on sales (and sale-shares) as well 
* as some explanatory variables
* There are 20 households choosing between 7 restaurants. 
* The sample is repeated for 15 years.
* This is the kind of  dataset that I had in mind. 
* How would you run a nested logit model using such shares/grouped data?
**************************************************************************************
* If we follow a similar methodology to that suggested by the FAQ for the multinomial model,
* we need to expand the data so that it has 7*7 rows for each year
expand 7
sort year restaurant

* number the choices 1 to 7
egen alt_id=fill(1 2 3 4 5 6 7 1 2 3 4 5 6 7)

* create an artificial chosen variable which is one for each restaurant in turn (zero for the others)
sort year alt_id restaurant
gen chosen=0
by year alt_id: replace chosen=1 if _n==alt_id

* you also need a weighting variable to tell stata how many times each restaurant was 
* chosen (from the group of 7)
replace sales=. if chosen==0
by year alt_id: egen sales2=mean(sales)
gen alt_id2=10*year+alt_id
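
* Optional check (a sketch added for illustration; chk_n and chk_chosen are
* hypothetical helper names). Each pseudo choice set (year, alt_id) should hold
* 7 rows, exactly one of which has chosen==1.
bysort year alt_id: gen chk_n=_N
bysort year alt_id: egen chk_chosen=sum(chosen)
assert chk_n==7 & chk_chosen==1
drop chk_n chk_chosen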

* You can see that the dataset now looks very much like one based on individual data. 
* The only difference is that the sample will be weighted by sales2

gen type=0
replace type=1 if restaurant==1| restaurant==2
replace type=2 if restaurant==3| restaurant==4| restaurant==5
replace type=3 if restaurant==6| restaurant==7

* Now specify your nested logit model and run
gen incFast=(type==1)*income
gen incFancy=(type==3)*income
gen kidFast=(type==1)*kids
gen kidFancy=(type==3)*kids

nlogit chosen (restaurant = cost rating distance) (type=incFast incFancy kidFast kidFancy) [fweight=sales2], group(alt_id2) 

*******************************************************************
This procedure yields the following results:

Nested logit regression
Levels             =          2                 Number of obs      =      2100
Dependent variable =     chosen                 LR chi2(10)        =  -676.381
Log likelihood     = -513.32241                 Prob > chi2        =    1.0000

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
restaurant   |
        cost |  -.2347816   .1384955    -1.70   0.090    -.5062277    .0366645
      rating |   .3833214   .2482818     1.54   0.123    -.1033021    .8699449
    distance |  -.3779229   .2466483    -1.53   0.125    -.8613448    .1054989
-------------+----------------------------------------------------------------
type         |
     incFast |   .0054128    .069671     0.08   0.938    -.1311398    .1419654
    incFancy |   .0715661   .0505795     1.41   0.157    -.0275679       .1707
     kidFast |  -.5918203   .6533741    -0.91   0.365     -1.87241    .6887694
    kidFancy |  -.6183388   .5423909    -1.14   0.254    -1.681405    .4447279
-------------+----------------------------------------------------------------
(incl. value |
 parameters) |
type         |
      /type1 |    3.94913   3.423767     1.15   0.249    -2.761329    10.65959
      /type2 |   2.633478   2.804631     0.94   0.348    -2.863497    8.130453
      /type3 |   1.281784   .7357307     1.74   0.081    -.1602222    2.723789
------------------------------------------------------------------------------
LR test of homoskedasticity (iv = 1): chi2(3)= -680.30    Prob > chi2 = 1.0000
------------------------------------------------------------------------------

To check these results I ran the same model in LIMDEP NLOGIT (which claims to be able to cope with shares/grouped data), and I get different results:

Normal exit from iterations. Exit status=0.
              +---------------------------------------------+
              | FIML: Nested Multinomial Logit Model        |
              | Maximum Likelihood Estimates                |
              | Dependent variable                SALES     |
              | Weighting variable                  ONE     |
              | Number of observations              105     |
              | Iterations completed                  5     |
              | Log likelihood function       -524.2610     |
              | Restricted log likelihood     -592.6711     |
              | Chi-squared                    136.8201     |
              | Degrees of freedom                   10     |
              | Significance level             .0000000     |
              | R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
              | No coefficients   -592.6711  .11543  .00486 |
              | Constants only.  Must be computed directly. |
              |                  Use NLOGIT ;...; RHS=ONE $ |
              | At start values   -527.7727  .00665 -.11751 |
              | Response data are given as frequencies.     |
              +---------------------------------------------+

              +---------------------------------------------+
              | FIML: Nested Multinomial Logit Model        |
              | The model has 2 levels.                     |
              | Coefs. for branch level begin with B5       |
              | Number of obs.=    15, skipped   0 bad obs. |
              +---------------------------------------------+
+---------+--------------+----------------+--------+---------+----------+
|Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
+---------+--------------+----------------+--------+---------+----------+
          Attributes in the Utility Functions
 B2       -.1827804310      .22258969E-01   -8.212   .0000
 B3        .5317320087      .13437809        3.957   .0001
 B4        .5306557987      .13130855        4.041   .0001
          Attributes of Branch Choice Equations
 B5       -.5031389578E-01  .94708096E-01    -.531   .5952
 B6        .6830782466      1.5458378         .442   .6586
 B7        .2425946290E-01  .44919700E-01     .540   .5892
 B8       -.4414619828      .66479462        -.664   .5067
          Inclusive Value Parameters
 TYPE1     .9859433268      .25540989        3.860   .0001
 TYPE2     1.054564664      .25223344        4.181   .0000
 TYPE3     .9397231335      .27169252        3.459   .0005

Is this because you cannot proceed as I suggest above, or because LIMDEP is wrong? (Incidentally, I think the Stata results are more likely to be correct, as the t-ratios in LIMDEP appear too high.)

