# st: RE: Nested logit with shares/grouped data

From: "Nick Cox" | Subject: st: RE: Nested logit with shares/grouped data | Date: Wed, 9 Nov 2005 12:29:36 -0000

```I imagine that experts will be able to look
at this helpfully. I just want to pick up
on one incidental detail, as the point
is of wider interest. You have code

gen year=1 if family_id<=20
replace year=2 if family_id>20 & family_id<=40
replace year=3 if family_id>40 & family_id<=60
replace year=4 if family_id>60 & family_id<=80
replace year=5 if family_id>80 & family_id<=100
replace year=6 if family_id>100 & family_id<=120
replace year=7 if family_id>120 & family_id<=140
replace year=8 if family_id>140 & family_id<=160
replace year=9 if family_id>160 & family_id<=180
replace year=10 if family_id>180 & family_id<=200
replace year=11 if family_id>200 & family_id<=220
replace year=12 if family_id>220 & family_id<=240
replace year=13 if family_id>240 & family_id<=260
replace year=14 if family_id>260 & family_id<=280
replace year=15 if family_id>280 & family_id<=300

This could boil down to

gen year = ceil(family_id/20)

The tiny but useful trick here is that -ceil()-, short
for ceiling, always rounds up to the nearest integer at or
above its argument, so whole numbers are left unchanged.
-ceil()- has a sibling, -floor()-, which always rounds
down.

There is a long-winded excursus on this one point in

SJ-3-4  dm0002  . . . . . . . . Stata tip 2: Building with floors and ceilings
Q4/03   SJ 3(4):446--447

but the simple definition and memorable terminology (due to Kenneth E.
Iverson) are sufficient to give this an edge over, say,
solutions with -int()-.

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Peter Wright
> Sent: 09 November 2005 10:07
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Nested logit with shares/grouped data
>
>
> In response to Nick's comment I have added a bit more
> explanation of what I have attempted below. To remind you of
> my problem, the question is how do you estimate a nested
> logit model in STATA when your left hand side variable takes
> the form of a count (or a market share). i.e. the dataset
> records how many sales of each product are made in each time period.
>
> The stata web site offers advice for a multinomial logit model:
>
> http://www.stata.com/support/faqs/stat/grouped.html
>
> This advice suggests first putting your data in "long" form
> and then using frequency weights (fweights) with the mlogit
> command. The question is, is such a procedure suitable in the
> case of a nested logit model?
>
> To implement the model, I ran the following code:
>
> *****************************************************************
> * As an example I use the STATA restaurant data.
> * However, I collapse it to make it look like a dataset of shares/grouped data
> * The collapsed dataset has information on the choices made by 20 households
> * regarding 7 restaurants. 15 yearly samples are taken
> *****************************************************************
>
> clear
> use restaurant.dta
>
> gen year=1 if family_id<=20
> replace year=2 if family_id>20 & family_id<=40
> replace year=3 if family_id>40 & family_id<=60
> replace year=4 if family_id>60 & family_id<=80
> replace year=5 if family_id>80 & family_id<=100
> replace year=6 if family_id>100 & family_id<=120
> replace year=7 if family_id>120 & family_id<=140
> replace year=8 if family_id>140 & family_id<=160
> replace year=9 if family_id>160 & family_id<=180
> replace year=10 if family_id>180 & family_id<=200
> replace year=11 if family_id>200 & family_id<=220
> replace year=12 if family_id>220 & family_id<=240
> replace year=13 if family_id>240 & family_id<=260
> replace year=14 if family_id>260 & family_id<=280
> replace year=15 if family_id>280 & family_id<=300
>
> collapse (sum) chosen (mean) income kids cost rating distance, by(restaurant year)
> rename chosen sales
>
> sort year
> by year: egen total_sales=sum(sales)
> gen market_share=sales/total_sales
>
> **************************************************************************************
> * the dataset has information on sales (and sale-shares) as well
> * as some explanatory variables
> * There are 20 households choosing between 7 restaurants.
> * The sample is repeated for 15 years.
> * This is the kind of dataset that I had in mind.
> * How would you run a nested logit model using such shares/grouped data?
> **************************************************************************************
> * If we follow a similar methodology to that suggested for the multinomial model,
> * we need to expand the data so that it has 7*7 rows for each year
> expand 7
> sort year restaurant
>
> * number the choices 1 to 7
> egen alt_id=fill(1 2 3 4 5 6 7 1 2 3 4 5 6 7)
>
> * create an artificial chosen variable which is one for each restaurant in turn (zero for the others)
> sort year alt_id restaurant
> gen chosen=0
> by year alt_id: replace chosen=1 if _n==alt_id
>
> * you also need a weighting variable to tell stata how many times each restaurant was
> * chosen (from the group of 7)
> replace sales=. if chosen==0
> by year alt_id: egen sales2=mean(sales)
> gen alt_id2=10*year+alt_id
>
> * You can see that the dataset now looks very much like one based on individual data.
> * The only difference is that the sample will be weighted by sales2
>
> gen type=0
> replace type=1 if restaurant==1| restaurant==2
> replace type=2 if restaurant==3| restaurant==4| restaurant==5
> replace type=3 if restaurant==6| restaurant==7
>
> * Now specify your nested logit model and run
> gen incFast=(type==1)*income
> gen incFancy=(type==3)*income
> gen kidFast=(type==1)*kids
> gen kidFancy=(type==3)*kids
>
> nlogit chosen (restaurant = cost rating distance) (type=incFast incFancy kidFast kidFancy) [fweight=sales2], group(alt_id2)
>
> *******************************************************************
> This procedure yields the following results:
>
> Nested logit regression
> Levels             =          2                 Number of obs      =      2100
> Dependent variable =     chosen                 LR chi2(10)        =  -676.381
> Log likelihood     = -513.32241                 Prob > chi2        =    1.0000
>
> ------------------------------------------------------------------------------
>              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> restaurant   |
>         cost |  -.2347816   .1384955    -1.70   0.090    -.5062277    .0366645
>       rating |   .3833214   .2482818     1.54   0.123    -.1033021    .8699449
>     distance |  -.3779229   .2466483    -1.53   0.125    -.8613448    .1054989
> -------------+----------------------------------------------------------------
> type         |
>      incFast |   .0054128    .069671     0.08   0.938    -.1311398    .1419654
>     incFancy |   .0715661   .0505795     1.41   0.157    -.0275679       .1707
>      kidFast |  -.5918203   .6533741    -0.91   0.365     -1.87241    .6887694
>     kidFancy |  -.6183388   .5423909    -1.14   0.254    -1.681405    .4447279
> -------------+----------------------------------------------------------------
> (incl. value |
>  parameters) |
> type         |
>       /type1 |    3.94913   3.423767     1.15   0.249    -2.761329    10.65959
>       /type2 |   2.633478   2.804631     0.94   0.348    -2.863497    8.130453
>       /type3 |   1.281784   .7357307     1.74   0.081    -.1602222    2.723789
> ------------------------------------------------------------------------------
> LR test of homoskedasticity (iv = 1): chi2(3)= -680.30   Prob > chi2 = 1.0000
> ------------------------------------------------------------------------------
>
> To check these results, I compared them with LIMDEP NLOGIT
> (which claims to be able to cope with shares/grouped data),
> but I get different results.
>
> Normal exit from iterations. Exit status=0.
>               +---------------------------------------------+
>               | FIML: Nested Multinomial Logit Model        |
>               | Maximum Likelihood Estimates                |
>               | Dependent variable                SALES     |
>               | Weighting variable                  ONE     |
>               | Number of observations              105     |
>               | Iterations completed                  5     |
>               | Log likelihood function       -524.2610     |
>               | Restricted log likelihood     -592.6711     |
>               | Chi-squared                    136.8201     |
>               | Degrees of freedom                   10     |
>               | Significance level             .0000000     |
>               | R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
>               | No coefficients   -592.6711  .11543  .00486 |
>               | Constants only.  Must be computed directly. |
>               |                  Use NLOGIT ;...; RHS=ONE $ |
>               | At start values   -527.7727  .00665 -.11751 |
>               | Response data are given as frequencies.     |
>               +---------------------------------------------+
>
>               +---------------------------------------------+
>               | FIML: Nested Multinomial Logit Model        |
>               | The model has 2 levels.                     |
>               | Coefs. for branch level begin with B5       |
>               | Number of obs.=    15, skipped   0 bad obs. |
>               +---------------------------------------------+
> +---------+--------------+----------------+--------+---------+----------+
> |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
> +---------+--------------+----------------+--------+---------+----------+
>           Attributes in the Utility Functions
>  B2       -.1827804310      .22258969E-01   -8.212   .0000
>  B3        .5317320087      .13437809        3.957   .0001
>  B4        .5306557987      .13130855        4.041   .0001
>           Attributes of Branch Choice Equations
>  B5       -.5031389578E-01  .94708096E-01    -.531   .5952
>  B6        .6830782466      1.5458378         .442   .6586
>  B7        .2425946290E-01  .44919700E-01     .540   .5892
>  B8       -.4414619828      .66479462        -.664   .5067
>           Inclusive Value Parameters
>  TYPE1     .9859433268      .25540989        3.860   .0001
>  TYPE2     1.054564664      .25223344        4.181   .0000
>  TYPE3     .9397231335      .27169252        3.459   .0005
>
> Is this because you cannot proceed as I suggest above (or
> because LIMDEP is wrong)? (Incidentally I think the stata
> results are more likely to be correct as the t-ratios appear
> too high in LIMDEP).
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

```
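
The -ceil()- shortcut in the reply can be checked against the long sequence of -replace- statements. What follows is only a minimal sketch under stated assumptions: it builds synthetic ids 1 to 300 rather than loading restaurant.dta, and the variable name `year_long` is made up for the comparison. It also shows where -floor()- and -int()- part company, which may be one reason to prefer the floor/ceiling pair.

```stata
* Assumption: synthetic ids 1..300 stand in for family_id in restaurant.dta
clear
set obs 300
gen family_id = _n

* Long form: one statement per block of 20 ids (year_long is a hypothetical name)
gen year_long = 1 if family_id <= 20
forvalues k = 2/15 {
    replace year_long = `k' if family_id > 20*(`k'-1) & family_id <= 20*`k'
}

* Short form from the reply
gen year = ceil(family_id/20)

* -assert- stops with an error if the two versions ever disagree
assert year == year_long

* Block boundaries: id 20 stays in year 1, id 21 starts year 2
assert ceil(20/20) == 1
assert ceil(21/20) == 2

* -floor()- always rounds down; -int()- truncates toward zero,
* so the two differ for negative non-integers
* floor(-2.5) is -3
display floor(-2.5)
* int(-2.5) is -2
display int(-2.5)
* ceil(-2.5) is -2
display ceil(-2.5)
```

If the asserts pass silently, the one-line -ceil()- version reproduces the fifteen-line version exactly on these ids.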