
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Nested logit with shares/grouped data
Date: Wed, 9 Nov 2005 12:29:36 -0000

I imagine that experts will be able to look at this helpfully.
I just want to pick up on one incidental detail, as the point
is of wider interest.

You have code

gen year=1 if family_id<=20
replace year=2 if family_id>20 & family_id<=40
replace year=3 if family_id>40 & family_id<=60
replace year=4 if family_id>60 & family_id<=80
replace year=5 if family_id>80 & family_id<=100
replace year=6 if family_id>100 & family_id<=120
replace year=7 if family_id>120 & family_id<=140
replace year=8 if family_id>140 & family_id<=160
replace year=9 if family_id>160 & family_id<=180
replace year=10 if family_id>180 & family_id<=200
replace year=11 if family_id>200 & family_id<=220
replace year=12 if family_id>220 & family_id<=240
replace year=13 if family_id>240 & family_id<=260
replace year=14 if family_id>260 & family_id<=280
replace year=15 if family_id>280 & family_id<=300

This could boil down to

gen year = ceil(family_id/20)

The tiny but useful trick here is that -ceil()-, short for
ceiling, always rounds up to the next integer (and leaves
integers unchanged). -ceil()- has a sibling, -floor()-,
which always rounds down.

There is a long-winded excursus on this one point in

SJ-3-4  dm0002  . . . . . . . .  Stata tip 2: Building with floors and ceilings
        Q4/03   SJ 3(4):446--447

but the simple definition and memorable terminology (due to
Kenneth E. Iverson) are sufficient to give this an edge over,
say, solutions with -int()-.

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Peter Wright
> Sent: 09 November 2005 10:07
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Nested logit with shares/grouped data
>
> In response to Nick's comment I have added a bit more
> explanation of what I have attempted below. To remind you of
> my problem, the question is how do you estimate a nested
> logit model in Stata when your left hand side variable takes
> the form of a count (or a market share), i.e. the dataset
> records how many sales of each product are made in each time period.
>
> The Stata web site offers advice for a multinomial logit model:
>
> http://www.stata.com/support/faqs/stat/grouped.html
>
> This advice suggests first putting your data in "long" form
> and then using frequency weights (fweights) with the mlogit
> command. The question is, is such a procedure suitable in the
> case of a nested logit model?
>
> To implement the model, I ran the following code:
>
> *****************************************************************
> * As an example I use the Stata restaurant data.
> * However, I collapse it to make it look like a dataset of
> * shares/grouped data
> * The collapsed dataset has information on the choices made
> * by 20 households
> * regarding 7 restaurants. 15 yearly samples are taken
> *****************************************************************
>
> clear
> use restaurant.dta
>
> gen year=1 if family_id<=20
> replace year=2 if family_id>20 & family_id<=40
> replace year=3 if family_id>40 & family_id<=60
> replace year=4 if family_id>60 & family_id<=80
> replace year=5 if family_id>80 & family_id<=100
> replace year=6 if family_id>100 & family_id<=120
> replace year=7 if family_id>120 & family_id<=140
> replace year=8 if family_id>140 & family_id<=160
> replace year=9 if family_id>160 & family_id<=180
> replace year=10 if family_id>180 & family_id<=200
> replace year=11 if family_id>200 & family_id<=220
> replace year=12 if family_id>220 & family_id<=240
> replace year=13 if family_id>240 & family_id<=260
> replace year=14 if family_id>260 & family_id<=280
> replace year=15 if family_id>280 & family_id<=300
>
> collapse (sum) chosen (mean) income kids cost rating distance, by(restaurant year)
> rename chosen sales
>
> sort year
> by year: egen total_sales=sum(sales)
> gen market_share=sales/total_sales
>
> **************************************************************************************
> * the dataset has information on sales (and sale-shares) as well
> * as some explanatory variables
> * There are 20 households choosing between 7 restaurants.
> * The sample is repeated for 15 years.
> * This is the kind of dataset that I had in mind.
> * How would you run a nested logit model using such
> * shares/grouped data?
> **************************************************************************************
> * If we follow a similar methodology to that suggested for
> * the multinomial model,
> * we need to expand the data so that it has 7*7 rows for each year
> expand 7
> sort year restaurant
>
> * number the choices 1 to 7
> egen alt_id=fill(1 2 3 4 5 6 7 1 2 3 4 5 6 7)
>
> * create an artificial chosen variable which is one for each
> * restaurant in turn (zero for the others)
> sort year alt_id restaurant
> gen chosen=0
> by year alt_id: replace chosen=1 if _n==alt_id
>
> * you also need a weighting variable to tell Stata how many
> * times each restaurant was
> * chosen (from the group of 7)
> replace sales=. if chosen==0
> by year alt_id: egen sales2=mean(sales)
> gen alt_id2=10*year+alt_id
>
> * You can see that the dataset now looks very much like one
> * based on individual data.
> * The only difference is that the sample will be weighted by sales2
>
> gen type=0
> replace type=1 if restaurant==1| restaurant==2
> replace type=2 if restaurant==3| restaurant==4| restaurant==5
> replace type=3 if restaurant==6| restaurant==7
>
> * Now specify your nested logit model and run
> gen incFast=(type==1)*income
> gen incFancy=(type==3)*income
> gen kidFast=(type==1)*kids
> gen kidFancy=(type==3)*kids
>
> nlogit chosen (restaurant = cost rating distance) (type=incFast incFancy kidFast kidFancy) [fweight=sales2], group(alt_id2)
>
> *******************************************************************
> This procedure yields the following results:
>
> Nested logit regression
> Levels             =          2                 Number of obs   =      2100
> Dependent variable =     chosen                 LR chi2(10)     =  -676.381
> Log likelihood     = -513.32241                 Prob > chi2     =    1.0000
>
> ------------------------------------------------------------------------------
>              |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
> restaurant   |
>         cost |  -.2347816   .1384955    -1.70   0.090    -.5062277    .0366645
>       rating |   .3833214   .2482818     1.54   0.123    -.1033021    .8699449
>     distance |  -.3779229   .2466483    -1.53   0.125    -.8613448    .1054989
> -------------+----------------------------------------------------------------
> type         |
>      incFast |   .0054128    .069671     0.08   0.938    -.1311398    .1419654
>     incFancy |   .0715661   .0505795     1.41   0.157    -.0275679       .1707
>      kidFast |  -.5918203   .6533741    -0.91   0.365     -1.87241    .6887694
>     kidFancy |  -.6183388   .5423909    -1.14   0.254    -1.681405    .4447279
> -------------+----------------------------------------------------------------
> (incl. value |
>  parameters) |
> type         |
>       /type1 |    3.94913   3.423767     1.15   0.249    -2.761329    10.65959
>       /type2 |   2.633478   2.804631     0.94   0.348    -2.863497    8.130453
>       /type3 |   1.281784   .7357307     1.74   0.081    -.1602222    2.723789
> ------------------------------------------------------------------------------
> LR test of homoskedasticity (iv = 1): chi2(3)= -680.30   Prob > chi2 = 1.0000
> ------------------------------------------------------------------------------
>
> In an attempt to check these results I compared them against
> LIMDEP NLOGIT (which claims to be able to cope with
> shares/grouped data), and I get different results.
>
> Normal exit from iterations. Exit status=0.
> +---------------------------------------------+
> | FIML: Nested Multinomial Logit Model        |
> | Maximum Likelihood Estimates                |
> | Dependent variable                SALES     |
> | Weighting variable                  ONE     |
> | Number of observations              105     |
> | Iterations completed                  5     |
> | Log likelihood function       -524.2610     |
> | Restricted log likelihood     -592.6711     |
> | Chi-squared                    136.8201     |
> | Degrees of freedom                   10     |
> | Significance level             .0000000     |
> | R2=1-LogL/LogL*  Log-L fncn  R-sqrd  RsqAdj |
> | No coefficients   -592.6711  .11543  .00486 |
> | Constants only.  Must be computed directly. |
> |                  Use NLOGIT ;...; RHS=ONE $ |
> | At start values   -527.7727  .00665 -.11751 |
> | Response data are given as frequencies.     |
> +---------------------------------------------+
>
> +---------------------------------------------+
> | FIML: Nested Multinomial Logit Model        |
> | The model has 2 levels.                     |
> | Coefs. for branch level begin with B5       |
> | Number of obs.= 15, skipped 0 bad obs.      |
> +---------------------------------------------+
> +---------+--------------+----------------+--------+---------+----------+
> |Variable | Coefficient  | Standard Error |b/St.Er.|P[|Z|>z] | Mean of X|
> +---------+--------------+----------------+--------+---------+----------+
>  Attributes in the Utility Functions
>  B2        -.1827804310       .22258969E-01   -8.212   .0000
>  B3         .5317320087       .13437809        3.957   .0001
>  B4         .5306557987       .13130855        4.041   .0001
>  Attributes of Branch Choice Equations
>  B5        -.5031389578E-01   .94708096E-01    -.531   .5952
>  B6         .6830782466       1.5458378         .442   .6586
>  B7         .2425946290E-01   .44919700E-01     .540   .5892
>  B8        -.4414619828       .66479462        -.664   .5067
>  Inclusive Value Parameters
>  TYPE1      .9859433268       .25540989        3.860   .0001
>  TYPE2      1.054564664       .25223344        4.181   .0000
>  TYPE3      .9397231335       .27169252        3.459   .0005
>
> Is this because you cannot proceed as I suggest above (or
> because LIMDEP is wrong)? (Incidentally I think the Stata
> results are more likely to be correct as the t-ratios appear
> too high in LIMDEP).
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

