
From  "Krier, Betty" <Betty.Krier@oig.dot.gov> 
To  "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> 
Subject  st: Chi-square goodness of fit with grouped counts 
Date  Wed, 1 Mar 2006 10:04:37 -0500 
This is actually a statistical question, rather than a programming one. I have data by markets (e.g. LA to NYC, CHI to LA, etc.) for numbers of flights cancelled in a given time period. For example, in market A there may be 400 cancellations, in market B 1327 cancellations, and so on. (I also have the total number of scheduled flights for each market in that same time period.)
I am interested in analyzing whether there is any significant pattern in the distribution of cancellations across short- versus medium- versus long-distance markets. I'm thinking that I want to use a chi-square goodness of fit test, comparing an expected distribution of cancellations across these market categories with what is observed. The problem is, I don't have standard frequency data in that I don't have data on individual flights; I have the number of cancellations by market.
At first, I thought that I could add up the numbers of cancellations in all short-distance markets to get the observed number of short-distance flight cancellations, and do similarly for the medium- and long-distance markets. However, something about this doesn't seem right, and I get huge chi-square statistics if I do the calculations this way.
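To make the pooled calculation concrete, here is a minimal sketch in Python. Everything in it is illustrative: the market figures are made up, the function name `pooled_chi_square` is hypothetical, and the expected distribution is assumed to be proportional to each category's share of scheduled flights, which is only one possible choice of null hypothesis.

```python
# Hypothetical per-market data: (distance category, cancellations, scheduled flights).
# These numbers are invented for illustration only.
markets = [
    ("short", 400, 20000),
    ("short", 1327, 50000),
    ("medium", 250, 15000),
    ("medium", 600, 25000),
    ("long", 120, 10000),
    ("long", 90, 5000),
]


def pooled_chi_square(markets):
    """Pool cancellations by distance category and compute the Pearson
    goodness-of-fit statistic.

    Assumption: under the null, cancellations are spread across categories
    in proportion to scheduled flights.
    """
    observed = {}
    scheduled = {}
    for category, cancelled, flights in markets:
        observed[category] = observed.get(category, 0) + cancelled
        scheduled[category] = scheduled.get(category, 0) + flights

    total_cancelled = sum(observed.values())
    total_scheduled = sum(scheduled.values())

    expected = {}
    statistic = 0.0
    for category in observed:
        # Expected count = total cancellations * share of scheduled flights.
        expected[category] = total_cancelled * scheduled[category] / total_scheduled
        statistic += (observed[category] - expected[category]) ** 2 / expected[category]

    return observed, expected, statistic


observed, expected, statistic = pooled_chi_square(markets)
print(observed)
print(expected)
print(statistic)  # compare against a chi-square with (number of categories - 1) df
```

Pooling this way treats every cancellation as an independent observation, so for a fixed set of proportions the statistic grows in direct proportion to the total count. With thousands of flights, even small departures from the expected shares can produce very large values, which may be part of what is going on here.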
Is there a way to use a chi-square goodness of fit test in this context, and, if so, how should I account for my actual number of observations being equal to the number of markets and not the number of scheduled flights?