st: 5 mil obs - travel time btw 2 places

From   "Coleman, Greg"
Subject   st: 5 mil obs - travel time btw 2 places
Date   Mon, 2 Dec 2013 18:24:03 +0000

Hi Stata gurus -

I have a pretty large data set (for me!) with just over 5 million observations. It's flight data with 29 variables.
Two of the variables are origin and dest. I am struggling to come up with various statistics for each origin-destination pair, meaning all the rows where, for example, origin=JFK and dest=SFO.
For instance, I want to count the number of times each pair occurs (how many flights from JFK to SFO overall), get the travel time for each of those trips, find which day of the week is typically prone to delays going from JFK to SFO, etc.

Can someone give me a hint on how to approach this? I tried foreach loops, while loops, and "by()", but I feel like I am not on track to an efficient method.
There are over 200 unique origin and dest values across the 5 million observations, so any way I can 'collapse' this data so I can make some graphs would also be great.

