Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: 5 mil obs - travel time btw 2 places

From: "Coleman, Greg"; To: "statalist@hsphsun2.harvard.edu"; Subject: RE: st: 5 mil obs - travel time btw 2 places; Date: Mon, 2 Dec 2013 19:43:34 +0000

```
Thanks Nick. Before I saw your note, I did try this:

sort origin dest

. egen avgtime=mean(crselapsedtime) if origin==origin[_n-1] & dest==dest[_n-1]
(3535 missing values generated)

But the new variable avgtime was the same for every single observation.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, December 02, 2013 1:33 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: 5 mil obs - travel time btw 2 places

When you say "unique", you mean "distinct". On average, these "unique"
pairs occur about 25,000 times each, not once.

You end with the idea of a 'collapse'. Exactly!

As a footnote, look also at -groups- (SSC).

Nick
njcoxstata@gmail.com

On 2 December 2013 18:24, Coleman, Greg <greg.coleman@emc.com> wrote:
> Hi Stata gurus -
>
> I have a pretty large data set (for me!) with just over 5m obs. It's flight data, with 29 variables.
> Two of the variables are origin and dest. I am struggling to come up
> with various statistics for each origin/dest combination, meaning, for example, all the rows where origin=JFK and dest=SFO. For instance: the number of times the pair occurred (how many flights from JFK to SFO overall), the travel time for each of those trips, which day of the week is typically prone to delays going from JFK to SFO, etc.
>
> Can someone give me a hint on how to approach this? I tried foreach loops, while loops, and by(), but I don't feel I am on track toward an efficient method.
> There are over 200 unique origin/dest combinations throughout the 5m obs, so any way I can 'collapse' this data to make some graphs would also be great.
>
> Thanks!
> Greg
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
```
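Nick's pointer to -collapse- can be made concrete. Greg's -egen- result is constant because an `if` condition only restricts which rows enter one overall mean. It does not split the data into groups. Per-pair statistics come from the -by- prefix instead. The sketch below is a minimal illustration under stated assumptions, not a definitive recipe. The five-row data set is hypothetical toy data, and the new variable names (avgtime, nflights, npair, meantime) are illustrative. Only origin, dest and crselapsedtime come from the thread.

```stata
* Hypothetical toy data standing in for the 5m-row flight file
clear
input str3 origin str3 dest crselapsedtime
"JFK" "SFO" 380
"JFK" "SFO" 390
"SFO" "JFK" 320
"JFK" "SFO" 370
"SFO" "JFK" 330
end

* Option 1: keep every flight and add per-pair statistics as new variables.
* -bysort- sorts on origin and dest, then egen works within each pair.
bysort origin dest: egen avgtime = mean(crselapsedtime)
* _N under -by- is the number of rows (flights) in the current pair
by origin dest: generate nflights = _N

* Option 2: reduce to one row per origin/dest pair, e.g. for graphing.
* (count) counts nonmissing values of crselapsedtime within each pair.
preserve
collapse (count) npair=crselapsedtime (mean) meantime=crselapsedtime, by(origin dest)
list
restore
```

The same pattern should extend to the day-of-week question. Adding a weekday variable to the grouping, e.g. `collapse (mean) arrdelay, by(origin dest dayofweek)`, gives one row per pair and weekday. The names dayofweek and arrdelay are assumptions about the data set, not variables named in the thread.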