



st: Summary statistics for panel data


From   John Kenny <[email protected]>
To   [email protected]
Subject   st: Summary statistics for panel data
Date   Thu, 8 Aug 2013 17:13:40 +0100

Dear Statalist

I'm relatively new to Stata and I cannot find a standard way to solve
my problem, so I may need to write a .do file. I'm dealing with a
very large data set of horse racing results that has about 20
variables and 711,000 observations.

These are some of the variables in the data set. For each race there
is a variable for where the race is held ['meeting'], the date and
time ['date', 'time'], the odds given for the winning horse ['odds'],
the race number at that meeting on a given day ['race_no'], whether
the favourite won that race ['fav_win'] and the overround, which
signifies the bookmaker's profit ['overround']. These variables are
listed as follows:

meeting  date       time   odds  fav_win  overround  race_no
Aintree  24-Oct-04  13:45  5     0        108.330    1
Aintree  24-Oct-04  14:15  3     0        106.053    2
Aintree  24-Oct-04  14:50  14    0        107.303    3
Aintree  24-Oct-04  15:20  1.5   1        106.933    4
Aintree  24-Oct-04  15:55  9     0        112.435    5
Aintree  24-Oct-04  16:30  1.88  0        116.008    6
Aintree  20-Nov-04  12:45  0.57  1        107.706    1
Aintree  20-Nov-04  13:20  2     1        107.996    2
Aintree  20-Nov-04  13:50  10    0        107.218    3
Aintree  20-Nov-04  14:20  7     1        119.689    4
Aintree  20-Nov-04  14:55  1.5   1        106.324    5
Aintree  20-Nov-04  15:25  0.33  1        105.149    6

This list is sorted by meeting, date and race_no. There are numerous
meetings over a long time period. What I am trying to analyse is the
overround: I would like to get its mean and standard deviation for
each race, conditional on whether the favourite won some of the
previous races at that meeting on that date.
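
One possible way to set this up, offered only as a sketch: for each
race, count how many earlier races at the same meeting on the same day
were won by the favourite, then summarize the overround conditional on
that count. The variable names follow the listing above; prev_fav_wins
is a name made up here for illustration, and the code assumes that
fav_win is coded 0/1 and that race_no orders the races within a day.

```stata
* Sketch only: assumes fav_win is 0/1 and race_no orders races within a meeting-day.
* prev_fav_wins is a hypothetical variable name, not part of the original data.
* sum() is a running sum within each meeting-day; subtracting fav_win
* leaves the number of favourite wins in the earlier races only.
bysort meeting date (race_no): generate prev_fav_wins = sum(fav_win) - fav_win

* Mean, SD and number of races for overround, by race number, among
* races preceded by at least one favourite win that day.
* The !missing() guard matters because Stata treats missing as larger than any number.
tabstat overround if prev_fav_wins >= 1 & !missing(prev_fav_wins), ///
    by(race_no) statistics(mean sd n)
```

Changing the condition (for example prev_fav_wins == 0) would give the
comparison group of races with no earlier favourite win that day.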

Examples of this would include getting the mean and standard deviation
of the overround for the fourth race (race_no==4) if the favourite won
(i.e. fav_win==1) the first race (race_no==1) at that same meeting on
that day. Another example: at a given race meeting on a certain date,
if the favourite wins the first and the second race, what is the mean
and standard deviation of the overround for the 4th, 5th or 6th race?
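
These two examples could be sketched roughly as follows, assuming the
variable names from the listing above. The flags fav_r1 and fav_r2 are
hypothetical names introduced for illustration; each one is constant
within a meeting-day and records whether the favourite won race 1 or
race 2 on that day.

```stata
* Sketch only: fav_r1 and fav_r2 are hypothetical names introduced here.
* Each flag equals 1 on every race of a meeting-day on which the
* favourite won the named race, and 0 otherwise.
bysort meeting date: egen byte fav_r1 = max(race_no == 1 & fav_win == 1)
bysort meeting date: egen byte fav_r2 = max(race_no == 2 & fav_win == 1)

* Example 1: overround in race 4 when the favourite won race 1
summarize overround if race_no == 4 & fav_r1 == 1

* Example 2: overround in races 4, 5 and 6 when the favourite won races 1 and 2
tabstat overround if inlist(race_no, 4, 5, 6) & fav_r1 == 1 & fav_r2 == 1, ///
    by(race_no) statistics(mean sd n)
```

Dropping by(race_no) from the tabstat call would pool races 4 to 6
into a single mean and standard deviation.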

What I have tried is the summarize command, to get the mean and
standard deviation of 'overround' if 'race_no'==1 & 'fav_win'==1.
However, every combination of conditions I tried just gave the mean of
'overround' for race 1 when the favourite won it, and not the mean for
the second or third race when the favourite won the first race.
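
A likely reason, stated tentatively: the if qualifier is evaluated
separately for each observation, so race_no==1 can only ever select
race 1 rows. The result of race 1 has to be copied onto the other rows
of the same meeting-day before it can be used as a condition. A
minimal sketch using explicit subscripting, where r1_fav is a
hypothetical variable name and race 1 is assumed to be the first row
of each meeting-day once sorted by race_no:

```stata
* Row-wise condition: this keeps only race 1 rows where the favourite won.
summarize overround if race_no == 1 & fav_win == 1

* Sketch only: r1_fav is a hypothetical name. Under by:, the subscript [1]
* refers to the first observation of each meeting-day, so this copies the
* race 1 result to every race that day. Meeting-days whose first row is not
* race 1 are left missing.
bysort meeting date (race_no): generate byte r1_fav = fav_win[1] if race_no[1] == 1

* Overround in race 2 on days when the favourite won race 1
summarize overround if race_no == 2 & r1_fav == 1
```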

Any help would be greatly appreciated as I have been stuck on this for a while.

Thanks in advance.

John





Any further help on this would be greatly appreciated.