From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: AW: Data management: looking up content in observations
Date: Mon, 2 Mar 2009 10:53:53 -0000

You could write an -egen- function for this. I am not aware of one. But
that is not the only way to attack the problem, nor the most natural way.

I don't think -by:- is natural here. There's more than instinct behind
that statement, as it follows from the logic of the problem. The problem
entails comparing observations in different blocks of observations,
however "blocks" are defined. That is, the day will be different, the
home team and guest team may both be different, etc. -by:-, conversely,
is for problems in which you need only work _within_ blocks.

As Martin pointed out, it helps to be thoroughly familiar with
subscripting for this kind of problem. He didn't spell out the mundane
details of any solution, so here is one way to do it.

I fall back on the often-deprecated "loop over observations". It is not
especially elegant or fast, but it is a direct attack on the problem and
does work. There are probably more cunning solutions entailing -merge-s
of the data with itself and so forth, but I'll still do it this way.

    gen winlast = .
    gen obsno = _n
    qui forval i = 1/`=_N' {
        // find the previous day's game involving this home team
        su obsno if day == day[`i'] - 1 & ///
            (hometeam == hometeam[`i'] | guestteam == hometeam[`i']), meanonly
        // record a result only if such a game exists
        if r(min) < . {
            replace winlast = (winner[r(min)] == hometeam[`i']) in `i'
        }
    }

Notes:

1. I am assuming here that each team plays at most once per day. That is
not explicit, but is suggested by Florian's data segment.

2. I am assuming that the total number of games in the dataset is modest
enough to use a -float- for -obsno-. In a bigger dataset than that,
specify that -obsno- is to be a -long-.

3. There are no games before the first day, so the loop need not start at
1, but I'd rather leave it at 1 and let Stata do a little unnecessary
work, rather than wire in 5 and then create a source of bugs if the data
get out of -sort- order, or the code is ported to a different dataset for
which 5 is no longer the correct number.

4. Florian hit the nail on the head in labelling this a "look up"
problem. So, we can think of it in two stages:

* Which observation contains the details for the previous game with this
  home team?

* Did this home team win in that game?

The answer to the first, for observation `i', is the observation that is
on the previous day and involves the present home team, either at home or
away. That is, it is the observation satisfying this condition:

    day == day[`i'] - 1 & ///
    (hometeam == hometeam[`i'] | guestteam == hometeam[`i'])

What we do is exploit what -summarize- leaves in memory. At most one game
should satisfy that condition, so its observation number will be recorded
in multiple places, as r(min), r(max), r(mean) and r(sum). It is
arbitrary which we use.

    (winner[r(min)] == hometeam[`i'])

will be 1 if the home team for this game was the winner in that game, and
0 otherwise.

5. However, suppose that a team didn't play on the previous day. Then the
-summarize- will return missing in r(min) and the comparison will be

    (winner[.] == hometeam[`i'])

which will return 0, as -winner[.]- is evaluated as an empty string,
which will not equal any team name. That's wrong, as the answer should be
., not 0. A similar issue arises with the first day's games. Thus

    if r(min) < . {
        replace winlast = (winner[r(min)] == hometeam[`i']) in `i'
    }

is the more careful code needed to trap such difficulties.

6. I assume that Florian meant

    count if day == 1 & winner == "F"

but my solution does not depend on -winner- being string or numeric, just
that -winner-, -hometeam-, -guestteam- are either all numeric or all
string.

There is a discussion of related technique in

SJ-6-4  dm0025  . . . . . . . . .  Stata tip 36: Which observations? Erratum
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q4/06   SJ 6(4):596                              (no commands)
        correction of example code for Stata tip 36

SJ-6-3  dm0025  . . . . . . . . . . . . .  Stata tip 36: Which observations?
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        Q3/06   SJ 6(3):430--432                         (no commands)
        tip for identifying which observations satisfy some specified
        condition

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Well, why do you want an -egen- function? Note there is an -egen,
count- command already, which, in combination with -by-, might just do
what you want. -help subscripting- may also be useful.

Florian Kuhn

I am trying to find out if in a league winning the previous game has an
effect on the current game. Specifically, I have 8 teams, named A to H. I
would like to construct the variable "winlast", being 1 if the current
home team won the last game and 0 otherwise. The data is organized as
follows:

    Day  hometeam  guestteam  winner  (winlast)
    1    A         H          A       (.)
    1    C         F          .       (.)
    1    E         B          B       (.)
    1    G         D          D       (.)
    2    F         E          .       (0)
    2    B         G          G       (1)
    2    H         C          C       (0)
    2    D         A          D       (1)
    3    G         E          E       (1)
    ...

That is, for each observation I would like to check whether the home team
is listed as "winner" for the previous day. I get the right digit by (for
example)

    count if day == 1 & winner == F

but I have no idea of how to incorporate this into an egen command (that
is, I had a lot of ideas none of which worked). Does someone know how to
get this right?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
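
The message above mentions that a solution based on a -merge- of the data
with itself is probably possible, without spelling one out. What follows
is only a rough sketch of that idea, not code from the original exchange.
It assumes Stata 11 or later (for the -merge m:1- syntax), string team
names, draws entered as empty strings, and at most one game per team per
day. The names -game-, -team-, -winlast2- and the tempfile -prev- are
invented for illustration. The -input- block retypes Florian's sample
rows so that the sketch is self-contained.

```stata
clear
input byte day str1 hometeam str1 guestteam str1 winner
1 A H A
1 C F ""
1 E B B
1 G D D
2 F E ""
2 B G G
2 H C C
2 D A D
3 G E E
end

* Build a lookup file: one row per team per game, dated one day later
preserve
    gen long game = _n
    expand 2                          // two copies of each game
    bysort game: gen team = cond(_n == 1, hometeam, guestteam)
    gen byte winlast2 = (winner == team)
    replace day = day + 1             // yesterday's result, keyed to today
    keep day team winlast2
    rename team hometeam              // key must match the home team
    tempfile prev
    save `prev'
restore

* Look up each home team's result from the previous day.
* Home teams with no game on the previous day stay missing.
merge m:1 day hometeam using `prev', keep(master match) nogenerate
sort day hometeam
list day hometeam guestteam winner winlast2, sepby(day)
```

If this works as intended, -winlast2- should agree with the -winlast-
column in Florian's example, with missing values for the first day. The
trade-off against the loop is the usual one: the -merge- avoids visiting
each observation in turn, but it needs the key (day, team) to be unique
in the lookup file, which is exactly the one-game-per-team-per-day
assumption.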

Follow-Ups:
    st: Re: RE: AW: Data management: looking up content in observations
        From: "Florian Kuhn" <florian@mail.utexas.edu>

References:
    st: Data management: looking up content in observations
        From: "Florian Kuhn" <florian@mail.utexas.edu>
    st: AW: Data management: looking up content in observations
        From: "Martin Weiss" <martin.weiss1@gmx.de>
