
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: A note on -sort- order, especially panel data
Date: Thu, 12 Sep 2002 12:15:08 +0100

Nisha Malhotra posted a panel data problem which attracted a flurry of overlapping answers, from which an acceptable solution should emerge, once Nisha has sorted out whether the jump should take place at or after the first action and what is appropriate for the very first value in a panel (for which previous conditions are unknown, at least to Stata).

I want to expand on a point arising which is much more general and can bite you (and you won't always notice).

Let's abstract to a structure of panel identifier -id- and time variable -time-. The problem is with code like this:

    . sort id
    . by id : gen <whatever>

which Stata 7 users can happily telescope to

    . bysort id : gen <whatever>

The way this arises is that (1) you want to do something separately for each panel and (2) you know that Stata requires a prior -sort- for that, so you oblige. (More than courtesy here: it's the law.)

What's tricky is that the code often should be

    . sort id time
    . by id : gen <whatever>

or the equivalent

    . bysort id (time) : gen <whatever>

-- whenever, that is, you also want observations within each panel to be in time order. Even when correct within-panel order is irrelevant to what you want, as when say you are computing means, it rarely does any harm.

What underlies all this is the literal-mindedness of Stata, which does what you say, not what you mean. Given the instruction

    . sort id

Stata will be satisfied with _any_ ordering of observations for which -id- is sorted, and there are usually lots of possibilities, as some combinatorial calculations will confirm. Stata does not care about any other point. Indeed, having done what you want, it sits there smirking.

Now it is often the case in practice that panel data will come in order of -id- and then -time-, or will be left that way after a previous command. And, increasingly, it is a standard that Stata commands should not change the -sort- order of your data unless you explicitly specify that or it is among the purposes of a command.
So no harm may ensue. But -- as said, and this is the crunch -- Stata makes absolutely no promises about order of observations within each block defined by -id- (or within any other varlist given as argument to -sort-). So there is a possibility that operations dependent on within-panel order will give incorrect results. In the problem here, operations based on the -sum()- function are a case in point.

With panel data there is another and in many ways a better approach. -tsset- your data and use time-series operators. Then given some initial

    . tsset id time

any later

    . tsset

will automatically return panel data to the correct sort order, so that

    . by id: ...

is then guaranteed to work on the correct within-panel order.

In addition, Stata refuses to do calculations based on operators such as L. unless data are in the correct sort order, providing for you a safety catch. Conversely, for operators like L. you don't need to specify separate calculations within panels: that is done automatically given a -tsset- to panel data.

-sum()-, however, has nothing to do with time series as such. It long predates specific time series syntax in Stata and indeed stands outside that framework.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
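
For anyone who wants to try the two approaches side by side, here is a minimal sketch on a toy panel. The data values and the variable names x, cumx and dx are made up for illustration; they are not from Nisha's problem. The observations are deliberately entered out of time order, so the running total from -sum()- depends on the within-panel order being stated explicitly.

```stata
* Toy panel, deliberately entered out of time order (values are invented).
clear
input id time x
1 3 30
1 1 10
2 2 200
1 2 20
2 1 100
end

* Running total within each panel, with time order made explicit.
* The (time) part is what guarantees the within-panel order.
bysort id (time): gen cumx = sum(x)

* Alternative: declare the panel structure, then use a time-series operator.
* L.x is the previous period's x within the same panel; no by: is needed.
tsset id time
gen dx = x - L.x

list id time x cumx dx, sepby(id)
```

With these toy values cumx should run 10, 30, 60 for id 1 and 100, 300 for id 2, and dx is missing for the first observation of each panel. Replacing the -bysort- line by a plain -bysort id:- carries no such guarantee, which is the point of the note.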
