st: RE: elementary panel data management question

Fri, 22 Apr 2005 15:53:19 +0100

    bysort year (profit) : gen high = (_N - _n) <= 4

If -profit- is ever missing, you need

    gen OK = profit < .
    bysort OK year (profit) : gen high = OK * ((_N - _n) <= 4)

Within each block of -year-, -sort- on -profit-. The company
with the highest -profit- is then last, such that _N == _n
and _N - _n == 0. The second highest profit is then such that
_N - 1 == _n and _N - _n == 1. The third highest has
_N - _n == 2, and so forth, so that (_N - _n) <= 4 picks out
the five highest.

This all hinges on the fact that under -by:- _n and _N
are defined within groups.

More at

How to move step by: step.
Stata Journal 2(1):86-102 (2002)
(explains the use of the by varlist : construct to tackle
a variety of problems with group structure, ranging from
simple calculations for each of several groups to more
advanced manipulations that use the built-in _n and _N)

The total profit for those companies is then given by

    sum profit if high

Nick
n.j.cox@durham.ac.uk

Crystal Lopez

> My first 2 questions to the stata list:
>
> I have a large panel dataset, with entries for each
> company and for each year. In other words, I have one
> variable called "company" and one variable called
> "year", so that I have one observation for each
> company for each year, and each observation has
> several other variables.
>
> Basically what I want to do is to identify the 5
> companies that have the highest profits for each year.
> I then want to create a dummy variable (call it top5)
> which indicates, for each observation, whether that
> company for that year is one of the 5 most profitable.
> The 5 would of course tend to be different for
> every year. I would end up with a variable which is 1
> if that company is among the 5 most profitable for
> that year, 0 otherwise. (I would then like to do this
> for top 10 and top 20 as well, but I guess I can
> figure that out once I have the above).
>
> The reason that I want to create such a variable is
> that I am doing panel data regressions and one of the
> independent variables that I want to throw in is a
> dummy like this, to indicate whether or not a given
> company is among the top 5 for that year. I also want
> to be able to exclude from my regression any company
> that is among the top 5 in a given year.
>
> A second, related question is how I can get the total
> profits of the top 5 banks in every year. I guess once
> I have created the dummy this shouldn't be too
> difficult - probably I can get some kind of table that
> sums up the profit variable for the top 5 (ie where
> the dummy=1) by year??
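
For anyone wanting to try the approach in the reply end to end, here is a minimal sketch on made-up data. The toy panel (25 companies, 3 years, random profits), the seed, and the names top5, top10, top20 and total5 are assumptions for illustration only, not part of the original exchange; -runiform()- assumes a reasonably recent Stata. The per-year total uses -egen, total()- and -tabstat-, which is one possible way to get a sum by year, not necessarily what was intended above.

```stata
* Toy panel: 25 companies x 3 years, with made-up profits (illustration only)
clear
set seed 12345
set obs 75
gen year    = 2001 + mod(_n - 1, 3)
gen company = 1 + floor((_n - 1) / 3)
gen profit  = round(100 * runiform())

* Introduce one missing profit so that the OK guard has something to do
replace profit = . in 5

* Flag the top k in each year; observations with missing profit get 0
gen OK = profit < .
foreach k in 5 10 20 {
    bysort OK year (profit) : gen top`k' = OK * ((_N - _n) <= `k' - 1)
}

* Total profit of the top 5, repeated on every observation of that year
bysort year : egen total5 = total(profit * top5)

* The same totals shown as a small table, one row per year
tabstat profit if top5, by(year) statistics(sum)

* Observations outside the top 5, e.g. for use in an -if- qualifier
count if !top5
```

Ties in -profit- at the cutoff are broken by whatever order -sort- happens to produce, so exactly k observations are flagged per year even when two companies share the k-th highest value. To leave the top 5 out of a model, adding -if !top5- to the estimation command should be enough.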

