# st: RE: elementary panel data management question

From: "Nick Cox"
Subject: st: RE: elementary panel data management question
Date: Fri, 22 Apr 2005 15:53:19 +0100

```
bysort year (profit) : gen high = (_N - _n) <= 4

If -profit- is ever missing, you need

gen OK = profit < .
bysort OK year (profit) : gen high = OK * ((_N - _n) <= 4)
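
* A sketch extending the same idea to the top 10 and top 20 asked about
* in the question. It assumes OK has been created as above; the names
* top5, top10 and top20 are only illustrative. Ties at the cutoff are
* broken arbitrarily by the sort order.
foreach k of numlist 5 10 20 {
    bysort OK year (profit) : gen byte top`k' = OK * ((_N - _n) < `k')
}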

Within each block of -year-, -sort- on -profit-.

The company with the highest -profit- is then last,
such that _N == _n and _N - _n == 0.

The second highest profit is then such that _N - 1 == _n
and _N - _n == 1.

And so third. And so forth.
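
* A small check of this logic on made-up data. The values are invented
* purely for illustration, and -clear- drops whatever is in memory, so
* try it in a fresh session. rank_from_top is an illustrative name.
clear
input year profit
2001 10
2001 40
2001 30
2001 20
2001 60
2001 50
2002 5
2002 15
end
bysort year (profit) : gen rank_from_top = _N - _n
* Within 2001 the profit of 60 gets 0 and the profit of 10 gets 5,
* so only the latter fails the test (_N - _n) <= 4.
list, sepby(year)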

This all hinges on the fact that under -by:- _n and _N
are defined within groups. More at

Speaking Stata: How to move step by: step. Stata Journal 2(1): 86-102
(2002)

(explains the use of the by varlist : construct to tackle
a variety of problems with group structure, ranging from
simple calculations for each of several groups to more
advanced manipulations that use the built-in _n and _N)

The total profit for those companies is then given by

sum profit if high
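
* -sum- here is -summarize-, which shows the mean and leaves the total
* in r(sum), pooled over all years. For a separate total for each year,
* one possibility is sketched below. top5total is an illustrative name;
* the egen function total() needs Stata 9 or later, and earlier
* versions use sum() instead.
tabstat profit if high, by(year) statistics(sum)
bysort year : egen top5total = total(profit * high)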

Nick
n.j.cox@durham.ac.uk

Crystal Lopez wrote:

> My first 2 questions to the stata list:
>
> I have a large panel dataset, with entries for each
> company and for each year. In other words, I have one
> variable called "company" and one variable called
> "year", so that I have one observation for each
> company for each year, and each observation has
> several other variables.
>
> Basically what I want to do is to identify the 5
> companies that have the highest profits for each year.
> I then want to create a dummy variable (call it top5)
> which indicates, for each observation, whether that
> company for that year is one of the 5 most profitable.
> The 5 would of course tend to be different for
> every year. I would end up with a variable which is 1
> if that company is among the 5 most profitable for
> that year, 0 otherwise. (I would then like to do this
> for top 10 and top 20 as well, but I guess I can
> figure that out once I have the above).
>
> The reason that I want to create such a variable is
> that I am doing panel data regressions and one of the
> independent variables that I want to throw in is a
> dummy like this, to indicate whether or not a given
> company is among the top 5 for that year. I also want
> to be able to exclude from my regression any company
> that is among the top 5 in a given year.
>
> A second, related question is how I can get the total
> profits of the top 5 banks in every year. I guess once
> I have created the dummy this shouldn't be too
> difficult - probably I can get some kind of table that
> sums up the profit variable for the top 5 (ie where
> the dummy=1) by year??
>

```