# st: Re: panel data management - dividing into quartiles

From: Crystal Lopez
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: panel data management - dividing into quartiles
Date: Sun, 15 May 2005 21:25:38 -0700 (PDT)

```
Hello,

Thank you for the replies to my query. I'm sorry to
bother people but I unfortunately still don't seem to
have a solution though. The code that Kit forwarded in
his last message "panel data mgmt: dividing into
quartiles for each year" creates a separate dummy
variable for each year - quartile1935, quartile1936,
etc, which take on values between 1 and 4 for
observations falling within that year, and are of
course missing otherwise. The problem with this is
that what I actually need is a single variable, call
it quartile, that will take a value between 1 and 4,
depending on whether that observation (i.e. that bank
in that year) falls into the 1st, 2nd, 3rd, or 4th
quartiles for that year, based on the assets in that
year. That way, I would be able to run my regressions
separately for the different groups, as in
xtreg ........ if quartile==1, fe
and so forth.

The problem with the other code that Kit pasted (thank
you Kit :)) in the message "dividing into quartiles",
which was
egen quartile = cut(assets), group(4) label
replace quartile = quartile+1

is that it seems to pool all the data and divide it
into quartiles, for the dataset as a whole rather than
by year. And then the variable quartile takes on a
value which is the maximum value of assets in that
quartile, the cutoff point I guess. So this is not
what I need either, although this is at least one
variable.

I did try Dev's suggestion as well, although I am not
sure whether the calculation of efficiency scores is
really dealing with the same thing. But the levels
command did not work ("unrecognized command:levels"),
perhaps I misunderstood this and was supposed to

So I am still optimistically hoping for a solution! To
reiterate, what I want to do is to create a variable
"quartile", that has the value 1 if that observation
is in the top quartile of assets for that year (i.e.
if that bank is among the top quartile of banks in
terms of assets for that year), has the value 2 if
that observation is in the second quartile of assets
for that year, etc.

Incidentally, Nick Cox had previously suggested the
following code (which worked perfectly) to assign a
dummy variable according to whether a company's
profits were in the top 5 in a given year or not. I'm
pasting it below as I thought it might be possible to
do something similar here - but in this case I am not
looking for whether a company is in the top 5 or not
according to a particular variable, but rather which
quartile it fits into according to a particular
variable. But I thought some similar logic might
apply...?

"bysort year (profit) : gen high = (_N - _n) <= 4

If -profit- is ever missing, you need

gen OK = profit < .
bysort OK year (profit) : gen high = OK * ((_N - _n) <= 4)

Within each block of -year-, -sort- on -profit-.

The company with the highest -profit- is then last,
such that _N == _n and _N - _n == 0.

The second highest profit is then such that _N - 1 = _n
and _N - _n == 1.

And so third. And so forth.

This all hinges on the fact that under -by:- _n and _N
are defined within groups."

Thank you in advance for any help!!

Crystal.
