# Re: st: HELP, please: sample splittings overlapped

From: n j cox

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: HELP, please: sample splittings overlapped

Date: Tue, 23 Oct 2007 20:52:49 +0100

```
Statalist questions tend to get answered because
they are clear and answerable, not in response
to emotional pressure. It is better to presume that

That aside,

> I am running a panel data model with Stata and my database is an
> incomplete panel of 997 firms and some variables, max range: 1980 -
> 2000 (but some firms have a lower range).

> I need your help to make some calculation on a variable, hereafter
> called "mvalue" (it's a ratio). First of all, for each year, we can
> find negative or positive or zero values of "mvalue". What I need:

> * by considering the distribution of this variable, I would identify
> as "LOW mvalue" those firms that are above the 25th percentile of the
> distribution of the negative values. Note that this selection should
> produce a dummy-column where - e.g. - the same firm could have zero
> and one randomly distributed or only zero or one. Vice versa, HIGH
> mvalue firms are above the 25th percentile of the distribution of
> firms experiencing positive values of the values.

This could mean several different things as far as I can see,
depending on what distribution(s) you are talking about. Here
is one:

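* 25th percentile of the negative values, within each year
bysort year: egen lq_mvalue = ///
    pctile(mvalue) if mvalue < 0, p(25)
* copy it to every observation in the year (missings sort last)
bysort year (lq_mvalue): replace lq_mvalue = lq_mvalue[1]

* 25th percentile of the positive values, within each year
bysort year: egen uq_mvalue = ///
    pctile(mvalue) if mvalue > 0 & mvalue < ., p(25)
bysort year (uq_mvalue): replace uq_mvalue = uq_mvalue[1]

* both dummies are left missing where mvalue is missing
gen byte high = mvalue > uq_mvalue if mvalue < .

* guard: a missing lq_mvalue (year with no negative values) would
* otherwise make every nonmissing mvalue count as low
gen byte low = mvalue < lq_mvalue & lq_mvalue < . if mvalue < .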

> Finally, created these two dummies (two columns), for a firm to be
> identified as a PERSISTENTLY LOW (HIGH) "mvalue", I require that it be
> identified as a LOW (HIGH) "mvalue" firm for at least two consecutive
> periods. This would produce two other columns, for PERSISTENTLY LOW
> and PERSISTENTLY HIGH mvalue.

This is a spell problem. Searching the archives for -tsspell- will
yield relevant postings, some very recent.

In this case, a direct approach is also possible.

bysort panel (year) : gen pers_high = sum(high == 1 & high[_n-1] == 1)
by panel: replace pers_high = pers_high[_N] >= 1

by panel : gen pers_low = sum(low == 1 & low[_n-1] == 1)
by panel: replace pers_low = pers_low[_N] >= 1

If you consider say a run of 0s and 1s

0 0 0 1 1 1 0 0 0  ..

we want to find whether there is at least one observation
for which it is true that

this dummy is 1 and the previous dummy is 1

Double counting is not a worry. Strictly, it is better
to -tsset- and work in terms of -high- and -L.high-, etc.

Nick
n.j.cox@durham.ac.uk
```
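
The closing remark about -tsset- and -L.high- matters for an incomplete panel: with gaps in the years, -high[_n-1]- refers to the previous observation, which need not be the previous year. What follows is only a sketch of that variant, not code from the original post. The variable names -panel-, -year- and -high- follow the post; the toy data, -run2- and -naive- are hypothetical names made up for illustration.

```stata
* Toy data (hypothetical): firm 2 has no observation for 1992
clear
input panel year high
1 1990 0
1 1991 1
1 1992 1
1 1993 0
2 1990 0
2 1991 1
2 1993 1
3 1990 1
3 1991 0
3 1992 1
3 1993 0
end

tsset panel year

* 1 if high in this year and in the previous calendar year;
* L.high is missing across a gap, so the comparison is then false
gen byte run2 = high == 1 & L.high == 1

* firm-level dummy: at least one such pair of consecutive years
bysort panel (year): egen byte pers_high = max(run2)

* for comparison, the previous-observation version ignores the gap
bysort panel (year): gen byte naive = high == 1 & high[_n-1] == 1

list panel year high run2 naive pers_high, sepby(panel)
```

In this toy example firm 1 is flagged as persistently high and firm 3 is not. Firm 2 is the interesting case: 1991 and 1993 are adjacent observations but not adjacent years, so -naive- is 1 in 1993 while -run2- stays 0. Which of the two is wanted depends on what "two consecutive periods" should mean when a year is absent.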