Re: st: HELP, please: sample splittings overlapped
Statalist questions tend to get answered because
they are clear and answerable, not in response
to emotional pressure. It is better to presume that
_everybody_ cares about their postings.
That aside,
> I am running a panel data model with Stata and my database is an
> incomplete panel of 997 firms and some variables, max range: 1980 -
> 2000 (but some firms have a lower range).
> I need your help to make some calculation on a variable, hereafter
> called "mvalue" (it's a ratio). First of all, for each year, we can
> find negative or positive or zero values of "mvalue". What I need:
> * by considering the distribution of this variable, I would identify
> as "LOW mvalue" those firms that are above the 25th percentile of the
> distribution of the negative values. Note that this selection should
> produce a dummy-column where - e.g. - the same firm could have zero
> and one randomly distributed or only zero or one. Vice versa, HIGH
> mvalue firms are above the 25th percentile of the distribution of
> firms experiencing positive values of the values.
This could mean several different things as far as I can see,
depending on what distribution(s) you are talking about. Here
is one:
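    * 25th percentile of the negative values, by year
    bysort year: egen lq_mvalue = ///
        pctile(mvalue) if mvalue < 0, p(25)
    * copy it to every observation in the year (missings sort last)
    bysort year (lq_mvalue): replace lq_mvalue = lq_mvalue[1]
    * same for the positive values
    by year: egen uq_mvalue = ///
        pctile(mvalue) if mvalue > 0 & mvalue < ., p(25)
    bysort year (uq_mvalue): replace uq_mvalue = uq_mvalue[1]
    gen byte high = mvalue > uq_mvalue if mvalue < .
    * a missing cutoff counts as larger than any number, so guard the <
    gen byte low = (mvalue < lq_mvalue) & (lq_mvalue < .) if mvalue < .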
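Another possible reading, sketched here only as a guess at what
might be meant, takes the percentiles from all years pooled rather
than year by year. The names ending in _pooled are made up for
this sketch.

    * pooled 25th percentiles of the negative and positive values
    summarize mvalue if mvalue < 0, detail
    scalar lq_pooled = r(p25)
    summarize mvalue if mvalue > 0 & mvalue < ., detail
    scalar uq_pooled = r(p25)
    gen byte high_pooled = mvalue > scalar(uq_pooled) if mvalue < .
    * guard against a missing cutoff, which would compare as very large
    gen byte low_pooled = (mvalue < scalar(lq_pooled)) & ///
        (scalar(lq_pooled) < .) if mvalue < .

Which reading is right depends on whether a firm should be compared
with other firms in the same year or with the whole sample.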
> Finally, created these two dummies (two columns), for a firm to be
> identified as a PERSISTENTLY LOW (HIGH) "mvalue", I require that it be
> identified as a LOW (HIGH) "mvalue" firm for at least two consecutive
> periods. This would produce two other columns, for PERSISTENTLY LOW
This is a spell problem. Searching the archives for -tsspell- will
yield relevant postings, some very recent.
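For example, a rough sketch with -tsspell- (a user-written command
from SSC, so it must be installed first). This assumes the dummy
-high- already exists and that the firm identifier is a numeric
variable called -panel-; maxrun_high and pers_high_sp are names
made up here.

    ssc install tsspell
    tsset panel year
    * _seq counts position within each spell of high == 1, 0 elsewhere
    tsspell, cond(high == 1)
    * longest run of 1s for each firm
    egen maxrun_high = max(_seq), by(panel)
    gen byte pers_high_sp = maxrun_high >= 2
    * clear the default spell variables before repeating for -low-
    drop _spell _seq _end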
In this case, a direct approach is also possible.
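    * running count, within firm, of observations where this dummy
    * and the previous one are both 1
    bysort panel (year): gen pers_high = sum(high == 1 & high[_n-1] == 1)
    * flag the whole firm if that ever happened
    by panel: replace pers_high = pers_high[_N] >= 1
    by panel: gen pers_low = sum(low == 1 & low[_n-1] == 1)
    by panel: replace pers_low = pers_low[_N] >= 1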
Consider, say, a run of 0s and 1s
    0 0 0 1 1 1 0 0 0 ..
We want to find whether there is at least one observation
for which it is true that
    this dummy is 1 and the previous dummy is 1
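To make that concrete, here is a small sketch on made-up data: one
hypothetical firm whose -high- values follow the run above. The
years and the names nboth and flag are invented for illustration.

    clear
    input panel year high
    1 1990 0
    1 1991 0
    1 1992 0
    1 1993 1
    1 1994 1
    1 1995 1
    1 1996 0
    1 1997 0
    1 1998 0
    end
    * running count of "this dummy is 1 and the previous dummy is 1"
    bysort panel (year): gen nboth = sum(high == 1 & high[_n-1] == 1)
    * 1 for every observation of a firm where the count ever reached 1
    by panel: gen byte flag = nboth[_N] >= 1
    list, sepby(panel)

The running count goes 0 0 0 0 1 2 2 2 2, so its last value is 2
and flag is 1 for every observation of this firm.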
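Double counting is not a worry, as all we need to know is whether
the count is at least 1. Strictly, it is better
to -tsset- and work in terms of -high- and -L.high-, etc.,
because -L.- refers to the previous year and is missing when that
year is absent from the panel, whereas [_n-1] refers to the
previous observation whatever its year.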
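A minimal sketch of that -tsset- version, assuming that -panel- is
a numeric firm identifier and that each firm has at most one
observation per year; both_high and pers_high_ts are names made up
here.

    tsset panel year
    * L.high is missing for a firm's first year and after any gap,
    * so such observations never count as consecutive
    gen byte both_high = high == 1 & L.high == 1
    * 1 for every observation of a firm with at least one such pair
    egen byte pers_high_ts = max(both_high), by(panel)

The same two lines with -low- in place of -high- give the
persistently low indicator.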