
From: n j cox <n.j.cox@durham.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: HELP, please: sample splittings overlapped
Date: Tue, 23 Oct 2007 20:52:49 +0100

Statalist questions tend to get answered because they are clear and answerable, not in response to emotional pressure. It is better to presume that _everybody_ cares about their postings.

That aside,

> I am running a panel data model with Stata and my database is an
> incomplete panel of 997 firms and some variables, max range: 1980 -
> 2000 (but some firms have a lower range).
> I need your help to make some calculation on a variable, hereafter
> called "mvalue" (it's a ratio). First of all, for each year, we can
> find negative or positive or zero values of "mvalue". What I need:
> * by considering the distribution of this variable, I would identify
> as "LOW mvalue" those firms that are above the 25th percentile of the
> distribution of the negative values. Note that this selection should
> produce a dummy-column where - e.g. - the same firm could have zero
> and one randomly distributed or only zero or one. Vice versa, HIGH
> mvalue firms are above the 25th percentile of the distribution of
> firms experiencing positive values of the values.

This could mean several different things as far as I can see, depending on what distribution(s) you are talking about. Here is one:

    bysort year: egen lq_mvalue = ///
        pctile(mvalue) if mvalue < 0, p(25)
    bysort year (lq_mvalue): replace lq_mvalue = lq_mvalue[1]
    by year: egen uq_mvalue = ///
        pctile(mvalue) if mvalue > 0 & mvalue <., p(25)
    bysort year (uq_mvalue): replace uq_mvalue = uq_mvalue[1]
    gen byte high = mvalue > uq_mvalue if mvalue < .
    gen byte low = mvalue < lq_mvalue if mvalue < .

> Finally, created these two dummies (two columns), for a firm to be
> identified as a PERSISTENTLY LOW (HIGH) "mvalue", I require that it be
> identified as a LOW (HIGH) "mvalue" firm for at least two consecutive
> periods. This would produce two other columns, for PERSISTENTLY LOW
> and PERSISTENTLY HIGH mvalue.

This is a spell problem. Searching the archives for -tsspell- will yield relevant postings, some very recent.
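For readers who want to try the -tsspell- route, here is a minimal sketch, not a definitive solution. It assumes that -tsspell- has been installed from SSC, that -panel- is a numeric firm identifier, that -year- is an integer year, and that -high- has been created as above. The names -maxrun_high- and -pers_high_sp- are made up for illustration.

```stata
* Assumption: -tsspell- is user-written and installed via: ssc install tsspell
tsset panel year

* Mark runs of observations for which high == 1;
* by default -tsspell- creates _spell, _seq and _end
tsspell, cond(high == 1)

* _seq counts position within each run, so a longest run of
* 2 or more means at least two consecutive periods flagged high
bysort panel: egen maxrun_high = max(_seq)
gen byte pers_high_sp = maxrun_high >= 2

* Tidy up so that -tsspell- can be run again for -low-
drop _spell _seq _end maxrun_high
```

The same steps with cond(low == 1) would give the corresponding indicator for persistently low firms.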
In this case, a direct approach is also possible.

    bysort panel (year) : gen pers_high = sum(high == 1 & high[_n-1] == 1)
    by panel: replace pers_high = pers_high[_N] >= 1
    by panel : gen pers_low = sum(low == 1 & low[_n-1] == 1)
    by panel: replace pers_low = pers_low[_N] >= 1

If you consider say a run of 0s and 1s

    0 0 0 1 1 1 0 0 0 ..

we want to find whether there is at least one observation for which it is true that

    this dummy is 1 and the previous dummy is 1

Double counting is not a worry.

Strictly, it is better to -tsset- and work in terms of -high- and -L.high-, etc.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
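The remark about -tsset- and -L.high- matters when a firm has gaps in its years. The toy panel below is a sketch with made-up data, not the poster's dataset; the names -pers_obs-, -run_high- and -pers_ts- are illustrative only. Firm 2 is high in 1990 and 1992 but has no observation for 1991.

```stata
clear
input panel year high
1 1990 0
1 1991 1
1 1992 1
2 1990 1
2 1992 1
2 1993 0
end

* Observation-based lag: high[_n-1] is the previous observation,
* so firm 2 is flagged even though 1991 is absent
bysort panel (year): gen pers_obs = sum(high == 1 & high[_n-1] == 1)
by panel: replace pers_obs = pers_obs[_N] >= 1

* Time-based lag: L.high is missing when the previous year is absent,
* so firm 2 is not flagged
tsset panel year
gen byte run_high = high == 1 & L.high == 1
bysort panel: egen byte pers_ts = max(run_high)

list, sepby(panel)
```

Which behaviour is wanted depends on whether "two consecutive periods" means consecutive observations or consecutive calendar years.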
