

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: difficulty recoding variable by referring to prior and subsequent lines in panel data
Date: Fri, 8 Mar 2013 11:21:01 +0000

Working backwards:

1. References using subscripts such as [_n-1] and [_n+1] are completely independent of whether -tsset- has been used previously. If it were otherwise, then users -- at any level -- would need to keep track of whether the data were currently -tsset- (something that could have happened in a previous session with Stata) in using subscript references. Similarly, anyone reading a code segment using subscript references could not interpret those references correctly without knowing about a prior -tsset-. Finally, programmers would have to take account of whether a -tsset- was in force in using subscripts, as the effects might be different. The more you think about it, the more it is clear that this would be a bad idea, but it is not how Stata works, which is good news.

2. The problem stated could be tackled with subscripts so long as -by <panel identifier> (<time identifier>):- were the framework, say

    bysort panelid (time) :

3. But it is easiest just to exploit the scope for using time series operators that -tsset- implies. See below.

4. The effects of subscript references and time series operators will differ in time series with gaps.

I haven't tried to follow this code, which seems rather tangled, but trust that this self-contained example will prove instructive.

    . clear

    . set obs 10
    obs was 0, now 10

    . egen id = seq(), block(5)

    . egen t = seq(), to(5)

    . gen y = 1 + mod(_n, 2)

    . l

         +------------+
         | id   t   y |
         |------------|
      1. |  1   1   2 |
      2. |  1   2   1 |
      3. |  1   3   2 |
      4. |  1   4   1 |
      5. |  1   5   2 |
         |------------|
      6. |  2   1   1 |
      7. |  2   2   2 |
      8. |  2   3   1 |
      9. |  2   4   2 |
     10. |  2   5   1 |
         +------------+

    . tsset id t
           panel variable:  id (strongly balanced)
            time variable:  t, 1 to 5
                    delta:  1 unit

    . gen y2 = y

    . replace y2 = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)
    (6 real changes made)

    . l

         +-----------------+
         | id   t   y   y2 |
         |-----------------|
      1. |  1   1   2    2 |
      2. |  1   2   1    2 |
      3. |  1   3   2    1 |
      4. |  1   4   1    2 |
      5. |  1   5   2    2 |
         |-----------------|
      6. |  2   1   1    1 |
      7. |  2   2   2    1 |
      8. |  2   3   1    2 |
      9. |  2   4   2    1 |
     10. |  2   5   1    1 |
         +-----------------+

Here the principles suggested are

1. In general, always smooth a copy of the variable concerned.

2. Alison wants to test whether the previous and following values are the same (use the L. and F. operators) but different from the present value (this condition appears redundant, but does no harm), while avoiding the beginning and end of each panel (excluding observations for which either or both of the previous and following values are missing appears sufficient for this).

3. My concocted example also points up an instability in the method of "smoothing", perhaps better seen from

    clear
    set obs 10
    egen id = seq(), block(5)
    egen t = seq(), to(5)
    gen y = 1 + mod(_n, 2)
    l
    tsset id t
    gen y2 = y
    replace y2 = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)
    l

On the other hand it may work well for smoothing isolated anomalies.

Nick

On Fri, Mar 8, 2013 at 3:27 AM, Alison El Ayadi <alisonelayadi@yahoo.com> wrote:

> I am having an issue where I am working with a binary variable in panel data and want to recode the variable at a particular time point to the opposite condition if the value at the prior and subsequent time points are equal to each other but not equal to the reference time point.
>
> I initially used this code:
>
> *tsset data
> sort pt_id index_cl index_rh
> by pt_id: replace n_num = _n
> sort pt_id n_num
> tsset pt_id n_num
>
> *take care of 'random' differences
> gen low_index_recode_ind = 1 if (low_index!=low_index[_n+1]) & (low_index!=low_index[_n-1]) & (low_index[_n-1]==low_index[_n+1]) & low_index!=. & low_index[_n-1]!=.
> gen low_index_new = low_index
> replace low_index_new = 1 if low_index_recode_ind==1 & low_index==0
> replace low_index_new = 0 if low_index_recode_ind==1 & low_index==1
>
> And I found that when recoding a number of first and last lines were recoded. So I made a change to include conditions in bold:
>
> gen low_index_recode_ind = 1 if (low_index!=low_index[_n+1]) & (low_index!=low_index[_n-1]) & (low_index[_n-1]==low_index[_n+1]) & low_index!=. & low_index[_n-1]!=. & n_num!=1 & n_num!=_N
>
> However I am still finding that there are at least several final lines that have been recoded.
>
> I understand by tsset-ing the data Stata would not be comparing the lines belonging to one pt_id to lines from the next pt_id in dataset sequence, am I correct in this assumption, and if so, does anyone see any mistakes in my code which will allow this change to operate successfully as intended for all lines, not just middle lines?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
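
Points 2 and 4 of the reply can be illustrated with a small sketch. Everything here is an assumption made for illustration: the toy data (a single panel with no observation at t = 3) and the variable names y_lag_op, y_lag_sub, y2_op and y2_sub are hypothetical and do not come from the thread. The sketch relies on the fact that under -bysort id (t):- the subscripts [_n-1] and [_n+1] are evaluated within each panel.

```stata
* Hypothetical toy data: one panel, no observation at t = 3 (a gap)
clear
input id t y
1 1 0
1 2 1
1 4 0
1 5 0
end

tsset id t

* Time-series operator: L.y is the value at time t-1,
* so it is missing at t = 4 because t = 3 is not observed
gen y_lag_op = L.y

* Subscript within panel: y[_n-1] is the previous row of the same id,
* so at t = 4 it picks up the value observed at t = 2
bysort id (t): gen y_lag_sub = y[_n-1]

* The same recoding rule written both ways, each applied to a copy of y
gen y2_op = y
replace y2_op = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)

gen y2_sub = y
bysort id (t): replace y2_sub = y[_n-1] if (y[_n-1] == y[_n+1]) ///
    & (y != y[_n+1]) & !missing(y[_n-1], y[_n+1])

list, sepby(id)
```

With these data the two versions should disagree at t = 2: the operator version leaves y unchanged there because F.y refers to the unobserved t = 3, whereas the subscript version treats the row for t = 4 as the next value. Which behaviour is wanted depends on whether a gap should count as a break in the sequence.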

**Follow-Ups**: **Re: st: difficulty recoding variable by referring to prior and subsequent lines in panel data**, *From:* Nick Cox <njcoxstata@gmail.com>

**References**: **st: difficulty recoding variable by referring to prior and subsequent lines in panel data**, *From:* Alison El Ayadi <alisonelayadi@yahoo.com>
