



Re: st: difficulty recoding variable by referring to prior and subsequent lines in panel data


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: difficulty recoding variable by referring to prior and subsequent lines in panel data
Date   Fri, 8 Mar 2013 11:21:01 +0000

Working backwards:

1. References using subscripts such as [_n-1] and [_n+1] are
completely independent of whether -tsset- has been used previously.

If it were otherwise, then users -- at any level -- would need to keep
track of whether the data were currently -tsset- (something that could
have happened in a previous session with Stata) in using subscript
references. Similarly, anyone reading a code segment using subscript
references could not interpret those references correctly without
knowing about a prior -tsset-. Finally, programmers would have to take
account of whether a -tsset- was in force in using subscripts, as the
effects might be different.

The more you think about it, the more it is clear that this would be a
bad idea, but it is not how Stata works, which is good news.

2. The problem stated could be tackled with subscripts so long as -by
<panel identifier> (<time identifier>):- were the framework, say

bysort panelid (time) :
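To make the subscript route concrete, here is a minimal sketch on a small concocted panel. The variable names id, t, y and y2 are only illustrative stand-ins for the panel identifier, time identifier and variable of interest. Under -by:-, _n, _N and subscripts are all interpreted within each panel, so the first and last observations of a panel can be excluded explicitly.

```stata
* Sketch only: id, t, y, y2 are illustrative names, not the poster's variables
clear
set obs 10
egen id = seq(), block(5)
egen t = seq(), to(5)
gen y = 1 + mod(_n, 2)

* always work on a copy
gen y2 = y

* under by:, _n and _N count within each panel, and subscripts pointing
* outside the panel return missing
bysort id (t): replace y2 = y[_n-1] if y[_n-1] == y[_n+1] & y != y[_n+1] & _n > 1 & _n < _N

list, sepby(id)
```

Without the -by:- prefix, _N would be the number of observations in the whole dataset, so a condition such as _n < _N would protect only the very last observation, not the last observation of each panel.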

3. But it is easiest just to exploit the scope for using time series
operators that -tsset- implies. See below.

4. The effects of subscript references and time series operators will
differ in time series with gaps.
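Points 1 and 4 can be checked with a small sketch. The data below are made up for illustration: panel 1 has no observation at t = 3, and the names prev_sub and prev_lag are hypothetical.

```stata
* Sketch only: concocted data with a gap at t = 3 in panel 1
clear
input id t y
1 1 0
1 2 1
1 4 0
1 5 1
2 1 1
2 2 0
end

tsset id t

* subscript: the previous observation in the current sort order
gen prev_sub = y[_n-1]

* time series operator: the value one time unit earlier in the same panel
gen prev_lag = L.y

list, sepby(id)
```

For the observation with id 1 and t 4, prev_sub picks up the value at t = 2, whereas prev_lag is missing because there is no observation at t = 3. For the first observation of panel 2, prev_sub picks up the last value of panel 1 even though the data are -tsset-, whereas prev_lag is missing.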

I haven't tried to follow your code, which seems rather tangled, but
trust that this self-contained example will prove instructive:

. clear

. set obs 10
obs was 0, now 10

. egen id = seq(), block(5)

. egen t = seq(), to(5)

. gen y = 1 + mod(_n, 2)

. l

     +------------+
     | id   t   y |
     |------------|
  1. |  1   1   2 |
  2. |  1   2   1 |
  3. |  1   3   2 |
  4. |  1   4   1 |
  5. |  1   5   2 |
     |------------|
  6. |  2   1   1 |
  7. |  2   2   2 |
  8. |  2   3   1 |
  9. |  2   4   2 |
 10. |  2   5   1 |
     +------------+

. tsset id t
       panel variable:  id (strongly balanced)
        time variable:  t, 1 to 5
                delta:  1 unit

. gen y2 = y

. replace y2 = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)
(6 real changes made)

. l

     +-----------------+
     | id   t   y   y2 |
     |-----------------|
  1. |  1   1   2    2 |
  2. |  1   2   1    2 |
  3. |  1   3   2    1 |
  4. |  1   4   1    2 |
  5. |  1   5   2    2 |
     |-----------------|
  6. |  2   1   1    1 |
  7. |  2   2   2    1 |
  8. |  2   3   1    2 |
  9. |  2   4   2    1 |
 10. |  2   5   1    1 |
     +-----------------+

Here the principles suggested are

1. In general, always smooth a copy of the variable concerned.

2. Alison wants to test

whether previous and following values are the same: use the L. and F.
operators

but different from the present value (this condition appears
redundant, but does no harm)

while avoiding the beginning and end of each panel (excluding
observations for which either or both of the previous and following
values are missing appears sufficient for this).

3. My concocted example also points up an instability in this method of
"smoothing": every interior value of an alternating series is flipped
rather than smoothed. This is perhaps better seen by running

clear
set obs 10
egen id = seq(), block(5)
egen t = seq(), to(5)
gen y = 1 + mod(_n, 2)
l
tsset id t
gen y2 = y
replace y2 = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)
l

On the other hand it may work well for smoothing isolated anomalies.
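As a sketch of that favourable case, the concocted data below contain one isolated anomaly in each panel (t = 3 in panel 1, t = 4 in panel 2). The data and variable names are made up for illustration.

```stata
* Sketch only: made-up panels, each with a single isolated anomaly
clear
input id t y
1 1 0
1 2 0
1 3 1
1 4 0
1 5 0
2 1 1
2 2 1
2 3 1
2 4 0
2 5 1
end

tsset id t

* smooth a copy, using the same rule as in the example above
gen y2 = y
replace y2 = L.y if (L.y == F.y) & (y != F.y) & !missing(L.y, F.y)

list, sepby(id)
```

Only the two anomalous values are changed here, so y2 is constant within each panel.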

Nick

On Fri, Mar 8, 2013 at 3:27 AM, Alison El Ayadi <alisonelayadi@yahoo.com> wrote:

> I am working with a binary variable in panel data and want to recode it at a particular time point to the opposite value if the values at the prior and subsequent time points are equal to each other but not equal to the value at the reference time point.
>
> I initially used this code:
> * tsset data
> sort pt_id index_cl index_rh
> by pt_id: replace n_num = _n
> sort pt_id n_num
> tsset pt_id n_num
>
> * take care of 'random' differences
> gen low_index_recode_ind = 1 if (low_index!=low_index[_n+1]) & (low_index!=low_index[_n-1]) & (low_index[_n-1]==low_index[_n+1]) & low_index!=. & low_index[_n-1]!=.
> gen low_index_new = low_index
> replace low_index_new = 1 if low_index_recode_ind==1 & low_index==0
> replace low_index_new = 0 if low_index_recode_ind==1 & low_index==1
>
>
>
> And I found that a number of first and last lines were recoded. So I made a change to include two further conditions, n_num!=1 and n_num!=_N:
>  gen low_index_recode_ind = 1 if (low_index!=low_index[_n+1]) & (low_index!=low_index[_n-1]) & (low_index[_n-1]==low_index[_n+1]) & low_index!=. & low_index[_n-1]!=. & n_num!=1 & n_num!=_N
> However I am still finding that there are at least several final lines that have been recoded.
>
> I understand that by tsset-ing the data Stata would not compare lines belonging to one pt_id with lines from the next pt_id in dataset sequence. Am I correct in this assumption? If so, does anyone see any mistakes in my code that prevent this change from operating as intended for all lines, not just middle lines?
