
# st: RE: AW: correlate lag variables

| From | "Nick Cox" |
| --- | --- |
| To | |
| Subject | st: RE: AW: correlate lag variables |
| Date | Mon, 10 May 2010 11:38:30 +0100 |

```The reason for the differences is that -correlate- will only correlate variables for observations for which _all_ variables specified are non-missing. As Martin is implying, -pwcorr- is more indulgent: it uses every observation available for each pair of variables, which is not necessarily a feature.

The output for -correlate- made it clear that different numbers of observations were being used.

At a guess, Julia's data are panel data, so every extra lag bites hard: for any increase in lag by 1, one more observation is necessarily lost at the start of each panel, because its lagged value does not exist. So, the first observation in each panel cannot be used with lag one, the first two cannot be used with lag two, and so forth.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

*************
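clear*
set obs 100
* arbitrary seed, not in the original post, so that results are reproducible
set seed 12345
gen y=1
* AR(1) series: each value depends on the previous one plus noise
replace y =.6*y[_n-1]+rnormal() in 2/l
gen byte time=_n
tsset time
* -correlate- uses only observations where y, L.y and L2.y are all non-missing
corr y L.y L2.y
* -pwcorr- uses all observations available for each pair of variables
pwcorr y L.y
pwcorr y L.y L2.y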
*************

Julia

I would like to calculate the correlation between a variable and its
past values. Thus, I use the following command:

. correlate BI L1.BI L2.BI
(obs=225)

             |                 L.      L2.
             |       BI       BI       BI
-------------+---------------------------
          BI |
         --. |   1.0000
         L1. |   0.0111   1.0000
         L2. |   0.0647   0.0161   1.0000

However, if I ask only for the correlation with the first lag, my result
differs:

. correlate BI L1.BI
(obs=265)

             |                 L.
             |       BI       BI
-------------+------------------
          BI |
         --. |   1.0000
         L1. |   0.0174   1.0000

Why does excluding the second lag affect the correlation between the
variable and its first lag?

```
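The effect described above can be checked on simulated panel data. What follows is only a minimal sketch: the panel layout (40 panels of 8 periods each), the variable names `id` and `time`, the white-noise values for `BI`, and the seed are all assumptions made for illustration, not Julia's actual data.

```stata
clear
set seed 2010                 // arbitrary seed, for reproducibility only
set obs 40                    // hypothetical: 40 panels
gen id = _n
expand 8                      // hypothetical: 8 time periods per panel
bysort id: gen time = _n
gen BI = rnormal()            // placeholder values; only the counts matter here
xtset id time

* Casewise deletion: an observation is used only if BI, L.BI and L2.BI
* are all non-missing, so the first two periods of every panel drop out.
correlate BI L.BI L2.BI
display "obs used with two lags: " r(N)

* Without L2.BI, only the first period of every panel drops out.
correlate BI L.BI
display "obs used with one lag:  " r(N)

* Pairwise deletion: each correlation is computed on its own sample.
* The obs option prints the number of observations behind each entry.
pwcorr BI L.BI L2.BI, obs
```

In this balanced setup the first -correlate- should report 240 observations (6 usable periods in each of 40 panels) and the second 280, so the correlation between `BI` and `L.BI` is computed on two different samples. The -pwcorr- output with the `obs` option makes the same point by listing a separate count for each pair.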