From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: AW: correlate lag variables
Date: Mon, 10 May 2010 11:38:30 +0100
The reason for differences is that -correlate- will only correlate variables for observations for which _all_ variables specified are non-missing. As Martin is implying, -pwcorr- is more indulgent, which is not necessarily a feature. The output for -correlate- made it clear that different numbers of observations were being used.

At a guess, Julia's data are panel data, so every extra lag bites hard, meaning that for any increase in lag by 1, one more observation is necessarily lost at the end of each panel. So, the last observation in each panel cannot be used with lag one, the previous one with lag two, and so forth.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

Try -pwcorr- instead:

*************
clear*
set obs 100
gen y=1
replace y =.6*y[_n-1]+rnormal() in 2/l
gen byte time=_n
tsset time
corr y L.y L2.y
pwcorr y L.y
pwcorr y L.y L2.y
*************

Julia

I would like to calculate the correlation between a variable and its past values. Thus, I use the following command:

. correlate BI L1.BI L2.BI
(obs=225)

             |                 L.      L2.
             |       BI       BI       BI
-------------+---------------------------
          BI |
         --. |   1.0000
         L1. |   0.0111   1.0000
         L2. |   0.0647   0.0161   1.0000

However, if I only ask the correlation for the first lag, my result differs....

. correlate BI L1.BI
(obs=265)

             |                 L.
             |       BI       BI
-------------+------------------
          BI |
         --. |   1.0000
         L1. |   0.0174   1.0000

Why does excluding the second lag affect the correlation between the variable and its first lag?
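
A minimal sketch of the point about observations lost per lag, using simulated data rather than Julia's. The names id, time and y, the seed, and the layout of 40 panels with 10 periods each and no gaps are all assumptions made for illustration. The thread does not say how Julia's data are structured.

```stata
* Sketch with simulated data (hypothetical names id, time, y; not Julia's data)
clear
set seed 2010
set obs 400

* Assumed layout: 40 panels of 10 periods each, no gaps
gen id = ceil(_n/10)
bysort id: gen time = _n
xtset id time
gen y = rnormal()

* Each extra lag costs one observation per panel
count if !missing(y, L.y)
count if !missing(y, L.y, L2.y)

* -correlate- uses only observations complete on all variables listed,
* so the reported (obs=) changes when L2.y is added
correlate y L.y
correlate y L.y L2.y

* -pwcorr- uses whatever is available for each pair;
* the obs option shows the count behind each correlation
pwcorr y L.y L2.y, obs

* Restricting the two-variable call to the three-variable sample
* should reproduce the y, L.y correlation from the three-variable call
correlate y L.y if !missing(L2.y)
```

With this assumed layout the two counts are 360 and 320, a difference of 40, one observation per panel. The drop from 265 to 225 in Julia's output would be consistent with 40 panels without gaps, but that is only a guess from the two numbers shown.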