

**From:** Zurab Sajaia <zsajaia@hotmail.com>

**To:** statalist <statalist@hsphsun2.harvard.edu>

**Subject:** RE: st: Strange behaviour of -correlate- command

**Date:** Thu, 9 Dec 2010 19:50:53 -0500

You're absolutely right: my manual calculation used the mean of prod, i.e. it divided by n instead of (n-1). My bad, going home now :$.

Thanks a lot,
Zurab

----------------------------------------
> Subject: Re: st: Strange behaviour of -correlate- command
> From: sandersn@stanford.edu
> Date: Thu, 9 Dec 2010 16:30:54 -0800
> To: statalist@hsphsun2.harvard.edu
>
> If I recall correctly, Excel doesn't calculate COVAR quite right. For some reason, it uses 1/n rather than 1/(n-1). That likely explains your odd results.
>
> --
> Nicholas J. Sanders, Ph.D.
> Postdoctoral Fellow
> Stanford Institute for Economic Policy Research
> 366 Galvez St, Room 228
> Stanford, CA 94305
>
> On Dec 9, 2010, at 4:23 PM, Zurab Sajaia wrote:
>
> > Dear all,
> >
> > I've encountered a problem for which I can't find an explanation so far. It seems that I'm getting wrong estimates of covariance: the results differ depending on whether I use the -correlate- command or do the calculations manually (I tried exporting the data to Excel and used the COVAR() function there, and it seems that Excel is on my side).
> > So I was wondering whether something is indeed wrong in Stata, or whether I'm doing it incorrectly (perhaps it's time to stop working and go home?)...
> >
> > So here's the deal. I've uploaded an example dataset to the web (30 KB):
> >
> > ```
> > . use http://www.adeptanalytics.org/download/temp/corr_bug.dta, clear
> >
> > . corr y r, c
> > (obs=2419)
> >
> >              |        y        r
> > -------------+------------------
> >            y |  2.8e+07
> >            r |  1142.05  .083368
> > ```
> >
> > But if I do it manually:
> >
> > ```
> > . summarize y, meanonly
> > . generate double y1 = y - r(mean)
> >
> > . summarize r, meanonly
> > . generate double r1 = r - r(mean)
> >
> > . generate double prod = y1 * r1
> >
> > . summarize prod
> >
> >     Variable |       Obs        Mean    Std. Dev.       Min        Max
> > -------------+--------------------------------------------------------
> >         prod |      2419    1141.579    2152.761  -53.76514   47015.59
> > ```
> >
> > I get the same result (1141.579) using Excel's COVAR() function.
> > Do you have any ideas what could be happening here?
> >
> > Thanks,
> > Zurab
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/
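As a quick arithmetic check on the explanation in this thread, using only the figures printed in the session log: `summarize prod` reports the mean of the cross-products, which divides their sum by n = 2419, whereas `correlate y r, covariance` reports the sample covariance, which divides by n-1 = 2418. Rescaling the manual mean by n/(n-1) should therefore reproduce Stata's value, up to the rounding of the displayed numbers:

$$
\widehat{\operatorname{cov}}(y,r)
= \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})(r_i-\bar{r})
= \frac{n}{n-1}\,\overline{\mathit{prod}}
= \frac{2419}{2418}\times 1141.579
\approx 1142.05
$$

Conversely, 1142.05 × 2418/2419 ≈ 1141.58, the figure from the manual calculation and from Excel's COVAR(), which is consistent with both of those dividing by n.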

**References:**

- **st: Strange behaviour of -correlate- command**, *From:* Zurab Sajaia <zsajaia@hotmail.com>
- **Re: st: Strange behaviour of -correlate- command**, *From:* Nick Sanders <sandersn@stanford.edu>
