

From: László Sándor <sandorl@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: question: how to collapse data fast for simplified, binned scatter plots
Date: Mon, 26 Mar 2012 18:12:13 -0400

Hi all,

I have a relatively simple goal, but I am not sure which is the most efficient way to achieve it. Let me describe what I am after and how I currently do it under Stata 10.1 for Windows, and then please comment on whether it could be faster.

Basically, I want to make scatter plots clearer. In vast datasets it is more informative to plot means (or some quantiles) of y against "bins" of x, and it is useful to bin x by its quantiles (i.e. to have even frequencies in the bins instead of, say, even raw distances between the bins). The graphs could look like the second graph here: http://obs.rc.fas.harvard.edu/chetty/value_added.html

It would be great if I could also add a plot of a linear fit later on, or perhaps plot multiple y variables against the same x, or a single y broken down by a categorical z, or two different quantiles of the same y. Also, for some applications I would want to plot only a residual after some linear fit (including an -areg- absorbing averages within some categories).

I am not aware of anything built in for this. But once one has the bins of x, it is not that hard to collect the y against them. However, -collapse- is surprisingly slow in this regard (at least with millions or tens of millions of observations), and I had to use a workaround with -tabulate- and more. I am puzzled that this could be faster than -collapse-, but so it seems.

So: if -collapse- is not the fastest tool for this (even with the fast option), then what is? What does -twoway bar- use underneath, for example? What does -tabulate, summarize- use behind the scenes?

Would you suggest an alternative route? Something more efficient? Something built-in? Some polished user-written tool?

Thank you very much,

Laszlo
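
For concreteness, the computation described above (quantile bins of x, then bin means of y) might be sketched in Stata roughly as follows. This is an illustration only, not the workaround the post mentions, and it uses -collapse-, so it shows the baseline whose speed is in question rather than a faster alternative. The dataset (-sysuse auto-), the variables standing in for y and x (price and weight), and the choice of 10 bins are all assumptions made for the example.

```stata
* Illustrative sketch only: auto.dta, price (as y), weight (as x) and
* 10 bins are assumptions for the example, not taken from the post.
sysuse auto, clear

* Bin x by quantiles so that each bin holds roughly the same number of observations
xtile xbin = weight, nquantiles(10)

* Reduce the data to one row per bin: mean of y and mean of x within each bin
collapse (mean) price weight, by(xbin) fast

* Plot the bin means, with a linear fit through those means
twoway (scatter price weight) (lfit price weight), ///
    ytitle("Mean price") xtitle("Mean weight within bin") legend(off)
```

The `///` line continuation assumes the commands are run from a do-file. Note also that -lfit- here is fitted to the bin means, not to the underlying observations, so its slope need not match a regression on the raw data.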

**Follow-Ups**: **Re: st: question: how to collapse data fast for simplified, binned scatter plots** *From:* Nick Cox <njcoxstata@gmail.com>
