From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: -scatter and alpha-blending
Date: Fri, 28 Sep 2012 12:02:47 +0100

The demonstration works better if

gen y = cond(runiform() < 0.01, x, runiform())

is changed to something more like

gen y = cond(runiform() < 0.05, x, runiform())

On Fri, Sep 28, 2012 at 11:47 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> Transparency would be great, but usually can only ease a problem, not
> solve it. Once you have thousands of data points, there will be much
> overlap or overplotting no matter what you do. Often it's best to have
> the data display as a backdrop -- light grey|gray colours work well
> here -- and concentrate on superimposing some smooth(s) as a way of
> seeing structure.
>
> In addition to David's suggestions, a capricious small set of personal
> recommendations would include
>
> 1. -twoway fpfit-. If one were to judge by mentions on this list,
> this is a rarely used command, but it's very flexible.
>
> 2. -rcspline- (SSC). This is a convenience command built on top of the
> excellent -mkspline, cubic-.
>
> 3. The stuff discussed in this article (but if interested get the code
> from gr0021_1 in SJ 10(1))
>
> SJ-5-4  gr0021  . . . . . . .  Speaking Stata: Smoothing in various directions
>         (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
>         Q4/05   SJ 5(4):574--593
>         discusses exploratory tools for determining the structure
>         of bivariate data
>
> Stata also has -sunflower-. I am one of many people who have played
> with this idea, but somehow I never want to show the results to
> anybody else.
>
> More generally, everyone would be happier with 100 data points rather
> than 10, and so forth, but larger is not necessarily easier to deal
> with. The following kind of "needle in a haystack" demonstration can
> be of use in teaching. It's not at all original.
>
> The data are just noise, except that 1% of them follow the y = x
> diagonal exactly.
>
> set obs 100000
> gen x = runiform()
> gen y = cond(runiform() < 0.01, x, runiform())
>
> The first and very easy lesson is that the default marker symbol is useless.
>
> scatter y x
> more
>
> So an easy thing is to change the symbol.
>
> scatter y x, ms(p)
> more
>
> Sometimes the structure is easier to see with _fewer_ observations, so
> you can try things like
>
> scatter y x if runiform() < 0.2, ms(p)
>
> Naturally, once you know what you are looking for, it is easier to find it.
>
> Depending on the audience and your inclinations you can raise the tone
> by making the needle into a smiley face, an encouraging message or a
> gratuitous insult aimed at a visiting sports team; or you can lower it
> by mixing scientifically interesting patterns and noise and
> encouraging discussion about how we find structure generally.
>
> Nick
>
> On Thu, Sep 27, 2012 at 11:41 PM, Francesco <cariboupad@gmx.fr> wrote:
>> thank you very much!
>>
>> On 28 September 2012 00:35, David Radwin <dradwin@mprinc.com> wrote:
>>> Unfortunately, Stata doesn't have transparent fills for scatterplot
>>> markers, but you might be able to fashion a workaround similar to p. 605
>>> of:
>>>
>>> Stata tip 27: Classifying data points on scatter plots
>>> N. J. Cox. 2005.
>>> Stata Journal Volume 5 Number 4.
>>> http://www.stata-journal.com/article.html?article=gr0023
>>>
>>> The key would be to identify the observations that are completely
>>> overlapping and make the markers darker in proportion to their relative
>>> frequency.
>>>
>>> If there are no or few completely overlapping points, an approach that
>>> might work is using hollow circles as markers (msymbol(Oh) or
>>> msymbol(oh)), because circles minimize overplotting.
>>>
>>> David
>>> --
>>> David Radwin
>>> Senior Research Associate
>>> MPR Associates, Inc.
>>> 2150 Shattuck Ave., Suite 800
>>> Berkeley, CA 94704
>>> Phone: 510-849-4942
>>> Fax: 510-849-0794
>>>
>>> www.mprinc.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Francesco
>>>> Sent: Thursday, September 27, 2012 3:13 PM
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: st: -scatter and alpha-blending
>>>>
>>>> Dear Statalist,
>>>>
>>>> I would like to know whether it is possible to obtain in Stata a
>>>> scatterplot with so-called "alpha blending": the markers are slightly
>>>> transparent, so that darker regions in the graph have a higher point
>>>> density. Very much like this (in R):
>>>> http://stackoverflow.com/questions/7714677/r-scatterplot-with-too-many-points

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
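A minimal sketch of the "grey backdrop plus superimposed smooth" advice in Nick's reply, using only built-in commands: -mkspline, cubic- (the machinery that -rcspline- from SSC builds on), -regress-, -predict- and -twoway- with -fpfit-. It is not code from the thread. The data-generating line (a sine curve in heavy normal noise), the seed, the five knots and the grey shade gs13 are arbitrary assumptions for illustration. The `///` continuations mean it should be run from a do-file. Note that a smooth shows structure in the mean of y; it would not reveal the sparse y = x "needle" in the demonstration above.

```stata
* Hypothetical data: a smooth signal buried in heavy noise (illustration only)
clear
set seed 2012
set obs 100000
generate x = runiform()
generate y = sin(2 * _pi * x) + rnormal(0, 1)

* Restricted cubic spline basis for x with 5 knots (creates xs1-xs4)
mkspline xs = x, cubic nknots(5)
regress y xs*
predict yhat, xb

* Light grey points as a backdrop, with two smooths superimposed;
* sort first so that -line- joins fitted values in order of x
sort x
twoway (scatter y x, msymbol(p) mcolor(gs13))        ///
       (line yhat x, lcolor(black) lwidth(thick))    ///
       (fpfit y x, lcolor(gs6) lpattern(dash)),      ///
       legend(order(2 "cubic spline" 3 "fractional polynomial"))
```

The backdrop is deliberately faint and drawn first, so the smooths sit on top and carry the message.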

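David's suggestion (identify completely overlapping observations and make their markers darker in proportion to frequency) could be sketched in Stata as follows. The data are hypothetical and rounded to a 0.1 grid so that exact ties occur. With continuous data such as the runiform() example, nearly every position is unique and this approach gains little. Because, as David notes, Stata has no transparent marker fills, darkness is emulated by assigning darker greys to more crowded positions. The four frequency classes and their grey shades are arbitrary choices, not taken from the Stata tip he cites.

```stata
* Hypothetical data, rounded to a 0.1 grid so that exact overlaps occur
clear
set seed 2012
set obs 100000
generate x = round(rnormal(), 0.1)
generate y = round(x + rnormal(), 0.1)

* Number of observations at each distinct (x, y) position
bysort x y: generate freq = _N
* Flag one observation per position so each marker is drawn only once
by x y: generate byte first = _n == 1

* Darker grey for more crowded positions (gs14 is light, gs0 is black);
* the darkest class is drawn last so it sits on top
twoway (scatter y x if first & freq == 1, msymbol(o) mcolor(gs14))             ///
       (scatter y x if first & inrange(freq, 2, 9), msymbol(o) mcolor(gs10))   ///
       (scatter y x if first & inrange(freq, 10, 99), msymbol(o) mcolor(gs5))  ///
       (scatter y x if first & freq >= 100, msymbol(o) mcolor(gs0)),           ///
       legend(order(1 "1" 2 "2-9" 3 "10-99" 4 "100+") rows(1)                  ///
              title("Observations per position"))
```

Flagging rather than dropping duplicate positions keeps the full dataset available for later analysis. Run it from a do-file because of the `///` continuations.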
