
Re: st: RE: -scatter and alpha-blending

From   Nick Cox <>
Subject   Re: st: RE: -scatter and alpha-blending
Date   Fri, 28 Sep 2012 11:47:54 +0100

Transparency would be great, but usually can only ease a problem, not
solve it. Once you have thousands of data points, there will be much
overlap or overplotting no matter what you do. Often it's best to have
the data display as a backdrop -- light grey|gray colours work well
here -- and concentrate on superimposing some smooth(s) as a way of
seeing structure.
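Here is a minimal sketch of the grey-backdrop-plus-smooth idea. It assumes simulated data (a sine curve plus noise, with an arbitrary seed) and uses -twoway lpoly- as the smooth. The post does not name a particular smoother, so -lpoly- is only one choice among several. Run it as a do-file, because the /// continuation lines work there but not interactively.

```stata
* Hypothetical data: a noisy curved relationship with many points.
clear
set seed 2012
set obs 20000
gen x = runiform()
gen y = sin(2 * _pi * x) + rnormal(0, 0.5)

* Light grey points (gs12) as a backdrop, with a local polynomial smooth on top.
twoway (scatter y x, msymbol(p) mcolor(gs12)) ///
       (lpoly y x, lcolor(black) lwidth(medthick)), ///
       legend(off)
```

Higher gs numbers give lighter greys (gs0 is black, gs16 is white), so the backdrop can be made fainter still if the smooth needs more emphasis.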

In addition to David's suggestions, a small and capricious set of personal
recommendations would include:

1. -twoway fpfit-. Judging by mentions on this list, this is a rarely
used command, but it's very flexible.

2. -rcspline- (SSC). This is a convenience command built on top of the
excellent -mkspline, cubic-.

3. The stuff discussed in this article (but, if interested, get the code
from gr0021_1 in SJ 10(1)):

SJ-5-4  gr0021. Speaking Stata: Smoothing in various directions. N. J. Cox.
        Stata Journal 5(4): 574-593 (Q4/05).
        (help doublesm, diagsm, polarsm if installed)
        Discusses exploratory tools for determining the structure
        of bivariate data.
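The first two suggestions can be sketched with official commands alone. This sketch assumes simulated sine-plus-noise data with an arbitrary seed. It calls -mkspline, cubic- directly rather than -rcspline-, whose SSC syntax is not shown here. Five knots is an arbitrary choice.

```stata
* Hypothetical data: a noisy curved relationship.
clear
set seed 2012
set obs 5000
gen x = runiform()
gen y = sin(2 * _pi * x) + rnormal(0, 0.5)

* (1) Fractional polynomial fit over a grey backdrop.
twoway (scatter y x, msymbol(p) mcolor(gs12)) (fpfit y x), legend(off)

* (2) Restricted cubic spline: build the basis (xs1-xs4 for 5 knots),
*     regress on it, then plot the fitted values over the backdrop.
mkspline xs = x, cubic nknots(5)
regress y xs*
predict yhat, xb
twoway (scatter y x, msymbol(p) mcolor(gs12)) ///
       (line yhat x, sort), legend(off)
```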

Stata also has -sunflower-. I am one of many people who have played
with this idea, but somehow I never want to show the results to
anybody else.

More generally, everyone would be happier with 100 data points rather
than 10, and so forth, but larger is not necessarily easier to deal
with. The following kind of "needle in a haystack" demonstration can
be of use in teaching. It's not at all original.

The data are just noise, except that 1% of them follow the y = x
diagonal exactly.

clear
set seed 2012
set obs 100000
gen x = runiform()
* about 1% of y values are set equal to x; the rest are independent noise
gen y = cond(runiform() < 0.01, x, runiform())

The first and very easy lesson is that the default marker symbol is useless with this many points.

scatter y x

So an easy thing is to change the symbol to a point:

scatter y x, ms(p)

Sometimes the structure is easier to see with _fewer_ observations, so
you can try things like plotting a random subsample of about 20%:

scatter y x if runiform() < 0.2, ms(p)

Naturally, once you know what you are looking for, it is easier to find it.

Depending on the audience and your inclinations you can raise the tone
by making the needle into a smiley face, an encouraging message or a
gratuitous insult aimed at a visiting sports team; or you can lower it
by mixing scientifically interesting patterns and noise and
encouraging discussion about how we find structure generally.
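One way to vary the demonstration is to put the needle on a circle rather than the diagonal, a first step toward the smiley face. This sketch follows the same recipe (1% signal, the rest uniform noise). The seed, the centre (0.5, 0.5) and the radius 0.3 are arbitrary choices.

```stata
* 1% of points lie exactly on a circle; the rest are uniform noise.
clear
set seed 2012
set obs 100000
gen byte signal = runiform() < 0.01
gen t = 2 * _pi * runiform()
gen x = cond(signal, 0.5 + 0.3 * cos(t), runiform())
gen y = cond(signal, 0.5 + 0.3 * sin(t), runiform())
scatter y x, msymbol(p)
```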


On Thu, Sep 27, 2012 at 11:41 PM, Francesco <> wrote:
> thank you very much!
> On 28 September 2012 00:35, David Radwin <> wrote:
>> Unfortunately, Stata doesn't have transparent fills for scatterplot
>> markers, but you might be able to fashion a workaround similar to p. 605
>> of:
>> Stata tip 27: Classifying data points on scatter plots
>>     N. J. Cox. 2005.
>>     Stata Journal Volume 5 Number 4.
>> The key would be to identify the observations that are completely
>> overlapping and make the markers darker in proportion to their relative
>> frequency.
>> If there are no or few completely overlapping points, an approach that
>> might work is using hollow circles as markers (msymbol(Oh) or
>> msymbol(oh)), because circles minimize overplotting.
>> David
>> --
>> David Radwin
>> Senior Research Associate
>> MPR Associates, Inc.
>> 2150 Shattuck Ave., Suite 800
>> Berkeley, CA 94704
>> Phone: 510-849-4942
>> Fax: 510-849-0794
>>> -----Original Message-----
>>> From: [mailto:owner-
>>>] On Behalf Of Francesco
>>> Sent: Thursday, September 27, 2012 3:13 PM
>>> To:
>>> Subject: st: -scatter and alpha-blending
>>> Dear Statalist,
>>> I would like to know whether it is possible to obtain in Stata a
>>> scatterplot with so-called "alpha blending": the markers are slightly
>>> transparent, so that darker regions in the graph have a higher point
>>> density...
>>> Very much like this (in R):
>>> points
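David's idea of making markers darker in proportion to how many observations overlap can be approximated with a few greyscale classes, standing in for true transparency. This is a minimal sketch with two assumptions. The data are hypothetical and rounded to a 0.1 grid, because exact overlaps hardly ever occur with continuous data. The frequency cut-points are arbitrary.

```stata
* Hypothetical discrete data: values rounded to 0.1 so that exact overlaps occur.
clear
set seed 2012
set obs 10000
gen x = round(rnormal(), 0.1)
gen y = round(x + rnormal(), 0.1)

* One row per distinct (x, y) point; n = number of observations at that point.
contract x y, freq(n)

* Darker grey for more heavily overplotted points.
* The cut-points (1, 2 to 5, 6 or more) are arbitrary, for illustration only.
twoway (scatter y x if n == 1, msymbol(O) mcolor(gs13)) ///
       (scatter y x if inrange(n, 2, 5), msymbol(O) mcolor(gs8)) ///
       (scatter y x if n > 5, msymbol(O) mcolor(gs2)), ///
       legend(off)
```

With only three classes this is cruder than alpha blending, but the number of classes and their shades are easy to change.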