Re: st: RE: -scatter and alpha-blending

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: RE: -scatter and alpha-blending
Date	Fri, 28 Sep 2012 12:02:47 +0100

The demonstration works better if

 gen y = cond(runiform() < 0.01, x, runiform())

is changed to something more like

 gen y = cond(runiform() < 0.05, x, runiform())
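
Put together as a do-file, the revised demonstration might look like the sketch below. It uses only the commands quoted in this thread, with three additions that are assumptions for illustration: the -clear-, the seed value, and the final -scatter- call with a colour opacity suffix. That last line is understood to work only in later releases (Stata 15 and up), not in the Stata current when this thread was written.

```stata
* Needle-in-a-haystack demonstration with the 5% threshold
clear
* Assumed seed, included only so that the graphs are reproducible
set seed 12345
set obs 100000
gen x = runiform()
* About 5% of observations lie exactly on y = x; the rest are noise
gen y = cond(runiform() < 0.05, x, runiform())

* Default marker symbol: the diagonal is buried
scatter y x

* Point markers: the diagonal becomes visible
scatter y x, ms(p)

* A random subsample of about 20% can show the structure more clearly
scatter y x if runiform() < 0.2, ms(p)

* Assumption: Stata 15 or later, where a colour may carry an opacity suffix
scatter y x, ms(o) mcolor(navy%5)
```

Each -scatter- call replaces the previous graph, so run the lines one at a time or add a name() option to keep the graphs side by side.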

On Fri, Sep 28, 2012 at 11:47 AM, Nick Cox <[email protected]> wrote:
> Transparency would be great, but usually can only ease a problem, not
> solve it. Once you have thousands of data points, there will be much
> overlap or overplotting no matter what you do. Often it's best to have
> the data display as a backdrop -- light grey|gray colours work well
> here -- and concentrate on superimposing some smooth(s) as a way of
> seeing structure.
>
> In addition to David's suggestions, a capricious small set of personal
> recommendations would include
>
> 1. -twoway fpfit-. Judging by mentions on this list, this is a rarely
> used command, but it's very flexible.
>
> 2. -rcspline- (SSC). This is a convenience command built on top of the
> excellent -mkspline, cubic-.
>
> 3. The stuff discussed in this article (but if interested get the code
> from gr0021_1 in SJ 10(1))
>
> SJ-5-4  gr0021  . . . . . . .  Speaking Stata: Smoothing in various directions
>         (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
>         Q4/05   SJ 5(4):574--593
>         discusses exploratory tools for determining the structure
>         of bivariate data
>
> Stata also has -sunflower-. I am one of many people who have played
> with this idea, but somehow I never want to show the results to
> anybody else.
>
> More generally, everyone would be happier with 100 data points rather
> than 10, and so forth, but larger is not necessarily easier to deal
> with. The following kind of "needle in a haystack" demonstration can
> be of use in teaching. It's not at all original.
>
> The data are just noise, except that 1% of them follow the y = x
> diagonal exactly.
>
> set obs 100000
> gen x = runiform()
> gen y = cond(runiform() < 0.01, x, runiform())
>
> The first and very easy lesson is that the default marker symbol is useless.
>
> scatter y x
> more
>
> So an easy thing is to change the symbol.
>
> scatter y x, ms(p)
> more
>
> Sometimes the structure is easier to see with _fewer_ observations, so
> you can try things like
>
> scatter y x if runiform() < 0.2, ms(p)
>
> Naturally, once you know what you are looking for, it is easier to find it.
>
> Depending on the audience and your inclinations you can raise the tone
> by making the needle into a smiley face, an encouraging message or a
> gratuitous insult aimed at a visiting sports team; or you can lower it
> by mixing scientifically interesting patterns and noise and
> encouraging discussion about how we find structure generally.
>
> Nick
>
> On Thu, Sep 27, 2012 at 11:41 PM, Francesco <[email protected]> wrote:
>> thank you very much!
>>
>> On 28 September 2012 00:35, David Radwin <[email protected]> wrote:
>>> Unfortunately, Stata doesn't have transparent fills for scatterplot
>>> markers, but you might be able to fashion a workaround similar to p. 605
>>> of:
>>>
>>> Stata tip 27: Classifying data points on scatter plots
>>>     N. J. Cox. 2005.
>>>     Stata Journal Volume 5 Number 4.
>>> http://www.stata-journal.com/article.html?article=gr0023
>>>
>>> The key would be to identify the observations that are completely
>>> overlapping and make the markers darker in proportion to their relative
>>> frequency.
>>>
>>> If there are no or few completely overlapping points, an approach that
>>> might work is using hollow circles as markers (msymbol(Oh) or
>>> msymbol(oh)), because circles minimize overplotting.
>>>
>>> David
>>> --
>>> David Radwin
>>> Senior Research Associate
>>> MPR Associates, Inc.
>>> 2150 Shattuck Ave., Suite 800
>>> Berkeley, CA 94704
>>> Phone: 510-849-4942
>>> Fax: 510-849-0794
>>>
>>> www.mprinc.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:owner-
>>>> [email protected]] On Behalf Of Francesco
>>>> Sent: Thursday, September 27, 2012 3:13 PM
>>>> To: [email protected]
>>>> Subject: st: -scatter and alpha-blending
>>>>
>>>> Dear Statalist,
>>>>
>>>> I would like to know whether it is possible to obtain in Stata a
>>>> scatterplot with so-called "alpha blending": the markers are slightly
>>>> transparent, so that darker regions in the graph have a higher point
>>>> density...
>>>> Very much like this (in R):
>>>> http://stackoverflow.com/questions/7714677/r-scatterplot-with-too-many-points
>>>>