Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: visualization?

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: visualization?
Date	Wed, 5 Oct 2011 20:15:09 +0100

Thanks for this. The most I can report on transparency (from what I
have heard) is that some users, just like you, want to see it in
Stata.

Note that 1 to 6 are from the posting Stas mentioned.

I would add

9. Where relevant, add a linear reference pattern. Example: add a line
of equality y = x when that is natural.

10. Even better, when relevant, change axes so that there is a
horizontal linear reference pattern. Examples: residual vs fitted,
difference vs mean or sum.
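
A minimal sketch of 9 and 10, assuming the auto dataset that ships with
Stata and a simple regression of mpg on weight. Both are chosen here only
for illustration, and the variable names fitted and resid are made up.

```stata
* Illustration only: auto data and this model are assumptions
sysuse auto, clear
regress mpg weight
predict fitted, xb
predict resid, residuals

* 9. observed vs fitted, with a line of equality y = x as reference
twoway (scatter mpg fitted) (function y = x, range(fitted)), ///
    ytitle("Observed mpg") xtitle("Fitted mpg") legend(off)

* 10. residual vs fitted: the same reference is now horizontal at zero
scatter resid fitted, yline(0) ytitle("Residual") xtitle("Fitted mpg")
```

In the second graph systematic departures show up as structure around a
flat line, which is usually easier to judge than departures from a
sloping one.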

On Wed, Oct 5, 2011 at 7:58 PM, Gabi Huiber <[email protected]> wrote:
> Nick, thank you for this list. It's a useful refresher.
>
> Regarding 2 and 6: I didn't know that transparency was on a wish list,
> but I'm glad to hear it is. I once saw a nice demonstration of ggplot2
> on r-bloggers.com: markers of slightly less than 100% transparency
> act like disks of glass. One of them alone is barely visible; the more
> of them you stack, the darker the pile. This gives a very nice
> gradient over a scatterplot. It's prettier than the currently
> recommended workaround of using hollow circles.
>
> Gabi
>
> On Fri, Sep 30, 2011 at 10:23 AM, Nick Cox <[email protected]> wrote:
>> Do you mean Vince Viggins? Sounds like a Dickens character. We saw
>> Vince Wiggins at the London meeting.
>>
>> We are starting with these suggestions. I'll add numbers for convenience.
>>
>> 1. If one of the variables is positively skewed, consider plotting
>> that axis on a log scale.
>>
>> 2. If there are a lot of data points (e.g., n > 1000), adopt a
>> different strategy such as using some form of partial transparency, or
>> sampling the data;
>>
>> 3. If one of the variables takes on a limited number of discrete
>> categories, consider using a jitter or a sunflower plot;
>>
>> 4. If there are three or more variables, consider using a scatterplot matrix;
>>
>> 5. Fitting some form of trend line is often useful;
>>
>> 6. Adjust the size of the plotting character to the sample size (for
>> bigger n, use a smaller plotting character);
>>
>> Random comments
>>
>> 1. I take this as standard. I'll add a plea for consideration of any
>> reasonable non-linear scale, labelled in the original units!
>>
>> 2 and 6. Transparency is on some wishlists for Stata. With lots of
>> data, you go not only for smaller symbols but also for more open ones,
>> and you use lighter colors.
>>
>> 3. I've played with sunflower plots and gone off them. But if you want
>> to try them, note that they are undocumented [sic] at -help twoway
>> sunflower-. For highly discrete or even categorical variables, I like
>> my -tabplot- (SSC).
>>
>> 4. Agree, although that does not rule out some projection from a
>> multivariate analysis being helpful too.
>>
>> 5. Yes, if "trend" means "smooth". Some special smooths were published in
>>
>> SJ-10-1 gr0021_1  . . . . . . . . . .  Software update for doublesm and diagsm
>>        (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
>>        Q1/10   SJ 10(1):164
>>        option to carry out smoothing using restricted cubic splines
>>        added to doublesm and diagsm
>>
>> SJ-5-4  gr0021  . . . . . . .  Speaking Stata: Smoothing in various directions
>>        (help doublesm, diagsm, polarsm if installed) . . . . . . .  N. J. Cox
>>        Q4/05   SJ 5(4):574--593
>>        discusses exploratory tools for determining the structure
>>        of bivariate data
>>
>> Some possible additions:
>>
>> 7. About 1980, there was a sudden fashion for adding convex hulls,
>> which faded away quickly. I remember often doing it with a pencil on
>> lineprinter output. But Allan Reese has a nice implementation on SSC
>> as -cvxhull-. On occasion that helps a lot.
>>
>> 8. When you have a categorical subdivision, try out both several
>> categories superimposed and a -by()- option to give separate plots. A
>> third strategy is given in
>>
>> SJ-10-4 gr0046  . . . . . . . . . . . . . . . Speaking Stata: Graphing subsets
>>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>        Q4/10   SJ 10(4):670--681                                (no commands)
>>        explores graphical comparison of results for two or more
>>        subsets where each subset is plotted in a separate panel,
>>        with the rest of the data as a backdrop
>>
>> Nick
>>
>> On Fri, Sep 30, 2011 at 2:56 PM, Stas Kolenikov <[email protected]> wrote:
>>
>>> There was an interesting question on data visualization on
>>> Stats.StackExchange (http://stats.stackexchange.com/q/13148/5739):
>>> what are the efficient strategies for tweaking scatterplots depending
>>> on the data needs? Too much data makes it clogged, too little data such
>>> as ordinal makes it too chunky, too skewed data makes it sit in one
>>> corner, and there are a multitude of other things that need to be
>>> adjusted to make the display really informative.
>>>
>>> I would be especially curious to hear from Nick Cox and Michael
>>> Mitchell, I guess, as the greatest contributors to Stata graphics (and
>>> of course Vince V, but I don't think I've seen him on the list for a
>>> while).
