

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: visualization?
Date: Wed, 5 Oct 2011 20:15:09 +0100

Thanks for this. I think the best report on (what I have heard on)
transparency is that some users, just like you, want to see it in
Stata.

Note that 1 to 6 are from the posting Stas mentioned.

I would add

9. Where relevant, add a linear reference pattern. Example: add a line
of equality y = x when that is natural.

10. Even better, when relevant, change axes so that there is a
horizontal linear reference pattern. Examples: residual vs fitted,
difference vs mean or sum.

On Wed, Oct 5, 2011 at 7:58 PM, Gabi Huiber <ghuiber@gmail.com> wrote:
> Nick, thank you for this list. It's a useful refresher.
>
> Regarding 2 and 6: I didn't know that transparency was on a wish list,
> but I'm glad to hear it is. I once saw a nice demonstration of ggplot2
> on r-blogger.com: markers of slightly less than 100% transparency
> acted like disks of glass. One of them looks barely visible; the more
> of them you stack, the darker the pile. This gives a very nice
> gradient over a scattershot. It's prettier than the current
> recommended workaround that we use hollow circles.
>
> Gabi
>
> On Fri, Sep 30, 2011 at 10:23 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Do you mean Vince Viggins? Sounds like a Dickens character. We saw
>> Vince Wiggins at the London meeting.
>>
>> We are starting with these suggestions. I'll add numbers for convenience.
>>
>> 1. If one of the variables is positively skewed, consider plotting
>> that axis on a log scale.
>>
>> 2. If there are a lot of data points (e.g., n > 1000), adopt a
>> different strategy such as using some form of partial transparency, or
>> sampling the data;
>>
>> 3. If one of the variables takes on a limited number of discrete
>> categories, consider using a jitter or a sunflower plot;
>>
>> 4. If there are three or more variables, consider using a scatterplot matrix;
>>
>> 5. Fitting some form of trend line is often useful;
>>
>> 6. Adjust the size of the plotting character to the sample size (for
>> bigger n, use a smaller plotting character);
>>
>> Random comments
>>
>> 1. I take this as standard. I'll add a plea for consideration of any
>> reasonable non-linear scale, labelled in the original units!
>>
>> 2 and 6. Transparency is on some wishlists for Stata. With lots of
>> data, you go not only for smaller symbols but more open ones and use
>> lighter colors.
>>
>> 3. I've played with sunflower plots and gone off them. But if you want
>> to try them, note that they are undocumented [sic] at -help twoway
>> sunflower-. For highly discrete or even categorical variables, I like
>> my -tabplot- (SSC).
>>
>> 4. Agree, although that does not rule some projection from a
>> multivariate analysis being helpful too.
>>
>> 5. Yes, if "trend" means "smooth". Some special smooths were published in
>>
>> SJ-10-1 gr0021_1 . . . . . . . . . . Software update for doublesm and diagsm
>>         (help doublesm, diagsm, polarsm if installed) . . . . . . . N. J. Cox
>>         Q1/10 SJ 10(1):164
>>         option to carry out smoothing using restricted cubic splines
>>         added to doublesm and diagsm
>>
>> SJ-5-4 gr0021 . . . . . . . Speaking Stata: Smoothing in various directions
>>         (help doublesm, diagsm, polarsm if installed) . . . . . . . N. J. Cox
>>         Q4/05 SJ 5(4):574--593
>>         discusses exploratory tools for determining the structure
>>         of bivariate data
>>
>> Some possible additions:
>>
>> 7. About 1980, there was a sudden fashion for adding convex hulls,
>> which faded away quickly. I remember often doing it with a pencil on
>> lineprinter output. But Allan Reese has a nice implementation on SSC
>> as -cvxhull-. On occasion that helps a lot.
>>
>> 8. When you have a categorical subdivision, try out both several
>> categories superimposed and a -by()- option to give separate plots. A
>> third strategy is given in
>>
>> SJ-10-4 gr0046 . . . . . . . . . . . . . . . Speaking Stata: Graphing subsets
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>         Q4/10 SJ 10(4):670--681 (no commands)
>>         explores graphical comparison of results for two or more
>>         subsets where each subset is plotted in a separate panel,
>>         with the rest of the data as a backdrop
>>
>> Nick
>>
>> On Fri, Sep 30, 2011 at 2:56 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>>
>>> There was an interesting question on data visualization on
>>> Stats.StackExchange (http://stats.stackexchange.com/q/13148/5739):
>>> what are the efficient strategies for tweaking scatterplots depending
>>> on the data needs? Too much data make it clogged, too little data such
>>> as ordinal make it too chunky, too skewed data makes it sit in one
>>> corner, and there are a multitude of other things that needs to be
>>> adjusted to make the display really informative.
>>>
>>> I would be especially curious to hear from Nick Cox and Michael
>>> Mitchell, I guess, as the greatest contributors to Stata graphics (and
>>> of course Vince V, but I don't think I've seen him on the list for a
>>> while).

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
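
A rough illustration of suggestions 9 and 10 at the top of this message: when the natural reference is the line of equality y = x, plotting the difference against the mean turns that reference into the horizontal line at zero. The sketch below is plain Python, not Stata, and only does the arithmetic. The function name `difference_vs_mean` and the sample values are made up for illustration and do not come from the thread.

```python
from statistics import mean


def difference_vs_mean(x, y):
    """Return (mean, difference) pairs for paired values x and y.

    A point on the line of equality y = x maps to difference 0,
    so the reference pattern becomes horizontal.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [((a + b) / 2, b - a) for a, b in zip(x, y)]


# Hypothetical paired measurements (illustrative values only)
x = [10.0, 12.0, 15.0, 20.0]
y = [11.0, 12.0, 14.0, 23.0]

pairs = difference_vs_mean(x, y)

# Average difference: how far, on average, the points sit from the
# horizontal reference line at zero
bias = mean([d for _, d in pairs])
```

Plotting the second element of each pair against the first, with a horizontal line at zero, gives the difference vs mean display; departures from zero are usually easier to judge than departures from a diagonal.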
