Statalist The Stata Listserver


Re: st: Table: Summary Statistics

From   "Austin Nichols" <>
Subject   Re: st: Table: Summary Statistics
Date   Wed, 23 May 2007 08:17:09 -0400

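In the class of exploratory graphs Nick describes, one good option
compares the distribution of a variable across two groups, such as two
years or two countries, using density estimates constructed with, for
example, -kdens- and, where appropriate, a log scale. The example below
uses the nlswork data to compare job tenure in 1988 for college
graduates and others: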

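* -kdens- (from SSC) requires the -moremata- package
ssc inst moremata
ssc inst kdens
webuse nlswork, clear
* log of tenure; missing where tenure is zero
g log=ln(tenure)
g zero=(tenure<=0) if tenure<.
* evaluation grid: 50 equally spaced points on the log scale
su log
gen lx=(_n-1)/49*(r(max)-r(min))+r(min) in 1/50
gen x=exp(lx)
la var x "Tenure"
* placeholder unit weights; substitute survey weights if you have them
g weight=1
* density of log tenure for college graduates in 1988
kdens log if collg==1 & year==88 [pw=weight], at(lx) gen(c) nogr
su zero if collg==1 & year==88 [aw=weight], meanonly
local nz1: di %3.2f 100*(1-r(mean))
la var c "Coll grads, for the `nz1'% with positive tenure"
* same for those without a college degree
kdens log if collg==0 & year==88 [pw=weight], at(lx) gen(nc) nogr
su zero if collg==0 & year==88 [aw=weight], meanonly
local nz0: di %3.2f 100*(1-r(mean))
la var nc "Not coll grads, for the `nz0'% with positive tenure"
* plot the densities against tenure itself, on a log axis
line c nc x, xscal(log) leg(order(1 - 2))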

Values at zero or below cannot appear on a log scale. You could show
the proportion of such values as a point mass drawn on the y-axis rule,
but that is the wrong location for the point mass (zero lies infinitely
far to the left on a log scale), and it is a bit strange to compare
densities and point masses in one graph. So I prefer just to put the
relevant proportion in text somewhere, as in the variable labels in the
example.
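A minimal sketch of that text-only approach, assuming the nlswork data, the 1988 wave, and -collgrad- as the grouping variable. It substitutes the official -twoway kdensity- for -kdens- so nothing needs installing, which means the x axis here is ln(tenure) rather than tenure on a log scale. The legend labels and note wording are illustrative. Run it as a do-file, since it uses /// continuation lines.

```stata
* Hypothetical sketch: nlswork, 1988 wave, groups defined by collgrad.
* Compute the percentage with zero tenure in each group and report it in
* a note, rather than drawing a point mass on the density graph.
webuse nlswork, clear
generate lnten = ln(tenure)                  // missing where tenure is 0
generate byte zero = (tenure <= 0) if !missing(tenure)
foreach g in 0 1 {
    quietly summarize zero if collgrad == `g' & year == 88, meanonly
    local pz`g' : display %3.1f 100*r(mean)
}
twoway (kdensity lnten if collgrad == 1 & year == 88)         ///
       (kdensity lnten if collgrad == 0 & year == 88),        ///
       legend(order(1 "College graduates" 2 "Others"))        ///
       xtitle("ln(tenure)")                                    ///
       note("Tenure of zero: graduates `pz1'%, others `pz0'%")
```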

On 5/22/07, n j cox <> wrote:
You describe your data clearly, but inevitably I
don't have a clear idea of your research questions
or your research style. I would do lots of graphs
and if I were doing any classical tests I would
want to cross-check with some non-parametric tests
and/or analyses using logarithmic scales for highly
skewed variables. On the latter, you would need
to watch out for zeros.
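One way to run the cross-checks described above, sketched on the nlswork data as a stand-in for the harvest variable (the variable names and the 1988 subset are assumptions): a classical t test, the rank-based -ranksum- as a non-parametric check, and a t test on the log scale that sets zeros aside after counting them.

```stata
* Hypothetical sketch on the nlswork data (1988 wave).
webuse nlswork, clear
keep if year == 88
* classical two-sample t test on the raw, right-skewed variable
ttest tenure, by(collgrad) unequal
* Wilcoxon-Mann-Whitney rank-sum test as a non-parametric cross-check
ranksum tenure, by(collgrad)
* zeros have no logarithm: count them before moving to the log scale
count if tenure == 0
local nzero = r(N)
generate lnten = ln(tenure) if tenure > 0
* t test on the log scale (a comparison of geometric means),
* using only the positive values
ttest lnten, by(collgrad) unequal
display "Observations with zero tenure set aside: `nzero'"
```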

Ana R. Rios
Given your observation of highly skewed distributions,
I was wondering what kind of summary statistics and
transformed scales would be appropriate?
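A possible starting point, assuming nlswork tenure stands in for harvest and -collgrad- for the grouping: -tabstat- with medians and quartiles alongside the mean and SD, plus a geometric mean computed from the logs of the positive values only.

```stata
* Hypothetical sketch on the nlswork data (1988 wave).
webuse nlswork, clear
keep if year == 88
* median and quartiles are less sensitive to the long right tail
tabstat tenure, by(collgrad) statistics(n median p25 p75 mean sd) format(%6.1f)
* geometric mean = exp(mean of logs), over positive values only
generate lnten = ln(tenure)      // missing where tenure is 0
quietly summarize lnten
local gmean = exp(r(mean))
display "Geometric mean of positive tenure: " %6.2f `gmean'
```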

n j cox

 > Not your question, but your sds are close
 > to your means, and it is evident that you
 > have highly skewed distributions. Printing
 > both to 5 or 6 significant figures is no
 > doubt part of what is expected, but their
 > utility is moot.
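That observation can be checked directly: for a non-negative variable, an SD close to the mean means a coefficient of variation near 1, which usually goes with a long right tail. A sketch on the nlswork data (an assumption, not the original harvest data) that reports the CV and skewness at modest precision:

```stata
* Hypothetical sketch on the nlswork data (1988 wave).
webuse nlswork, clear
quietly summarize tenure if year == 88, detail
local cv   = r(sd) / r(mean)
local skew = r(skewness)
* report a few significant figures rather than five or six
display "mean " %5.1f r(mean) "   sd " %5.1f r(sd) ///
    "   CV " %4.2f `cv' "   skewness " %4.2f `skew'
```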

Ana R. Rios
 > I am trying to build a table with summary statistics
 > (mean and standard deviation) as follows:
 >
 >               Tanzania 1992-93    Tanzania 1993-94
 > Harvest                1708.47             1254.36
 >                      (1454.64)            (926.05)
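A sketch of producing that layout directly with -summarize- and -display-, assuming nlswork tenure in 1987 and 1988 as a stand-in for harvest in the two Tanzania rounds. The column positions are arbitrary and would need widening for larger numbers.

```stata
* Hypothetical sketch: means with SDs in parentheses beneath,
* one column per survey round.
webuse nlswork, clear
foreach y in 87 88 {
    quietly summarize tenure if year == `y'
    local m`y' = r(mean)
    local s`y' = r(sd)
}
display _col(15) "1987" _col(30) "1988"
display "Tenure" _col(10) %9.2f `m87' _col(25) %9.2f `m88'
display _col(10) "(" %7.2f `s87' ")" _col(25) "(" %7.2f `s88' ")"
```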
