Re: st: Table: Summary Statistics

 From "Austin Nichols" To statalist@hsphsun2.harvard.edu Subject Re: st: Table: Summary Statistics Date Wed, 23 May 2007 08:17:09 -0400

```Ana--
In the class of exploratory graphs Nick describes, one good option
compares the distributions of variables across two years or two
countries using density estimates constructed with e.g. -kdens- and,
where appropriate, a log scale, like so:

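* kdens is on SSC and depends on the moremata package
ssc inst moremata
ssc inst kdens
webuse nlswork
* log of tenure; zero flags nonpositive tenure, which has no log
g log=ln(tenure)
g zero=(tenure<=0) if tenure<.
* 50 evenly spaced evaluation points over the range of log tenure
su log
gen lx=(_n-1)/49*(r(max)-r(min))+r(min) in 1/50
gen x=exp(lx)
la var x "Tenure"
* placeholder weight; substitute your survey weight
g weight=1
* density for college grads in 1988, and the percent with tenure>0
kdens log if collg==1 & year==88 [pw=weight], at(lx) gen(c) nogr
su zero if collg==1 & year==88 [aw=weight], meanonly
local nz1: di %3.2f 100*(1-r(mean))
la var c "Coll grads, for the `nz1'% with nonzero tenure"
* same for those who are not college grads
kdens log if collg==0 & year==88 [pw=weight], at(lx) gen(nc) nogr
su zero if collg==0 & year==88 [aw=weight], meanonly
local nz0: di %3.2f 100*(1-r(mean))
la var nc "Not coll grads, for the `nz0'% with nonzero tenure"
* plot both densities against tenure on a log x-axis
line c nc x, xscal(log) leg(order(1 - 2))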

You could also plot the proportion at zero or below (which cannot
appear on a log scale) as a point mass on the y-axis rule of such a
graph, but that is not the correct location for the point mass, since
zero has no position on a log axis, and it is a bit strange to compare
densities and point masses simultaneously. I prefer just to put the
relevant proportion in text somewhere, as in the example.

On 5/22/07, n j cox <n.j.cox@durham.ac.uk> wrote:
```
```You describe your data clearly, but inevitably I
don't have a clear idea of your research questions
or your research style. I would do lots of graphs
and if I were doing any classical tests I would
want to cross-check with some non-parametric tests
and/or analyses using logarithmic scales for highly
skewed variables. On the latter, you would need
to watch out for zeros.

Ana R. Rios
Given your observation of highly skewed distributions,
I was wondering what kind of summary statistics and
transformed scales would be appropriate?

n j cox

> to your means, and it is evident that you
> have highly skewed distributions. Printing
> both to 5 or 6 significant figures is no
> doubt part of what is expected, but their
> utility is moot.
```
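
Nick's advice above (cross-check classical tests against non-parametric tests and log-scale analyses, and watch out for zeros) might be sketched roughly as follows. This is only an illustration: the nlswork data and the tenure/collgrad comparison are stand-ins for Ana's data, which are not available here, and the variable name ln_tenure is made up for the example.

```stata
* Illustrative only: nlswork stands in for the data discussed in the thread
webuse nlswork, clear
keep if year == 88

* Classical test of equal mean tenure, college grads vs. others
ttest tenure, by(collgrad)

* Rank-based cross-check that does not lean on the skewed means
ranksum tenure, by(collgrad)

* Log scale: ln() returns missing for zeros, so count them first
count if tenure == 0
gen ln_tenure = ln(tenure) if tenure > 0
ttest ln_tenure, by(collgrad)
```

If the classical and rank-based results disagree, or the count of zeros is large, that is a sign the means alone are not telling the whole story.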

```Ana R. Rios
> I am trying to build a table with summary statistics
> (mean and standard deviation) as follows:
>                      Tanzania 1992-93         Tanzania 1993-94
> Harvest                       1708.47                  1254.36
>                             (1454.64)                 (926.05)
>
```
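
A table of this shape (mean and standard deviation by survey year) could be produced along the following lines. This is a hedged sketch, not Ana's actual setup: nlswork and tenure are assumed stand-ins for her data and her harvest variable, and the median is added only because the thread points to skewed distributions.

```stata
* Illustrative only: nlswork and tenure stand in for the Tanzanian survey data
webuse nlswork, clear

* One row per year, with mean, standard deviation, and median as columns
tabstat tenure if inlist(year, 87, 88), by(year) ///
    statistics(mean sd median) columns(statistics) format(%9.2f) nototal
```

Comparing the mean and median columns gives a quick check on skewness before deciding how many digits of the mean are worth reporting.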