Ana--
In the class of exploratory graphs Nick describes, one good option
compares the distributions of variables across two years or two
countries using density estimates constructed with e.g. -kdens- and,
where appropriate, a log scale, like so:
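* kdens is user-written (SSC) and depends on moremata (ssc inst moremata)
ssc inst kdens
webuse nlswork
* log of tenure; zeros become missing and are tracked separately in zero
g log=ln(tenure)
g zero=(tenure<=0) if tenure<.
* 50 evenly spaced evaluation points on the log scale, back-transformed to x
su log
gen lx=(_n-1)/49*(r(max)-r(min))+r(min) in 1/50
gen x=exp(lx)
la var x "Tenure"
* placeholder weight; substitute real sampling weights where you have them
g weight=1
* density of log tenure for college graduates in 1988
kdens log if collg==1 & year==88 [pw=weight], at(lx) gen(c) nogr
su zero if collg==1 & year==88 [aw=weight], meanonly
local nz1: di %3.2f 100*(1-r(mean))
la var c "Coll grads, for the `nz1'% with nonzero tenure"
* same for those who are not college graduates
kdens log if collg==0 & year==88 [pw=weight], at(lx) gen(nc) nogr
su zero if collg==0 & year==88 [aw=weight], meanonly
local nz0: di %3.2f 100*(1-r(mean))
la var nc "Not coll grads, for the `nz0'% with nonzero tenure"
* overlay both densities against tenure on a log axis
line c nc x, xscal(log) leg(order(1 - 2))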
You could also plot the proportion at zero or below (which cannot be
shown on a log scale) as a point mass on the y-axis line of such a
graph, but that is not the correct location for the point mass, since
zero lies infinitely far to the left on a log scale, and it is a bit
strange to compare densities and point masses on the same axis, so I
prefer just to put the relevant proportion in text somewhere, as in
the example.
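Nick's suggestion below, cross-checking classical summaries against
rank-based ones and watching for zeros before taking logs, might be
sketched roughly as follows on the same dataset. This is only an
illustration: the particular -tabstat- statistics and the use of
-ranksum- as the non-parametric test are assumptions, not something
specified in the thread.

```stata
* Illustrative sketch only; the statistics and test chosen are assumptions
webuse nlswork, clear
keep if year==88 & tenure<.
* zeros (or negatives) would be lost by ln(), so count them first
count if tenure<=0
* mean and sd alongside median and IQR, by college graduation
tabstat tenure, by(collgrad) stats(n mean sd p50 iqr)
* rank-based comparison of the two groups
ranksum tenure, by(collgrad)
```

If the median and IQR tell a different story from the mean and sd,
that is one more sign that the skewness matters for the comparison.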
On 5/22/07, n j cox <n.j.cox@durham.ac.uk> wrote:
You describe your data clearly, but inevitably I
don't have a clear idea of your research questions
or your research style. I would do lots of graphs
and if I were doing any classical tests I would
want to cross-check with some non-parametric tests
and/or analyses using logarithmic scales for highly
skewed variables. On the latter, you would need
to watch out for zeros.
Ana R. Rios
Given your observation of highly skewed distributions,
I was wondering what kind of summary statistics and
transformed scales would be appropriate?
n j cox
> Not your question, but your sds are close
> to your means, and it is evident that you
> have highly skewed distributions. Printing
> both to 5 or 6 significant figures is no
> doubt part of what is expected, but their
> utility is moot.
Ana R. Rios
> I am trying to build a table with summary statistics
> (mean and standard deviation) as follows:
>            Tanzania 1992-93   Tanzania 1993-94
> Harvest        1708.47            1254.36
>               (1454.64)           (926.05)
>