
AW: st: Tag clouds in Stata?

From   "Martin Weiss" <>
To   <>
Subject   AW: st: Tag clouds in Stata?
Date   Fri, 29 May 2009 11:42:21 +0200


Here is code for the necessary dataset, but how do I get -text()- to size according to the number of missings?

sysuse auto, clear
tempname hdle
tempfile info
* one row per numeric variable: its name and its number of missings
postfile `hdle' str20 variable missings using `info'
quietly ds, has(type numeric)
quietly foreach var in `r(varlist)' {
	* want more missings to make this interesting
	replace `var' = . if runiform() < 0.1
	count if missing(`var')
	post `hdle' ("`var'") (r(N))
}
postclose `hdle'
use `info', clear
gen number = _n
list, noobs


-----Original Message-----
From: [] On Behalf Of Maarten buis
Sent: Friday, 29 May 2009 11:35
Subject: Re: st: Tag clouds in Stata?

--- On Fri, 29/5/09, Gawrich Stefan wrote:
> many websites use tag clouds for
> different purposes nowadays. I think tag clouds could
> also be a useful tool for basic exploratory analysis in
> Stata. 
> Example 1: Tag cloud of variable names (in alphabetical
> order or in order of the dataset): Font size represents
> e.g. the proportion of missing values.
> Example 2: Tag clouds of values/value labels of a
> categorical var (with many strata like postcodes in a
> region). Font size represents an aggregated score in each
> stratum.
> They can help to spot errors or extreme data distributions
> in variables or subgroups or help to detect patterns.
> There are a lot of methodological issues and statistical
> options for visualisation of data in tag clouds.
> But the basic question is: Can it be done in Stata?
> Is there any routine to produce text in that way (like
> graph 3 or 4 in the Wikipedia article)?

You could create such a program. My first approach would be to
create a blank graph by hiding the axes and axis labels (see
-help axis_options-), then add the strings using the
-text()- option (see -help added_text_options-), and then
format the size of each string within the -text()- option.

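The approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished program: the three-column grid layout, the fixed seed, the macro names (vars, max, textopts) and the linear mapping of missing counts onto size multipliers between 0.5 and 3 are all choices made here for the example. It relies on the size(*#) suboption of -text()-, which scales the default text size by a factor.

```stata
* Sketch only: variable names drawn at sizes scaled by their missing counts
sysuse auto, clear
set seed 12345                       // assumption: fixed seed for reproducibility
quietly ds, has(type numeric)
local vars `r(varlist)'

* First pass: add extra missings, store each count, track the maximum
local max 0
foreach var of local vars {
    quietly replace `var' = . if runiform() < 0.1
    quietly count if missing(`var')
    local n_`var' = r(N)
    local max = max(`max', r(N))
}

* Second pass: build one text() option per variable on a 3-column grid
local i 0
local textopts
foreach var of local vars {
    local ++i
    local x = mod(`i' - 1, 3) + 1
    local y = ceil(`i' / 3)
    * assumed linear map of counts onto multipliers from 0.5 to 3
    local size : display %4.2f (0.5 + 2.5 * `n_`var'' / `max')
    local textopts `textopts' text(`y' `x' "`var'", size(*`size'))
}
local rows = ceil(`i' / 3)

* Blank canvas: two invisible points fix the plot range, axes are hidden
twoway scatteri 0.5 0.5 `=`rows' + 0.5' 3.5, msymbol(none)   ///
    `textopts'                                               ///
    xscale(off) yscale(off) ylabel(, nogrid)                 ///
    legend(off) plotregion(style(none))
```

The multiplier scales the font height only, so a long variable name still covers more area than a short one with the same number of missings.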
Question is: do you really want to do that? I would have two
objections to this graph: 1) it encodes the information
you care about in the sizes (surface area) of the strings, 
and humans are pretty bad at decoding information in the 
form of areas. 2) The size of the string is not only a 
function of the information you want it to display, but also 
of the string itself: some words are longer than others and 
some letters look bigger (m) than others (i). 

For these reasons I would not be willing to write such a 
program, but that should not deter you. I am happy to be 
proven wrong, if that means that there is a new facility 
around that is useful to some people.

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

