Statalist



Re: st: How to label bars with frequency AND percentage for categorical variables?


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to label bars with frequency AND percentage for categorical variables?
Date   Wed, 25 Nov 2009 10:45:11 -0500

One way to proceed:
-preserve-, -uselabel-, loop over (labeled) values to construct a
local macro with relevant info, then -restore- and use that info to
add fake observations to the end of your data which you will later
drop.  -egen group, label- is a good way to make a new plotting
variable that takes on integer values, so for example codes
1,2,3,96,98 are mapped to 1,2,3,4,5 with appropriate value labels, but
you want to use that after you add your fake obs.
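
A rough sketch of those steps, not tested against your data. It assumes
the auto data shipped with Stata; the extra label "Other" (value 2) and
the variable names fake, cat and freq are made up purely for
illustration, so adapt them to your own variables.

```stata
* Sketch: pad the data with one fake observation per labelled value that
* never occurs, so every labelled category can get its own bar position.
sysuse auto, clear
label define origin 2 "Other", add     // hypothetical unused category
local v foreign
local lbl : value label `v'            // name of the attached value label

* Collect all labelled values without disturbing the data in memory.
preserve
uselabel `lbl', clear                  // one observation per labelled value
levelsof value, local(labelled)
restore

* Append a fake observation for each labelled value that is absent.
gen byte fake = 0
foreach k of local labelled {
    quietly count if `v' == `k'
    if r(N) == 0 {
        local n = _N + 1
        quietly set obs `n'
        quietly replace `v' = `k' in `n'
        quietly replace fake = 1 in `n'
    }
}

* Only now map the codes to 1, 2, 3, ... with value labels carried over.
egen cat = group(`v'), label
* Frequencies count real observations only, so padded categories get 0.
egen freq = total(!fake), by(cat)
tabulate cat fake
```

After plotting, -drop if fake- removes the padding again. Any percentages
should likewise be computed over the real observations only.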

On Wed, Nov 25, 2009 at 10:33 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> You could do that. You would need to look up any value labels that were attached and add these to one axis as further labelled points on that axis. You wouldn't need to change any data even on the fly as by definition such categories correspond to bars of zero length, which need not be plotted. (Stata wouldn't do it, anyway.)
>
> Showing them at their "correct" positions would be more awkward. For example, if 1,2,3,4,5 are defined but 3 is absent, -mybar- doesn't leave a gap, and arranging that it did would, I guess, require some moderate rewriting of the code. (Taking numbers assigned to categories literally is usually a bad idea because you can get the effect that you referred to whereby categories 96 and 98 cause enormous gaps.)
>
> Then again, the program would perhaps be better showing categories in order of frequency, and then the categories that exist in principle but not in practice would naturally go at one end of the axis.
>
> The name -mybar- has attached to it a spell such that once you use the program it's yours.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Adam Collins
>
> Quick follow-up question... do you know of a way of getting this graph to display all the value categories (i.e., all values that have a label assigned), even if there are zero occurrences of the value in the data set?
>
> Currently, the only way I have found of tabulating all labelled categories is by using -fre- (from SSC):
>
>        ssc install fre
>        fre varlist, includelabeled
>
> ... but I can't work out how to get this information into the graph.
>
> Adam Collins
>
> Scott, Nick - thank you both so much!
>
> Scott: your solution works well for categorical variables with category values close together, however it ran into problems with variables that have "gaps" between the category values - for example, my data set contains standard codes for "other" and "don't know" (96 and 98 respectively).
>
> Nick: The -mybar- solution copes fine with all the category values, and moving the text differently for different numbers of categories (including the proportional `nudge' for 12 or fewer categories) makes it very robust.  I especially like the fact that the value labels and variable title are already displayed.  Once -mybar- is "installed", the simplicity of use is quite beautiful!
>
> Thanks again!  This is really helpful!!
>
> Nick Cox
>
> I'll add my solution without much detailed comment. Independently of this I wrote a program using very similar logic.
>
> I agree with Scott's tacit position: for this, you need to retreat from -graph bar- or -graph hbar- and work out your own solution using -twoway bar-, if only because -graph *bar- does not seem to offer scope for customised bar labels. (By extension, neither does -catplot-. -catplot- is from SSC. Please remember to explain where user-written programs you refer to come from.)
>
> Much of the complication in the code arises because if the number of categories is even moderate, you run out of space to put the frequency atop the percent.
>
> program mybar
>        version 9
>        syntax varname [if] [in] [, baropts(str asis) sc1opts(str asis) *]
>
>        quietly {
>                marksample touse, strok
>                count if `touse'
>                if r(N) == 0 error 2000
>
>                preserve
>                tempvar fr pc text y y1 y2
>
>                contract `varlist' if `touse', freq(`fr') percent(`pc')
>                egen `y' = group(`varlist'), label
>                su `fr', meanonly
>                local xmax = 1.15 * r(max)
>
>                if _N > 12 {
>                        gen `text' = string(`fr') + " (" ///
>                        + string(`pc', "%2.1f") + "%)"
>                        local textcall ///
>                scatter `y' `fr', ms(none) mla(`text') ///
>                mlabc(green) yti("")
>                }
>                else {
>                        local nudge = 0.02 * _N
>                        gen `y1' = `y' - `nudge'
>                        gen `y2' = `y' + `nudge'
>                        gen `text' = "(" + string(`pc', "%2.1f") + "%)"
>                        local textcall ///
>                scatter `y1' `fr', ms(none) mla(`fr') mlabc(green) `sc1opts' ///
>                || scatter `y2' `fr', ms(none) mla(`text') mlabc(green)
>                }
>
>                levelsof `y', local(Y)
>                local header `"`: var label `varlist''"'
>                if `"`header'"' == "" local header "`varlist'"
>        }
>
>        twoway bar `fr' `y', horizontal barw(0.6) base(0) `baropts' ///
>        || `textcall'  ///
>        yla(`Y', valuelabels noticks ang(h)) ysc(reverse) ///
>        xti("") xla(none) xsc(r(0 `xmax')) ///
>        legend(off) subtitle(`"`header'"') `options'
> end
>
> Nick
> n.j.cox@durham.ac.uk
>
> Scott Merryman
>
> Something like this?
>
>
> sysuse auto, clear
> xtile cat_mpg = mpg, nq(4)
>
> foreach var of varlist  rep  cat {
>        qui {
>        count if `var' != .
>        local total =r(N)
>        egen count = count(`var') if `var' !=., by(`var')
>        gen percent = string(round((count/`total')*100,.1)) + "%)"
>        replace percent = "("+percent
>        gen count2= count+.5
>        gen rep1 = `var' - .2
>        gen rep2 = `var' + .1
>        }
>        twoway bar count `var', barw(0.7)    || ///
>                scatter count2 rep1, mlabel(count) mlabpos(0)  ///
>                ms(none) mlabcolor(black) || ///
>                scatter count2 rep2, mlabel(percent) mlabpos(0) ///
>                ms(none) legend(off)  mlabcolor(black) ///
>                name(gr_`var',replace)
> drop count* percent rep?
> }
>
>
> See also:
> http://www.stata.com/support/faqs/data/percentvars.html
>
> On Tue, Nov 24, 2009 at 9:06 AM, Adam Collins <ACollins@fxb.org> wrote:
>
>> I have a series of categorical variables.  For each one, I would like to create a quick bar graph that displays the count (frequency) of each category, but also the percentage.
>>
>> For example, if it was an "hbar" graph, it might look something like this:
>>
>> __________
>>          | 10
>> __________|(33.3%)
>> _________________
>>                 | 16
>> _________________|(53.3%)
>> ____
>>    | 4
>> ____|(13.3%)
>>
>>
>> I intend to use a "foreach" loop to iterate through the list of categorical variables, so I am looking for a solution that can be automated for each variable in my list.  I don't mind if the solution uses catplot or hbar or something else.
>



