Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.
Re: st: Adding Reference Bars to Bar Graph
William Buchanan <firstname.lastname@example.org>
Mon, 20 Aug 2012 09:57:33 -0700
Sorry for the delayed response. The -expand- example would work if I needed only a single reference bar for the entire variable. For the graph that I am trying to recreate, however, I found a solution that works well. First, I generated additional observations to append to the dataset, with the appropriate distribution of categorical responses for each of the six variables:
clear
set obs 100
g school="Ref Cat"
* 100 observations, so each category's count equals its percentage
g item1=1 if _n<=21
replace item1=2 if _n>=22 & _n<=44
replace item1=3 if _n>=45 & _n<=77
replace item1=4 if _n>=78
g item2=1 if _n<=32
replace item2=2 if _n>=33 & _n<=60
replace item2=3 if _n>=61 & _n<=88
replace item2=4 if _n>=89
g item3=1 if _n<=38
replace item3=2 if _n>=39 & _n<=64
replace item3=3 if _n>=65 & _n<=87
replace item3=4 if _n>=88
* item4-item6 are generated the same way (not shown)
tempfile refcat
save `refcat'
use masterdata, clear
append using `refcat'
* a large sentinel value places the reference category last in every graph
replace observationtime=9999 if school=="Ref Cat"
catplot observationtime if school=="Some School" | school=="Ref Cat", over(item1) asyvar recast(bar) blabel(total, format(%2.0f)) percent(schoolyr) bar(1,color(green*.85) lc(black) lw(medthin)) bar(2, color(blue*.325) lc(black) lw(medthin)) ysc(r(0(10)100) lc(black)) ylab(0(10)100, angle(0) glc(black) tlw(vvthin) glw(vvthin)) yti(Percent) legend(label(1 "This School Yr 2 (N=84)") label(2 "This School Yr 3 (N=116)") label(3 "Reference Categories") pos(12) span region(lc(white))) ti("Survey Item Here", color(black)) b1title(" ") graphr(fc(white) color(white) ic(white))
By setting the observationtime variable to an unreasonably high number for the reference-category school, I can place the reference category at the far right of every graph I need to create and display the appropriate percentage for each of the four categorical responses. So, problem solved. Now, hopefully, I can get some more efficient data practices implemented here so that things work more easily.
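The hard-coded -replace- statements that build the reference-category data can be condensed into a loop over the items. This is a minimal sketch, assuming the cumulative cutoffs below (copied from the item1-item3 code above; the cutoffs for item4-item6 were not posted and would have to be added) and a hypothetical tempfile name -refcat-:

```stata
* Build the 100-observation reference-category dataset in one loop.
* Each local holds the last observation number of categories 1-3;
* category 4 takes the remaining observations, so counts equal percentages.
clear
set obs 100
generate school = "Ref Cat"
local cuts1 21 44 77
local cuts2 32 60 88
local cuts3 38 64 87
* item4-item6: add locals cuts4-cuts6 and raise the loop limit to 6
forvalues j = 1/3 {
    tokenize `cuts`j''
    generate byte item`j' = cond(_n <= `1', 1, cond(_n <= `2', 2, cond(_n <= `3', 3, 4)))
}
tempfile refcat
save `refcat'
```

The remaining steps (-use masterdata, clear-, -append using `refcat'-, and the -replace- of observationtime) would stay as they are. Keeping the cutoffs in one place makes it easier to update them if the norm averages change.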
On Aug 17, 2012, at 6:11 AM, Nick Cox wrote:
> Two thoughts:
> First off, it is usually a feature of -catplot- that it does the
> calculations for you and then draws the graph, but here it sounds like
> a mixed blessing as you want to combine one set of results calculated
> one way with another set of results calculated another way. I suspect
> that you may need to back up, concentrate on producing the resultsset
> you want to show and then and only then pass it to -graph bar (asis)-.
> Second, I don't understand what the difficulty is with -expand-, but
> if you have a way of getting the other results into a dataset with
> similar structure an alternative is to -append-.
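One way to combine both suggestions is sketched below. It reduces the example data from the original post to one row per year and category with -contract-, computes within-year percentages, -append-s fixed reference rows, and draws the result with -graph bar (asis)-. The reference percentages (21, 23, 33, 23) are borrowed from item1 in William's reference-category data purely for illustration. The code 9999 for the reference group and the value-label name -yr- are hypothetical.

```stata
* Example data from the original post
clear
set obs 200
set seed 10101
generate schoolyr = cond(_n <= 84, 2, 3)
generate sk = runiform()
egen skills = cut(sk), group(4)

* Resultsset: one row per year x category, with within-year percentages
contract schoolyr skills
bysort schoolyr: egen total = total(_freq)
generate double pct = 100 * _freq / total

* Fixed reference percentages, one row per category (illustrative values)
preserve
clear
input schoolyr skills pct
9999 0 21
9999 1 23
9999 2 33
9999 3 23
end
tempfile ref
save `ref'
restore
append using `ref'

label define yr 2 "Yr 2" 3 "Yr 3" 9999 "Reference"
label values schoolyr yr
graph bar (asis) pct, over(schoolyr) over(skills) asyvars ///
    blabel(bar, format(%2.0f)) ytitle(Percent)
```

Because the norm averages enter as stored percentages rather than as synthetic observations, they stay identical for every firm regardless of how many observations that firm has.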
> On Fri, Aug 17, 2012 at 1:54 PM, William Buchanan
> <email@example.com> wrote:
>> Hi Nick,
>> I apologize for not explaining things clearly earlier (although the solution you suggested would have worked well if I could use it for this problem). I've posted an example of the charts this company currently uses here: https://www.dropbox.com/sh/9g5b6h1ul1ev9br/HTD6u678Er.
>> I should probably also give a bit more information about the data, since that could help. The dataset contains several hundred firms, each with a varying number of observations per period and varying periods of observation. For example, firm 1 may have been measured only once in 2007, with 20 observations, whereas firm 2 may have been measured twice in 2007 (once in the spring and once in the fall), with 13 observations at the first measurement and 40 at the second. In the example I sent, and was working on at first, I ignored whether a firm was observed more than once in a single year. The "norm avg" is based on a sample of first-time observations from 2004 to 2009 and is fixed for all firms and periods. The difficulty is that there is a separate norm average for each of the four categories that appear on the chart.
>> My end goal is a way to produce the 600+ graphs consistently and quickly (six graphs like this are needed for each firm, each with a different measurement but the same scale). Because each firm has a different number of observations per period/year, it would be very difficult to use -expand- (the percentages for each norm average need to be the same across all firms).
>> Thanks again,
>> On Aug 17, 2012, at 1:57 AM, Nick Cox wrote:
>>> The only way ahead given that you have taken this road is to add a
>>> further category to the data (certainly not a further variable).
>>> The following may be suggestive of the main trick involved. I have
>>> kept the example as simple as possible.
>>> sysuse auto
>>> expand 2 in L
>>> replace foreign = 2 in L
>>> * whatever constant you need
>>> replace mpg = 42 in L
>>> label def origin 2 "Total" , add
>>> graph bar (mean) mpg, over(foreign)
>>> The problem with -catplot- here is that it is based (in your case) on
>>> -graph bar-, which does not behave like -twoway- in letting other
>>> graphs be superimposed. So you must mess, at least temporarily, with
>>> the data.
>>> The alternative is to rewrite everything as a call to -twoway bar-,
>>> but even in that case the easiest way may be to start with an
>>> augmented dataset.
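A minimal sketch of the -twoway bar- route, again starting from an augmented dataset: collapse -auto- to one mean per group, add a third row holding the constant (42, the same placeholder as in the -expand- example), and plot it. Unlike -graph bar-, -twoway- would then allow further plots to be overlaid. The bar width and axis options are illustrative choices, not requirements.

```stata
sysuse auto, clear
* One row per group: mean mpg for Domestic (0) and Foreign (1)
collapse (mean) mpg, by(foreign)
* Augment with a third row holding the constant for the extra bar
set obs 3
replace foreign = 2 in 3
* whatever constant you need
replace mpg = 42 in 3
* (Re)define the value label so the new category is labelled
capture label drop origin
label define origin 0 "Domestic" 1 "Foreign" 2 "Total"
label values foreign origin
twoway bar mpg foreign, barwidth(0.8) xlabel(0/2, valuelabel) ytitle("Mean mpg")
```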
>>> On Fri, Aug 17, 2012 at 5:33 AM, William Buchanan
>>> <firstname.lastname@example.org> wrote:
>>>> I'm trying to automate production of a series of bar graphs that show the percentage of values and am having difficulty adding an extra bar with a constant value to the graph. With the exception of this additional bar, I've managed to get something close to the current graphs being used with -catplot- (available from SSC; written by Nick Cox). The syntax that I've used so far (with only slight modification) is:
>>>> catplot schoolyr, over(skills) asyvar recast(bar) blabel(total, format(%2.0f)) percent(schoolyr) bar(1,color(green*.85) lc(black) lw(medthin)) bar(2, color(blue*.325) lc(black) lw(medthin)) ysc(r(0(10)100) lc(black)) ylab(0(10)100, angle(0) glc(black) tlw(vvthin) glw(vvthin)) yti(Percent) legend(label(1 "XXXXX Yr 2 (N=84)") label(2 "XXXXX Yr 3 (N=116)") pos(12) span region(lc(white))) ti("Survey Item Here", color(black)) b1title(" ") graphr(fc(white) color(white) ic(white))
>>>> My goal is to add a bar to the graph showing the percentages for "overall". In the actual dataset, the "overall" measure is a time-invariant constant, and the firm wants to keep the graphics formatting it has already used. I've tried changing the -over()- options and adding the extra variable to the varlist of the command. In each case the x-axis changes: instead of an additional bar appearing, the bars are disaggregated by the additional variable. I've provided syntax below to generate something close to the data I am working with. The only difference between the syntax above and what I will eventually need is the addition of an -if- qualifier, which will be used to loop over the firms in the dataset.
>>>> Thanks in advance for any assistance,
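>>>> clear
>>>> set obs 200
>>>> set seed 10101
>>>> * schoolyr: 84 observations in year 2, the remaining 116 in year 3
>>>> g schoolyr=2 if _n<=84
>>>> replace schoolyr=3 if schoolyr==.
>>>> * skills: four roughly equal-sized groups (0-3) of a uniform variable
>>>> g sk=runiform()
>>>> egen skills=cut(sk), group(4)
>>>> sort skills
>>>> * overall: four categories of 25 observations each
>>>> g overall=1 if _n<=25
>>>> replace overall=2 if _n<=50 & _n>=26
>>>> replace overall=3 if _n<=75 & _n>=51
>>>> replace overall=4 if _n>=76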
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/