# Re: st: catplot problem

From: n j cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: catplot problem
Date: Tue, 06 May 2008 18:41:54 +0100

Tunga refers to a "problem" and declares his results to be "wrong".

There is no problem, except perhaps that Tunga is a little confused, or expecting a program to behave differently from what its creator intended.

Tunga is using -catplot- from SSC (and on the side -fre- from SSC).
Remember that you are asked to say where any user-written programs you refer to come from.

We do not have access to Tunga's data, but we can replicate his situation like this:

clear
input tunga freq
1 4
4 3
5 6
6 10
7 16
8 9
9 1
10 2
end

expand freq
tab tunga

      tunga |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |          4        7.84        7.84
          4 |          3        5.88       13.73
          5 |          6       11.76       25.49
          6 |         10       19.61       45.10
          7 |         16       31.37       76.47
          8 |          9       17.65       94.12
          9 |          1        1.96       96.08
         10 |          2        3.92      100.00
------------+-----------------------------------
      Total |         51      100.00

Now if you go

histogram tunga, discrete freq

-histogram- shows not only bars for the populated categories 1 4/10 but also gaps with no bars, or bars of zero height, at 2 3.

If you go

catplot bar tunga

no gaps are shown, just bars for 1 4/10. This is what puzzles Tunga.

-catplot- is designed, as the name implies and the help explains, for categorical data. As far as it is concerned, 1 4/10 are labels for categories that it shows in the sort order of the variable concerned. It has no consciousness of a gap at 2 3, any more than if you had data for "aardvark" "bison" and "elephants" it would insert categories at "cattle" and "donkeys". It is not designed as a clone of -histogram-, which would be futile as -histogram- already exists. -catplot- is not intended to show numerical variables on numerical scales.

There is an issue on the side of whether Tunga expects a graphics program to know what is going on in observations that have been expressly excluded as a consequence of -if-. There are some subtleties there, but the behaviour of -histogram- here does not reflect the fact that there are values for 2 3 in the rest of the dataset, as the above dataset will show. Rather, -histogram- draws a numeric axis over the range of the data and then shows (visible) bars where they belong.

It seems that Tunga thinks of his variable as numerical and there "should be" gaps at 2 3. If so, that's a view accommodated by -histogram-. Tunga will not get a satisfactory graph with -catplot-.
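
If the numerical view is the one wanted, here is a minimal sketch of two ways to get it, using the toy data from above rather than Tunga's own. The -xlabel()- choice, the bar width and the graph names are my assumptions for illustration, not anything required.

```stata
* Sketch only: recreate the toy data, then plot on a numeric scale.
clear
input tunga freq
1 4
4 3
5 6
6 10
7 16
8 9
9 1
10 2
end
expand freq

* -histogram- with every integer labelled, so the gaps at 2 and 3 are explicit.
histogram tunga, discrete frequency xlabel(1(1)10) name(hist, replace)

* The same picture from explicit counts: -contract- leaves one observation
* per distinct value, with the count in _freq.
preserve
contract tunga
twoway bar _freq tunga, barwidth(0.8) xlabel(1(1)10) ytitle("Frequency") name(tw, replace)
restore
```

Either way the x axis is a numeric scale, so values that never occur simply have no bar above them.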

Note that -catplot bar- here is just a wrapper for -graph bar-, so the question is in a strong sense about the different views of -graph bar- and -histogram-. The behaviour you are seeing is what you would get with

graph bar freq, over(tunga)

and is not idiosyncratic to -catplot-.
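
On the side issue of -if-, here is a sketch, under assumptions, of how -graph bar- itself can be asked to keep categories that occur only outside the selected subsample. It relies on the allcategories option as documented in -help graph bar-; check that your version of Stata has it, and do not assume that -catplot- passes it through. The variables rating and grp and their values are made up for illustration.

```stata
* Hypothetical data: ratings 2 and 3 occur only in group 2.
clear
input rating grp freq
1 1 4
4 1 3
5 1 6
2 2 5
3 2 7
4 2 2
end
expand freq

* Without allcategories, categories 2 and 3 are dropped from the graph,
* because they do not occur when grp == 1.
graph bar (count) freq if grp == 1, over(rating) name(dropped, replace)

* With allcategories, every category found in the whole dataset is kept,
* so 2 and 3 appear with zero-height bars.
graph bar (count) freq if grp == 1, over(rating) allcategories name(kept, replace)
```

Even then the bars are categories shown in sort order, not positions on a numeric scale: a value that occurs nowhere in the dataset still leaves no gap.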

Nick
n.j.cox@durham.ac.uk

Tunga Kantarci

I am having a problem with catplot.

First, I consider the following command:
fre q41a2vr1 if grandom == 1 & frandom == 2

q41a2vr1 -- Het plan van ^name5 q4 age randomisation 1plan 1 rating 3/4
----------------------------------------------------------------------
                         |      Freq.    Percent      Valid       Cum.
-------------------------+--------------------------------------------
Valid   1 helemaal niks  |          4       2.78       7.69       7.69
        4                |          3       2.08       5.77      13.46
        5                |          6       4.17      11.54      25.00
        6                |         10       6.94      19.23      44.23
        7                |         16      11.11      30.77      75.00
        8                |          9       6.25      17.31      92.31
        9                |          1       0.69       1.92      94.23
        10 ideaal        |          3       2.08       5.77     100.00
_______________________________________________

Second, I consider the histogram for the same subsample:
hist q41a2vr1 if grandom == 1 & frandom == 2, discrete freq

Also note that for q41a2vr1 I have 10 values from "1 helemaal niks" to "10 ideaal":

fre q41a2vr1
q41a2vr1 -- Het plan van ^name5 q4 age randomisation 1plan 1 rating 3/4
----------------------------------------------------------------------
                         |      Freq.    Percent      Valid       Cum.
-------------------------+--------------------------------------------
Valid   1 helemaal niks  |         19       0.95       4.53       4.53
        2                |          6       0.30       1.43       5.97
        3                |         11       0.55       2.63       8.59
        4                |         29       1.45       6.92      15.51
        5                |         53       2.64      12.65      28.16
        6                |         85       4.24      20.29      48.45
        7                |        107       5.34      25.54      73.99
        8                |         73       3.64      17.42      91.41
        9                |         18       0.90       4.30      95.70
        10 ideaal        |         18       0.90       4.30     100.00
        Total            |        419      20.91     100.00
Missing .                |       1585      79.09
Total                    |       2004     100.00
----------------------------------------------------------------------

As the first table shows, and as the absence of bars in the histogram confirms, there are
no observations for "2" and "3" in this subsample.

Now, I run the following command:

catplot bar q41a2vr1 if grandom == 1 & frandom == 2
(Alternatively, I run the following command and then look at frandom == 2
in the graph, which amounts to the same thing:
catplot bar frandom q41a2vr1 if grandom == 1)

The problem arises here: catplot does not leave blanks for the values "2" and "3" that
the q41a2vr1 variable can take.
And, although it starts correctly with the "1" that q41a2vr1 takes, it takes the
value of
"4" in the histogram and puts it for "2" in the catplot,
"5" in the histogram and puts it for "3" in the catplot,
"6" in the histogram and puts it for "4" in the catplot,
"7" in the histogram and puts it for "5" in the catplot,
"8" in the histogram and puts it for "6" in the catplot,
"9" in the histogram and puts it for "7" in the catplot,
"10" in the histogram and puts it for "8" in the catplot.

Hence what catplot shows is wrong.
It slides what is on the right in the histogram to the left in the catplot.
How can I solve this problem?
