
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Weights with -table- and -tabulate-
Date: Wed, 18 Dec 2002 14:27:10 -0000

Friedrich Huebler
>
> I have two questions on the use of weights with -table- and
> -tabulate-.
>
> (1) Can the frequencies be rounded when -tabulate- is used with
> weights. My weight looks like this:
>
> Variable |     Obs        Mean   Std. Dev.       Min        Max
> ---------+-----------------------------------------------------
>  sweight |   28791    1.004766     .127654   .787363   1.231606
>
> The command
>
> tab male [aw=sweight]
>
> yields this table:
>
>  Male |      Freq.     Percent        Cum.
> ------+-----------------------------------
>     0 | 14893.9698       51.73       51.73
>     1 | 13897.0302       48.27      100.00
> ------+-----------------------------------
> Total |      28791      100.00
>
> I prefer the frequencies to be shown as 14894 and 13897. Can this be
> done?
>
> (2) Why has the weight no effect on the output of -table-?
>
> When I type
>
> table male [aw=sweight]
>
> I get this table:
>
> ----------------------
>      Male |      Freq.
> ----------+-----------
>         0 |     14,862
>         1 |     13,929
> ----------------------
>
> This is the same as the unweighted frequency distribution.
>
> I looked at the manuals, the FAQs, and the list archive and found no
> answer to these questions. I use Stata 7.

Interesting! The sequence here seems to be that Friedrich can get the
results he wants in -tabulate-, but not the format, and he can get the
format he wants in -table-, but not the results.

I don't have Friedrich's data, so I will use the auto data to
illustrate a reply, although there is perhaps one unresolved question
here which only Stata Corp can answer definitively: why the difference
in behaviour?

Here it is shown with the auto data:

. tab foreign [aw=mpg]

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic | 48.4098985       65.42       65.42
    Foreign | 25.5901015       34.58      100.00
------------+-----------------------------------
      Total |         74      100.00

. table foreign [aw=mpg]

----------------------
 Car type |      Freq.
----------+-----------
 Domestic |         52
  Foreign |         22
----------------------

Here's my way of thinking about it.
Suppose I say to you: count the categories of -foreign-, given these
values of -mpg- as weights.

One interpretation -- that taken by -table- -- is that the weights are
irrelevant. -table- will count for you, but the weights don't enter
into _counting_. And in many contexts, we do want the raw frequencies,
unweighted, and also other statistics weighted by something. This is
perhaps startling, and I think should be better documented, but I
don't think it is a bug. If you also say: give the mean of -weight-,
then Stata pays attention to -mpg- supplied as weight. (Incidentally,
-tabstat- behaves the same way.)

There is a clear difference between

. table foreign , c(freq mean weight)

--------------------------------------
 Car type |      Freq.    mean(weight)
----------+---------------------------
 Domestic |         52         3,317.1
  Foreign |         22         2,315.9
--------------------------------------

and

. table foreign [aw=mpg] , c(freq mean weight)

--------------------------------------
 Car type |      Freq.    mean(weight)
----------+---------------------------
 Domestic |         52         3,174.2
  Foreign |         22         2,240.6
--------------------------------------

The other interpretation -- that taken by -tabulate- -- is that you
want -- as you evidently do -- a list of

    sum of weights in category / mean of weights overall

which has the property that it sums to the total frequency.

You want to see that, but formatted in the way you want. I don't think
-tabulate- can do this. It has no -format()- option, and it pays no
attention to variable display formats when showing frequencies. In
addition, -tabulate- can show all sorts of different results and it is
not clear that the same format would ever be appropriate for all. (One
answer to that would be to allow multiple formats via more complicated
syntax.)

One remedy is to calculate directly what you want to show and then
show it with -tabdisp-. -tabdisp- is documented at [P] tabdisp as if
it were an only-for-the-technical command, but it is very useful
interactively as well.
(-foreach- and -forvalues- fall into the same category.) Elsewhere
another tabulation problem otherwise awkward has been shown to yield
to some calculations and -tabdisp-. See

How do I tabulate cumulative frequencies?
http://www.stata.com/support/faqs/data/tabdisp.html

Here is a laboured way of doing it. It has one advantage. I may not be
the only person who -- even though the manipulations here are
elementary -- can get confused in this terrain unless I write down the
formulas and play with simple examples step by step, and this route
takes you where you want to be in very easy stages.

I will go through a basic sequence and then make some comments.

1. We want the sum of weights in each category

. egen sumw = sum(mpg) , by(foreign)

2. We want the mean of weights overall

. egen meanw = mean(mpg)

3. Our weighted frequencies are then just

. gen freq = sumw/meanw

4. By construction, these are constant within each category, so
-tabdisp- is easy

. tabdisp foreign, c(freq)

----------------------
 Car type |       freq
----------+-----------
 Domestic |    48.4099
  Foreign |    25.5901
----------------------

5. And we can control the format:

. tabdisp foreign, c(freq) format(%2.0f)

----------------------
 Car type |       freq
----------+-----------
 Domestic |         48
  Foreign |         26
----------------------

Comment A
=========

We could do this via -table- but it is not as nice:

. table foreign , c(mean freq) format(%2.0f)

----------------------
 Car type | mean(freq)
----------+-----------
 Domestic |         48
  Foreign |         26
----------------------

or (among other possibilities)

. table foreign , c(min freq) format(%2.0f)

----------------------
 Car type |  min(freq)
----------+-----------
 Domestic |         48
  Foreign |         26
----------------------

Comment B (especially for (budding) programmers)
=========

In other circumstances, I would be the first to squawk at code like

. egen sumw = sum(mpg) , by(foreign)
. egen meanw = mean(mpg)
. gen freq = sumw/meanw

if within a program, as it is wasteful of memory and slow.
In a program, you shouldn't use -egen- at all. In any case, putting a
constant in a variable is bad style. The code above is used to make it
as clear as possible what is being done.

For efficiency, we could first go

. egen freq = sum(mpg), by(foreign)
. su mpg, meanonly
. replace freq = freq / r(mean)

and then get rid of the -egen- (which makes the code longer, but
faster).

Comment C
=========

If you were doing this a lot, as a convenience you might like a single
function to calculate the weighted frequencies. This doesn't seem to
have been done, so I have written an -egen- function -wtfreq()- which
will be added to -egenmore- on SSC.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
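
Comment B says the remaining -egen- can be removed but does not show
the code. What follows is a minimal sketch of one way to do it, not
necessarily the code Nick had in mind. It builds a running sum of the
weights within each category of -foreign- and then copies the last
value (the category total) to every observation in that category. It
assumes the auto data shipped with Stata and uses -sysuse-, which needs
Stata 8 or later; under Stata 7 the data would have to be loaded some
other way.

```stata
* Load the auto data (assumption: Stata 8 or later for -sysuse-)
sysuse auto, clear

* Running sum of the weights within each category of -foreign-
sort foreign
by foreign: gen double freq = sum(mpg)

* The last running sum in each category is the category total;
* copy it to every observation in that category
by foreign: replace freq = freq[_N]

* Divide by the overall mean of the weights, left behind in r(mean)
summarize mpg, meanonly
replace freq = freq / r(mean)

* Display the weighted frequencies rounded to integers
tabdisp foreign, c(freq) format(%2.0f)
```

Before formatting, the values of -freq- should agree with the weighted
frequencies that -tabulate- reports above for the auto data
(48.4098985 and 25.5901015).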

Follow-Ups:
  st: Re: Weights with -table- and -tabulate-
  From: Friedrich Huebler <huebler@rocketmail.com>

References:
  st: Weights with -table- and -tabulate-
  From: Friedrich Huebler <huebler@rocketmail.com>

