Re: st: Statsby and weights


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Statsby and weights
Date   Thu, 25 Mar 2004 13:08:16 -0600

Dale Plummer <dale.plummer@vanderbilt.edu> asks about using weights with
-statsby-:

> I may have overlooked something obvious, but I cannot see why the
> statsby command will not allow weights in the commands it is executing.
> Would someone please explain this?

There really isn't a good reason for this.  From a development point of view,
-statsby- uses the same parsing engine as -bootstrap-, -jknife-, -simulate-,
and -permute-, some of which require careful consideration (and new code) to
handle weights.

There are ways around this.  The long way is to set up -postfile- and use
-post- within a -forvalues- loop.  This requires a decent amount of coding to
reproduce some of the features of -statsby-.
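
As a rough sketch of that long way (not a full replacement for -statsby-), the
loop below posts the fweighted mean of mpg for each value of rep78.  The
handle name (results), the tempfile name (out), and the new variable name
(wmean) are illustrative assumptions, and the loop range 1/5 assumes the
categories of rep78 in the auto data.

```stata
* Sketch only: collect weighted group means with -postfile- and -post-.
* results, out, and wmean are illustrative names, not part of -statsby-.
sysuse auto, clear
tempname results
tempfile out
postfile `results' rep78 wmean using `out'
forvalues k = 1/5 {
	* weighted summary for one category of rep78
	quietly summarize mpg [fw=turn] if rep78 == `k'
	post `results' (`k') (r(mean))
}
postclose `results'
* load the posted results, one observation per category
use `out', clear
list
```

Unlike -statsby-, this hard-codes the category values; a more general version
would loop over the observed levels of the by-variable.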

The short way involves tricking -statsby-.  I would generally warn users
against trying to "trick" a command into doing something that a developer
purposely tried to prevent, but this is one of those special cases.

Suppose we want to use fweights with -summarize- for each category of a
variable.  The unweighted version would be

	. sysuse auto
	(1978 Automobile Data)
	
	. statsby "sum mpg" r(mean), by(rep)
	
	command:      sum mpg
	statistic:    _stat_1    = r(mean)
	by:           rep78
	
	. list
	
	     +------------------+
	     | rep78    _stat_1 |
	     |------------------|
	  1. |     1         21 |
	  2. |     2     19.125 |
	  3. |     3   19.43333 |
	  4. |     4   21.66667 |
	  5. |     5   27.36364 |
	     +------------------+

As already noted, -statsby- does not like weights to be specified:

	. capture noisily statsby "sum mpg [fw=1]" r(mean), by(rep)
	weights not allowed

We could write a wrapper command for -summarize- that takes weights in a
different way:

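	program mysum
		* weights arrive through the weight() option, because
		* -statsby- rejects the usual [weight] syntax
		syntax varlist [if] [in] [, weight(string) * ]
		sum `varlist' `if' `in' [`weight'], `options'
	end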

Now we can pass weights to -summarize- using -mysum-'s -weight()- option.
Here we specify an -fweight- of one to check the result against the
unweighted version:

	. sysuse auto
	(1978 Automobile Data)
	
	. statsby "mysum mpg, weight(fw=1)" r(mean), by(rep)
	
	command:      mysum mpg , weight(fw=1)
	statistic:    _stat_1    = r(mean)
	by:           rep78
	
	. list
	
	     +------------------+
	     | rep78    _stat_1 |
	     |------------------|
	  1. |     1         21 |
	  2. |     2     19.125 |
	  3. |     3   19.43333 |
	  4. |     4   21.66667 |
	  5. |     5   27.36364 |
	     +------------------+

Now let's really specify some weights:

	. statsby "mysum mpg, weight(fw=turn)" r(mean), by(rep)
	
	command:      mysum mpg , weight(fw=turn)
	statistic:    _stat_1    = r(mean)
	by:           rep78
	
	. list
	
	     +------------------+
	     | rep78    _stat_1 |
	     |------------------|
	  1. |     1   20.92683 |
	  2. |     2   18.97983 |
	  3. |     3   19.11445 |
	  4. |     4    21.1342 |
	  5. |     5   27.19898 |
	     +------------------+
	
We can verify that the weights were applied by checking the results on a
group-by-group basis:

	. sysuse auto
	(1978 Automobile Data)
	
	. sum mpg [fw=turn] if rep==1
	
	    Variable |       Obs        Mean    Std. Dev.       Min        Max
	-------------+--------------------------------------------------------
	         mpg |        82    20.92683    3.017564         18         24
	
	. sum mpg [fw=turn] if rep==2
	
	    Variable |       Obs        Mean    Std. Dev.       Min        Max
	-------------+--------------------------------------------------------
	         mpg |       347    18.97983    3.466128         14         24
	
	. sum mpg [fw=turn] if rep==3
	
	    Variable |       Obs        Mean    Std. Dev.       Min        Max
	-------------+--------------------------------------------------------
	         mpg |      1232    19.11445    4.018323         12         29
	
	. sum mpg [fw=turn] if rep==4
	
	    Variable |       Obs        Mean    Std. Dev.       Min        Max
	-------------+--------------------------------------------------------
	         mpg |       693     21.1342    4.836715         14         30
	
	. sum mpg [fw=turn] if rep==5
	
	    Variable |       Obs        Mean    Std. Dev.       Min        Max
	-------------+--------------------------------------------------------
	         mpg |       392    27.19898    8.349844         17         41
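
The same cross-check can be done in one step.  The sketch below uses
-collapse-, which accepts fweights, to compute the weighted mean of mpg
within each category of rep78; the variable name wmean is an illustrative
assumption, and observations with missing rep78 are dropped first so the
groups line up with those listed above.

```stata
* Sketch only: cross-check the weighted group means with -collapse-.
* wmean is an illustrative variable name.
sysuse auto, clear
drop if missing(rep78)
collapse (mean) wmean = mpg [fw=turn], by(rep78)
list
```

The listed means should agree with the Mean column of the group-by-group
-summarize- output.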


As a final note, let me just warn against using this trick with -bootstrap-,
-permute-, and -jknife-.  The result will most definitely not be what you
would expect.

--Jeff
jpitblado@stata.com