
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: pweight + aweight, double weights


From   Steve Samuels <sjsamuels@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: pweight + aweight, double weights
Date   Thu, 5 Aug 2010 13:08:12 -0400

The two methods give identical results because they are algebraically
equivalent.

For the pweighted mean, with pwt2 = length_1 x pwt
and rate = (length_2 - length_1)/length_1:

pwt2-weighted mean of rate = (sum of pwt2 x rate)/(sum of pwt2)

The numerator is:
   sum of pwt x length_1 x (length_2 - length_1)/length_1
= sum of pwt x (length_2 - length_1)
= sum of pwt x length_2   minus   sum of pwt x length_1
= (pwt-weighted sum of length_2) minus (pwt-weighted sum of length_1)

The denominator is:
   sum of pwt x length_1
= pwt-weighted sum of length_1

So the pwt2-weighted mean of rate equals
(pwt-weighted sum of length_2 minus pwt-weighted sum of length_1)
divided by (pwt-weighted sum of length_1), which is exactly the growth
rate computed from the pwt-weighted aggregate sums.
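
The equivalence can also be checked numerically. This is a minimal sketch, assuming the same auto data and renames as in the code quoted below; the helper variables w1 and w2 and the scalar names are only illustrative. -summarize- does not accept pweights, so aweights are used here, which give the same point estimate of a weighted mean.

```stata
sysuse auto, clear
rename length length_1
gen double length_2 = displacement
rename trunk pwt
gen double pwt2 = length_1*pwt
gen double rate = (length_2 - length_1)/length_1

* Method 1: growth rate from the pwt-weighted totals (domestic cars only)
gen double w1 = pwt*length_1
gen double w2 = pwt*length_2
quietly summarize w1 if foreign == 0
scalar S1 = r(sum)
quietly summarize w2 if foreign == 0
scalar S2 = r(sum)
scalar agg = (scalar(S2) - scalar(S1))/scalar(S1)

* Method 2: pwt2-weighted mean of the individual rates
quietly summarize rate [aweight = pwt2] if foreign == 0
scalar wmean = r(mean)

display "aggregate rate = " scalar(agg) "   weighted mean = " scalar(wmean)
* The two should agree up to floating-point rounding
assert reldif(scalar(agg), scalar(wmean)) < 1e-10
```

Changing the condition to foreign == 1 runs the same check for the foreign cars.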


Steve
On Thu, Aug 5, 2010 at 9:23 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Jochen, the totals you used in the -display- lines are different from
> those produced by the first -table- statement.  When I use the
> latter, the results of the two methods are identical.
>
> Steve
>
> **************************CODE BEGINS**************************
> sysuse auto, clear
> gen double length_2 = displacement
> rename length length_1
> rename trunk pwt
> * Look up the pweighted sums of length_1 and length_2 for foreign and domestic cars:
> table foreign [pw= pwt], c(sum length_1 sum length_2)
>
> * Compute the growth rates from the aggregate sums of length_1 and length_2:
> di  "Domestic: "  (190108 - 153917)/153917
> di " Foreign:  "  (28194  -  42450)/42450
>
> * Do a pweighted mean of the individual growth rates with pw = initial value x pweight:
> gen double pwt2 = length_1*pwt
> cap drop rate
> gen double rate = (length_2 - length_1) / length_1
> table foreign [pweight = pwt2], c(mean rate)
> ***************************CODE ENDS***************************
>
>
> On Thu, Aug 5, 2010 at 4:40 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>> Hi Steve,
>>
>> thanks for your little program. What I do not understand is your statement that with a "probability weighted mean of the  individual growth rates" I "would wind up with the rate based on the probability-weighted aggregated sums". Check out this:
>>
>> **************************CODE BEGINS**************************
>> sysuse auto, clear
>> gen length_2 = displacement
>> rename length length_1
>> rename trunk pw
>>
>> * Look up the pweighted sums of length_1 and length_2 for foreign and domestic cars:
>>
>> table foreign [pw= pw], c(sum length_1 sum length_2)
>>
>> * Look up the growth rates based on the aggregate sums of length_1 and length_2:
>>
>> di "domestic:" (311319 - 270137 ) / 270137
>> di "foreign:"  (155268 - 235051) / 235051
>>
>> * Do a pweighted mean of the individual growth rates with pw = initial value x pweight:
>> cap drop rate
>> gen rate = (length_2 - length_1) / length_1
>> table foreign [pweight = length_1 * pw], c(mean rate)
>> ***************************CODE ENDS***************************
>>
>> Jochen
>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
>>> Sent: Wednesday, August 4, 2010 23:14
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: Re: st: pweight + aweight, double weights
>>>
>>> I can see that the program is a little cryptic.  To clarify:
>>>
>>> I applied -svy: ratio- to R = length_2/length_1 and got asymmetric
>>> confidence intervals for R by computing them on the log scale and
>>> transforming back.
>>>
>>> The rate that Jochen asked for is rate = (length_2 -
>>> length_1)/length_1 = R - 1, and that is what the -antilog- program
>>> reports (as a percentage).  "relc" meant "relative change", which
>>> seemed clear to me at the time.
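
In symbols, the log-scale interval computed by -antilog- can be sketched as follows. The notation is an assumption for illustration and does not appear in the program: \hat{\theta} is the -nlcom- estimate of log R, \widehat{se} is its standard error, and t_d is the upper 2.5% point of a t distribution with d = e(df_r) degrees of freedom.

```latex
\begin{align*}
\hat{\theta} &= \log \hat{R}, &
b &= t_{d}\,\widehat{se}(\hat{\theta}), \\
\hat{R}_{L} &= \exp(\hat{\theta} - b), &
\hat{R}_{U} &= \exp(\hat{\theta} + b), \\
\text{relc} &= 100\,(\hat{R} - 1), &
\text{ll} &= 100\,(\hat{R}_{L} - 1), \quad
\text{ul} = 100\,(\hat{R}_{U} - 1).
\end{align*}
```

Because the interval is symmetric around \hat{\theta} on the log scale, it is asymmetric around \hat{R} - 1 after transforming back.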
>>>
>>> Steve
>>>
>>> On Wed, Aug 4, 2010 at 1:37 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>> > Jochen--
>>> > If you do a probability weighted mean of the  individual growth rates
>>> > for a time period (single year, first year to last year) and weight by
>>> > w =  (initial value) x (probability weight), you would wind up with
>>> > the rate based on the probability-weighted aggregated sums. So Stas's
>>> > solution is exactly the solution you seek. Moreover,  Stas's version
>>> > will provide the correct standard error, one appropriate for a ratio
>>> > estimate.
>>> >
>>> > You could also calculate the ratio estimate directly and get
>>> > asymmetric CIs, which are likely to be more accurate than the
>>> > symmetric intervals:
>>> >
>>> > **************************CODE BEGINS**************************
>>> > capture program drop _all
>>> > program antilog
>>> > local lparm  el(r(b),1,1)
>>> > local se    sqrt(el(r(V),1,1))
>>> > local bound  invttail(e(df_r),.025)*`se'
>>> > local parm  exp(`lparm')
>>> >
>>> > local ll  exp(`lparm'  - `bound')
>>> > local ul  exp( `lparm' + `bound')
>>> > di  "relc = "  100*( `parm'-1)  "    ll = "  100*(`ll'-1)  "   ul = "  100*(`ul'-1)
>>> > end
>>> >
>>> > sysuse auto, clear
>>> > gen length_2 = displacement
>>> > rename length length_1
>>> > svyset _n
>>> > svy: ratio length_2/length_1
>>> > nlcom log(_b[_ratio_1])
>>> > antilog
>>> >
>>> > ***************************CODE ENDS***************************
>>> >
>>> >
>>> > Steve
>>> > Steven Samuels
>>> > sjsamuels@gmail.com
>>> > 18 Cantine's Island
>>> > Saugerties NY 12477
>>> > USA
>>> > Voice: 845-246-0774
>>> > Fax:    206-202-4783
>>> >
>>> >
>>> >
>>> > On Wed, Aug 4, 2010 at 11:43 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>>> >> Who knows. You might be able to get identical answers, but you'll
>>> >> spend more time trying to figure out the appropriate composition of
>>> >> weights trying to reproduce the answer from those -total- commands.
>>> >>
>>> >> On Wed, Aug 4, 2010 at 2:58 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>>> >>> Hello Stas,
>>> >>>
>>> >>> thank you very much for your advice. I'm aware of the possibility of
>>> >>> calculating the aggregate sums of investment for different subpopulations
>>> >>> using the pweight and calculating the aggregate (=aweighted) growth rates
>>> >>> from the newly generated data. I was just wondering whether there was a
>>> >>> more "flexible" approach, such as, say, multiplying the two weight
>>> >>> variables and using the result in a single -tabstat- or something like that.
>>> >>
>>> >> -
>>> >
>>> > On Tue, Aug 3, 2010 at 12:30 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>>> >> You would probably want to
>>> >>
>>> >> svyset PSU [pw=your weight], strata(strata)
>>> >> svy : total investment, over( year sector )
>>> >> nlcom ([investment]_subpop_2 - [investment]_subpop_1)/[investment]_subpop_1
>>> >>
>>> >> or whatever labels the -total- command is going to give to individual
>>> >> coefficients.
>>> >>
>>> >> On Tue, Aug 3, 2010 at 8:29 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>>> >>> Dear Statalisters,
>>> >>>
>>> >>> I have a question about weights, especially about "double weights".
>>> >>>
>>> >>> I have micro-data on firms containing information about their
>>> >>> investment behaviour (amounts) for several years. I then went on to
>>> >>> calculate the firms' individual (discrete) growth rates of investment,
>>> >>> i.e.
>>> >>>
>>> >>> rate_t = (inv_t - inv_t-1) / inv_t-1
>>> >>>
>>> >>> and wish to use these individual growth rates to calculate average
>>> >>> growth rates for, say, economic sectors. In doing so, I'd like to attach an
>>> >>> aweight to the -tabstat-, -table- or other suitable command, such that
>>> >>> firms with higher investments in t-1 contribute a higher share to the
>>> >>> average growth rate. This is, of course, straightforward in Stata.
>>> >>>
>>> >>> However, since I have sampled data, I also need to attach a pweight to
>>> >>> this operation to get information for the population instead of the
>>> >>> sample.
>>> >>>
>>> >>> Can I calculate the average growth rates from the individual ones or
>>> >>> do I need to -collapse- or -table, replace- my data? It seems that
>>> >>> -svyset- could be what I am looking for, but it seems rather complicated.
>>> >>> Is there a way to avoid the -svyset- command and to go on with a simple
>>> >>> -tabstat- or the like instead?
>>> >>>
>>> >>> Best,
>>> >>> Jochen
>>> >>>
>>> >
>>>
>>>
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>
>>
>
>
>
>





