Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: pweight + aweight, double weights

| Field | Value |
|---|---|
| From | Steve Samuels |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: pweight + aweight, double weights |
| Date | Sat, 7 Aug 2010 23:11:11 -0400 |

```
Jochen, I can't tell you the number of times I've done something similar.

To go back to your original question: You can do many other things
with the individual rates: plot them, for example, or get their
probability-weighted means, standard deviations, and percentiles.

Steve
On Fri, Aug 6, 2010 at 5:20 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
> Thanks Steve,
>
> you are perfectly right. I could swear the first -table- statement in my code had provided the results that I posted.
>
> Now, this is exactly the solution I was looking for, many thanks!
> Jochen
>
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
>> Sent: Thursday, 5 August 2010 19:08
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: pweight + aweight, double weights
>>
>> The two methods give identical results because they are algebraically
>> equivalent.
>>
>> Take the pweighted mean with pwt2 = length_1 x pwt
>> and rate = (length_2 - length_1)/length_1:
>>
>> pwt2-weighted mean of rate = (sum of pwt2 x rate)/(sum of pwt2)
>>
>> The numerator is:
>>    sum of pwt x length_1 x (length_2 - length_1)/length_1
>> = sum of pwt x (length_2 - length_1)
>> = sum of pwt x length_2   minus sum of pwt x length_1
>> = (pwt-weighted sum of length_2) minus (pwt-weighted sum of length_1)
>>
>> The denominator is:
>>    sum of pwt x length_1
>> = pwt-weighted sum of length_1
>>
>> Their quotient is therefore the growth rate computed from the
>> pwt-weighted aggregate sums.
>>
>>
>> Steve
>> On Thu, Aug 5, 2010 at 9:23 AM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> > Jochen, the totals you used in the -display- lines are different from
>> > those produced by the first -table- statement.  When I use the
>> > latter, the results of the two methods are identical.
>> >
>> > Steve
>> >
>> > **************************CODE BEGINS**************************
>> > sysuse auto, clear
>> > gen double length_2 = displacement
>> > rename length length_1
>> > rename trunk pwt
>> >
>> > * Look up the pweighted sums of length_1 and length_2 for foreign and domestic cars:
>> > table foreign [pw = pwt], c(sum length_1 sum length_2)
>> >
>> > * Compute the growth rates based on the aggregate sums of length_1 and length_2:
>> > di "Domestic: " (190108 - 153917)/153917
>> > di "Foreign:  " (28194 - 42450)/42450
>> >
>> > * Do a pweighted mean of the individual growth rates with pw = initial value x pweight:
>> > gen double pwt2 = length_1*pwt
>> > cap drop rate
>> > gen double rate = (length_2 - length_1) / length_1
>> > table foreign [pweight = pwt2], c(mean rate)
>> > ***************************CODE ENDS***************************
>> >
>> >
>> > On Thu, Aug 5, 2010 at 4:40 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>> >> Hi Steve,
>> >>
>> >> thanks for your little program. What I do not understand is your statement that with a "probability weighted mean of the individual growth rates" I "would wind up with the rate based on the probability-weighted aggregated sums". Check out this:
>> >>
>> >> **************************CODE BEGINS**************************
>> >> sysuse auto, clear
>> >> gen length_2 = displacement
>> >> rename length length_1
>> >> rename trunk pw
>> >>
>> >> * Look up the pweighted sums of length_1 and length_2 for foreign and domestic cars:
>> >>
>> >> table foreign [pw= pw], c(sum length_1 sum length_2)
>> >>
>> >> * Look up the growth rates based on the aggregate sums of length_1 and length_2:
>> >>
>> >> di "domestic:" (311319 - 270137 ) / 270137
>> >> di "foreign:"  (155268 - 235051) / 235051
>> >>
>> >> * Do a pweighted mean of the individual growth rates with pw = initial value x pweight:
>> >> cap drop rate
>> >> gen rate = (length_2 - length_1) / length_1
>> >> table foreign [pweight = length_1 * pw], c(mean rate)
>> >> ***************************CODE ENDS***************************
>> >>
>> >> Jochen
>> >>
>> >>> -----Original Message-----
>> >>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
>> >>> Sent: Wednesday, 4 August 2010 23:14
>> >>> To: statalist@hsphsun2.harvard.edu
>> >>> Subject: Re: st: pweight + aweight, double weights
>> >>>
>> >>> I can see that the program is a little cryptic.  To clarify:
>> >>>
>> >>> I applied -svy: ratio- to R = length_2/length_1 and got asymmetric
>> >>> confidence intervals for R by computing them on the log scale and
>> >>> transforming back.
>> >>>
>> >>> The rate that Jochen asked for is rate = (length_2 -
>> >>> length_1)/length_1 = R - 1, and that is what the -antilog- program
>> >>> reports, in percent.  "relc" meant "relative change", which seemed
>> >>> clear to me at the time.
>> >>>
>> >>> Steve
>> >>>
>> >>> On Wed, Aug 4, 2010 at 1:37 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> >>> > Jochen--
>> >>> > If you do a probability weighted mean of the individual growth rates
>> >>> > for a time period (single year, first year to last year) and weight by
>> >>> > w = (initial value) x (probability weight), you would wind up with
>> >>> > the rate based on the probability-weighted aggregated sums. So Stas's
>> >>> > solution is exactly the solution you seek. Moreover, Stas's version
>> >>> > will provide the correct standard error, one appropriate for a ratio
>> >>> > estimate.
>> >>> >
>> >>> > You could also calculate the ratio estimate directly and get
>> >>> > asymmetric CIs, which are likely to be more accurate than the
>> >>> > symmetric intervals.
>> >>> >
>> >>> > **************************CODE BEGINS**************************
>> >>> > capture program drop _all
>> >>> > * antilog: back-transform the log-scale estimate left in r() by -nlcom-
>> >>> > program antilog
>> >>> > local lparm  el(r(b),1,1)
>> >>> > local se    sqrt(el(r(V),1,1))
>> >>> > local bound  invttail(e(df_r),.025)*`se'
>> >>> > local parm  exp(`lparm')
>> >>> >
>> >>> > local ll  exp(`lparm'  - `bound')
>> >>> > local ul  exp( `lparm' + `bound')
>> >>> > * relc = relative change in percent, i.e. 100*(R - 1); ll and ul are its limits
>> >>> > di  "relc = "  100*( `parm'-1)  "    ll = "  100*(`ll'-1)  "   ul = "  100*(`ul'-1)
>> >>> > end
>> >>> >
>> >>> > sysuse auto, clear
>> >>> > gen length_2 = displacement
>> >>> > rename length length_1
>> >>> > svyset _n
>> >>> > svy: ratio length_2/length_1
>> >>> > nlcom log(_b[_ratio_1])
>> >>> > antilog
>> >>> >
>> >>> > ***************************CODE ENDS***************************
>> >>> >
>> >>> >
>> >>> > Steve
>> >>> > Steven Samuels
>> >>> > sjsamuels@gmail.com
>> >>> > 18 Cantine's Island
>> >>> > Saugerties NY 12477
>> >>> > USA
>> >>> > Voice: 845-246-0774
>> >>> > Fax:    206-202-4783
>> >>> >
>> >>> >
>> >>> >
>> >>> > On Wed, Aug 4, 2010 at 11:43 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> >>> >> Who knows. You might be able to get identical answers, but you'll
>> >>> >> spend more time trying to figure out the appropriate composition of
>> >>> >> weights trying to reproduce the answer from those -total- commands.
>> >>> >>
>> >>> >> On Wed, Aug 4, 2010 at 2:58 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>> >>> >>> Hello Stas,
>> >>> >>>
>> >>> >>> thank you very much for your advice. I'm aware of the possibility of calculating the aggregate sums of investment for different subpopulations using the pweight and calculating the aggregate (=aweighted) growth rates from the newly generated data. I was just wondering whether there was a more "flexible" approach, such as, say, multiplying the two weight variables and using the result in a single -tabstat- or something like that.
>> >>> >
>> >>> > On Tue, Aug 3, 2010 at 12:30 PM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> >>> >> You would probably want to
>> >>> >>
>> >>> >> svyset PSU [pw=your weight], strata(strata)
>> >>> >> svy : total investment, over( year sector )
>> >>> >> nlcom ([investment]_subpop_2 - [investment]_subpop_1)/[investment]_subpop_1
>> >>> >>
>> >>> >> or whatever labels the -total- command is going to give to individual
>> >>> >> coefficients.
>> >>> >>
>> >>> >> On Tue, Aug 3, 2010 at 8:29 AM, Jochen Späth <jochen.spaeth@iaw.edu> wrote:
>> >>> >>> Dear Statalisters,
>> >>> >>>
>> >>> >>> I have a question about weights, especially about "double weights".
>> >>> >>>
>> >>> >>> I have micro-data on firms containing information about their investment behaviour (amounts) for several years. I then went on to calculate the firms' individual (discrete) growth rates of investment, i.e.
>> >>> >>>
>> >>> >>> rate_t = (inv_t - inv_{t-1}) / inv_{t-1}
>> >>> >>>
>> >>> >>> and wish to use these individual growth rates to calculate average growth rates for, say, economic sectors. In doing so, I'd like to attach an aweight to the -tabstat-, -table- or other suitable command, such that firms with higher investments in t-1 contribute a higher share to the average growth rate. This is, of course, straightforward in Stata.
>> >>> >>>
>> >>> >>> However, since I have sampled data, I also need to attach a pweight to this operation to get information for the population instead of the sample.
>> >>> >>>
>> >>> >>> Can I calculate the average growth rates from the individual ones, or do I need to -collapse- or -table, replace- my data? It seems that -svyset- could be what I am looking for, but it seems rather complicated. Is there a way to avoid the -svyset- command and to go on with a simple -tabstat- or the like instead?
>> >>> >>>
>> >>> >>> Best,
>> >>> >>> Jochen
>> >>> >>>
>> >>> >
>> >>>
>> >>> *
>> >>> *   For searches and help try:
>> >>> *   http://www.stata.com/help.cgi?search
>> >>> *   http://www.stata.com/support/statalist/faq
>> >>> *   http://www.ats.ucla.edu/stat/stata/
```
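
For readers who prefer symbols, the two results discussed in the thread can be sketched as follows. The notation is an assumption introduced here and does not appear in the original messages: w_i is the probability weight, x_i is length_1 (the initial value, assumed nonzero), y_i is length_2, and t is the value returned by `invttail(e(df_r), .025)` in the -antilog- program. The first line restates the weighted-mean identity; the remaining lines describe what the log-scale back-transformation appears to compute, with the standard error of log R taken from -nlcom- (a delta-method approximation).

```latex
% Assumed notation: w_i = pweight, x_i = length_1 (nonzero), y_i = length_2
\begin{align*}
\hat R &= \frac{\sum_i w_i y_i}{\sum_i w_i x_i},
\qquad
\hat R - 1
  = \frac{\sum_i (w_i x_i)\,\dfrac{y_i - x_i}{x_i}}{\sum_i w_i x_i}
  \quad\text{(mean of the individual rates with weight } w_i x_i\text{)} \\[6pt]
\widehat{\mathrm{se}}\bigl(\log \hat R\bigr)
  &\approx \frac{\widehat{\mathrm{se}}(\hat R)}{\hat R}
  \quad\text{(delta method)} \\[6pt]
\text{limits for } \hat R - 1
  &= \exp\!\Bigl(\log \hat R \pm t\,\widehat{\mathrm{se}}\bigl(\log \hat R\bigr)\Bigr) - 1
\end{align*}
```

The -antilog- program multiplies each of these quantities by 100, so "relc", "ll" and "ul" are reported in percent. Because the interval is symmetric on the log scale, it is asymmetric around the estimate after transforming back.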