# RE: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people who selected a specific answer

Assume w = weights
gen weighted = w * Female
bysort country_id Year : egen denominator = total(w)
bysort country_id Year : egen numerator = total(weighted)
gen per_female = numerator / denominator
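
One caveat with the commands above: if Female is missing for some observations, total(w) still adds their weights to the denominator, while total(weighted) skips them. A minimal sketch of one way to handle this, run on toy data invented purely for illustration (the helper variable w_valid is hypothetical and not part of the thread):

```stata
* Toy data, invented for illustration only
clear
input country_id Year Female w
1 1990 1 2
1 1990 0 1
1 1990 1 1
1 1991 1 1
1 1991 . 3
end

* w * Female is missing when Female is missing; egen total() skips missings
gen weighted = w * Female

* Keep only the weights of observations where Female is observed
* (w_valid is a hypothetical helper name)
gen w_valid = w if !missing(Female)

bysort country_id Year : egen numerator = total(weighted)
bysort country_id Year : egen denominator = total(w_valid)
gen per_female = numerator / denominator

list country_id Year Female w per_female, sepby(country_id Year)
```

For 1990 this gives (2*1 + 1*0 + 1*1) / (2 + 1 + 1) = 0.75. For 1991 the observation with missing Female drops out of both sums, so the result is 1.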

Tim

Dear all,

I have another small question. Suppose that, instead of calculating
the mean, I want to calculate a weighted mean of "Female". I know
this doesn't actually make any sense, but I think it is a good
example to show what I want.

In mathematical terms this gives us: (w1 * obs1 + w2 * obs2 + ...) /
(sum of all weights), instead of mean(Female).

Yours sincerely,

Wen Jun Jie

2013/6/6 J. J. W.:
> Dear Tim,
>
> I want to thank you for your help. This feature, bysort, is indeed
> amazing. I had never heard of it, as I have just started using Stata,
> but this is exactly what I wanted.
>
> Yours sincerely,
>
> Wen Jun Jie
>
> 2013/6/6 tshmak:
>> Perhaps something like:
>>
>> bysort country_id Year : egen per_female = mean(Female)
>>
>> ???
>>
>> This would work if Female is either 0, 1, or missing, since egen's
>> mean() ignores missing values.
>>
>> Tim
>>
>>
>> Dear all,
>>
>> I have a small problem, which I have solved, but I was wondering:
>>
>> - What is the usual way to do this?
>> - Can this be implemented more efficiently?
>>
>> Suppose I have
>>
>> Country Year Female
>>
>> Netherlands 1990 1
>> Netherlands 1990 0
>> Netherlands 1990 1
>> Netherlands 1991 1
>> Netherlands 1991 1
>> Netherlands 1991 1
>> Netherlands 1992 1
>> Netherlands 1992 0
>> ...
>>
>> Now I would like to calculate the number of females as a
>> percentage of the total, for every country and every year.
>> I have devised a script for it, presented below:
>>
>> gen per_female = 0
>>
>> /* Getting the maximum and minimum indices for countries */
>> su country_id, meanonly
>>
>> /* For all different countries */
>> forvalues i = `r(min)'/`r(max)' {
>>
>>     su year if country_id == `i', meanonly
>>     /* For all different years */
>>     forvalues j = `r(min)'/`r(max)' {
>>         count if country_id == `i' & female == 1 & year == `j'
>>         local nr_females = r(N)
>>         count if country_id == `i' & year == `j' & (female == 1 | female == 0)
>>         local nr_obser = r(N)
>>         replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
>>     }
>> }
>>
>> It basically works, however there are some problems.
>>
>> a) I do not believe this is an efficient computation, since in a
>> LOT of cases there are no replacements at all. How can I make this
>> more efficient?
>>
>> b) Is my way "the way to go"? I believe this is more like programming,
>> and I am wondering how this can be done more easily in Stata (even
>> though my method is relatively easy and straightforward).
>>
>> c) At the moment you see that I did this: "(female== 1 | female== 0)".
>> Basically, this ensures that I only count the observations that I have
>> and eliminates the ones that have missing values (female == .).
>> Is this correct? Should I handle missing data in this way?
>>
>>
>>
>> Wen Jun Jie
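
As a rough sketch of how questions (a) to (c) can be approached without explicit loops, the example below uses the rows shown in the post plus one extra row with a missing value, which is an assumption added here to illustrate the missing-data case. The variable names n_female, n_valid and per_female_check are hypothetical, and country_id is built with egen group() on the assumption that it is a numeric identifier for Country.

```stata
* Rows from the post, plus one invented row with Female missing
clear
input str12 Country Year Female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1992 1
"Netherlands" 1992 0
"Netherlands" 1992 .
end

* Numeric country identifier (assumed to match country_id in the thread)
egen country_id = group(Country)

* One line replaces the nested loops; mean() ignores missing Female
bysort country_id Year : egen per_female = mean(Female)

* Explicit counts, mirroring the loop's (female == 1 | female == 0) filter
bysort country_id Year : egen n_female = total(Female == 1)
bysort country_id Year : egen n_valid = total(inlist(Female, 0, 1))
gen per_female_check = n_female / n_valid

* Both routes should agree up to float rounding
assert abs(per_female - per_female_check) < 1e-6
```

Because bysort only visits country-year combinations that actually occur in the data, it avoids the empty iterations the nested forvalues loops run into when identifiers or years have gaps.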