Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# RE: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer

| From | tshmak <[email protected]> |
| --- | --- |
| To | "[email protected]" <[email protected]> |
| Subject | RE: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer |
| Date | Thu, 6 Jun 2013 15:14:02 +0800 |

```
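Assume w holds the weights:

gen weighted = w * Female
* sum the weights only where Female is non-missing, so the denominator matches the numerator
bysort country_id Year : egen denominator = total(cond(missing(Female), 0, w))
bysort country_id Year : egen numerator = total(weighted)
gen per_female = numerator / denominator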

Tim

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of J. J. W.
Sent: 06 June 2013 11:41
To: [email protected]
Subject: Re: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer

Dear all,

I have another small question. Suppose that, instead of calculating the
mean, I want to calculate a weighted mean of "females". I know this
doesn't actually make any sense, but I think it is a good example to
show what I want.

In mathematical terms this gives us: (w1 * obs1 + w2 * obs2 + ...) /
(sum of all weights), instead of mean(Female).

Yours sincerely,

Wen Jun Jie

2013/6/6 J. J. W. <[email protected]>:
> Dear Tim,
>
> I want to thank you for your help. The bysort feature is indeed
> amazing. I had never heard of it, as I have just started using Stata,
> but this is exactly what I wanted.
>
> Yours sincerely,
>
> Wen Jun Jie
>
> 2013/6/6 tshmak <[email protected]>:
>>
>> Perhaps something like:
>>
>> bysort country_id Year : egen per_female = mean(Female)
>>
>> ???
>>
>> This would work if Female is either 0, 1, or missing (mean() ignores missing values).
>>
>> Tim
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of J. J. W.
>> Sent: 06 June 2013 10:58
>> To: [email protected]
>> Subject: st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
>>
>> Dear all,
>>
>> I have a small problem, which I have solved, but I was wondering:
>>
>> - What is the usual way to do this?
>> - Can this be implemented more efficiently?
>>
>> Suppose I have
>>
>> Country Year Female
>>
>> Netherlands 1990 1
>> Netherlands 1990 0
>> Netherlands 1990 1
>> Netherlands 1991 1
>> Netherlands 1991 1
>> Netherlands 1991 1
>> Netherlands 1992 1
>> Netherlands 1992 0
>> ...
>>
>> Now I would like to calculate the number of females as a percentage
>> of the total, for every country and every year. I have devised a
>> script for it, presented below:
>>
>> gen per_female = 0
>>
>> /* Get the minimum and maximum country indices */
>> su country_id, meanonly
>>
>> /* Loop over all countries */
>> forvalues i = `r(min)'/`r(max)' {
>>     su year if country_id == `i', meanonly
>>     /* Loop over all years for this country */
>>     forvalues j = `r(min)'/`r(max)' {
>>         count if country_id == `i' & female == 1 & year == `j'
>>         local nr_females = r(N)
>>         count if country_id == `i' & year == `j' & (female == 1 | female == 0)
>>         local nr_obser = r(N)
>>         replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
>>     }
>> }
>>
>> It basically works, however there are some problems.
>>
>> a) I do not believe this is an efficient computation, since in a LOT
>> of cases there are no replacements at all. How can I make this more
>> efficient?
>>
>> b) Is my way "the way to go"? I believe this is more like programming,
>> and I am wondering how this can be done more easily in Stata (even
>> though my method is relatively easy and straightforward).
>>
>> c) At the moment you see that I did this: "(female== 1 | female== 0)".
>> Basically, this ensures that I only count the observations that I have
>> and excludes the ones with missing values (female == .).
>> Is this correct? Should I handle missing data in this way?
>>
>>
>>
>> Wen Jun Jie
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
```
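The `bysort ... : egen` suggestion in the thread can be tried on the toy data from the original question. This is only a minimal sketch, not the poster's actual dataset: the coding `country_id == 1` for the Netherlands and the extra 1992 row with a missing `Female` value are assumptions added for illustration.

```stata
clear
input country_id Year Female
1 1990 1
1 1990 0
1 1990 1
1 1991 1
1 1991 1
1 1991 1
1 1992 1
1 1992 0
1 1992 .
end

* mean() ignores missing values, so for a 0/1 variable the group mean
* is the share of females among the non-missing observations
bysort country_id Year : egen per_female = mean(Female)

list, sepby(country_id Year)
```

Every observation in a country-year receives the same group value, including the row where `Female` is missing, so no explicit loop over countries and years is needed.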

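The weighted mean asked about in the thread, (w1 * obs1 + w2 * obs2 + ...) / (sum of all weights), can be sketched the same way. The data and the weight values in `w` below are made up for illustration. The denominator here sums weights only over rows where `Female` is non-missing, which is an assumption about how missing values should be handled.

```stata
clear
input country_id Year Female w
1 1990 1 2
1 1990 0 1
1 1990 1 1
1 1991 1 1
1 1991 0 3
1 1991 . 4
end

* numerator: sum of w * Female; total() skips the missing product
gen weighted = w * Female
bysort country_id Year : egen numerator = total(weighted)

* denominator: sum of the weights, counting only rows where Female is observed
bysort country_id Year : egen denominator = total(cond(missing(Female), 0, w))

gen per_female = numerator / denominator
list, sepby(country_id Year)
```

If one row per country-year is enough, `collapse (mean) Female [aweight = w], by(country_id Year)` is a possible alternative, with the caveat that it replaces the data in memory.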