Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer

From: "J. J. W."
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
Date: Thu, 6 Jun 2013 04:58:25 +0200

```
Dear all,

I have a small problem, which I have solved, but I was wondering:

- What is the usual way to do this?
- Can this be implemented more efficiently?

Suppose I have

Country Year Female

Netherlands 1990 1
Netherlands 1990 0
Netherlands 1990 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1992 1
Netherlands 1992 0
...

I would like to calculate the number of females as a percentage of the
total, and to do this for every country and every year. I have written
a script for it, presented below:

/* Start from missing so that unvisited observations are not reported as 0 */
gen per_female = .

/* Getting the maximum and minimum indices for countries */
su country_id, meanonly

/* For all different countries */
forvalues i = `r(min)'/`r(max)' {

    su year if country_id == `i', meanonly
    /* For all different years */
    forvalues j = `r(min)'/`r(max)' {
        /* Number of females in this country-year */
        count if country_id == `i' & female == 1 & year == `j'
        local nr_females = r(N)
        /* Number of non-missing observations in this country-year */
        count if country_id == `i' & year == `j' & (female == 1 | female == 0)
        local nr_obser = r(N)
        replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
    }
}

It basically works; however, there are some issues.

a) I do not believe this is an efficient computation, since in a LOT of
cases no replacements are made at all. How can I make this more
efficient?

b) Is my way "the way to go"? I believe this is more like programming,
and I am wondering how this can be done more easily in Stata (even
though my method is relatively easy and straightforward).

c) As you can see, I currently use "(female == 1 | female == 0)".
This ensures that I only count the observations that I actually have
and excludes the ones with missing values (female == .).
Is this correct? Should I handle missing data in this way?
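
* For comparison with questions a) to c), a minimal sketch of a loop-free
* alternative. It is not part of the script above and rests on assumptions:
* female is coded 0/1 with missing values stored as ., and country_id and
* year identify the groups as in the loop. The names per_female_alt and
* pct_female_alt are hypothetical, chosen only for illustration.
*
* egen's mean() ignores missing values, so the group mean of a 0/1 variable
* is the share of females among the non-missing observations in each
* country-year. Combinations that do not occur in the data are never visited.
bysort country_id year: egen per_female_alt = mean(female)

* The line above gives a proportion; multiply by 100 for a percentage.
gen pct_female_alt = 100 * per_female_alt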