
From   "J. J. W." <bsc.j.j.w@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
Date   Thu, 6 Jun 2013 04:58:25 +0200

Dear all,

I have a small problem, which I have solved, but I was wondering:

- What is the usual way to do this?
- Can it be implemented more efficiently?

Suppose I have the following data:

Country Year Female

Netherlands 1990 1
Netherlands 1990 0
Netherlands 1990 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1992 1
Netherlands 1992 0
...
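To make the example reproducible, the rows shown above can be typed in with `input`, as sketched below. Only the listed rows are included; the elided ones are not filled in. The post does not say how the numeric `country_id` used in the script was created. Building it from the string `Country` column with `encode` is an assumption for illustration.

```stata
* Reproducible version of the rows shown above (elided rows not included)
clear
input str11 country int year byte female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1992 1
"Netherlands" 1992 0
end

* Assumption: the numeric country_id used in the loop is built with encode,
* which assigns 1, 2, ... to the countries in alphabetical order
encode country, generate(country_id)
```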

I would like to calculate the number of females as a percentage of the
total, separately for every country and every year. I have written the
following script for it:

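/* Share of females per country and year, as a proportion (0 to 1);   */
/* multiply by 100 for a percentage. Rows never visited stay missing. */
gen per_female = .

/* Getting the minimum and maximum indices for countries */
su country_id, meanonly

/* For all different countries */
forvalues i = `r(min)'/`r(max)' {
    /* Minimum and maximum year observed for this country */
    su year if country_id == `i', meanonly
    /* For all different years */
    forvalues j = `r(min)'/`r(max)' {
        /* Number of women in this country-year */
        count if country_id == `i' & female == 1 & year == `j'
        local nr_females = r(N)
        /* Number of valid (non-missing 0/1) answers in this country-year */
        count if country_id == `i' & year == `j' & (female == 1 | female == 0)
        local nr_obser = r(N)
        replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
    }
}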

It basically works; however, there are some problems:

a) I do not believe this is an efficient computation, since in a LOT
of cases no replacements are made at all. How can I make this more
efficient?

b) Is my approach "the way to go"? It feels more like general-purpose
programming, and I am wondering how this can be done more easily in Stata
(even though my method is relatively simple and straightforward).

c) At the moment I use the condition "(female == 1 | female == 0)".
This ensures that I only count the observations for which I have an
answer and excludes those with missing values (female == .).
Is this correct? Should I handle missing data in this way?
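Regarding a) and b): a common Stata idiom for group-wise statistics uses `by` with `egen` instead of explicit loops. Each `count if` in the script scans the whole dataset once per country-year. The sketch below instead computes each statistic in one sorted pass. It assumes the variables `country_id`, `year` and `female` from the script, and it uses the example rows as toy data. Coding Netherlands as `country_id` 1 is a hypothetical choice.

```stata
* Toy data: the example rows, with Netherlands coded as country_id 1
* (hypothetical coding; substitute your own data)
clear
input byte country_id int year byte female
1 1990 1
1 1990 0
1 1990 1
1 1991 1
1 1991 1
1 1991 1
1 1992 1
1 1992 0
end

* (1) The mean of a 0/1 variable is the proportion of 1s. egen's mean()
*     skips missing values, so they drop out of numerator and denominator.
bysort country_id year: egen per_female = mean(female)

* (2) The same computation with the two counts written out, as in the loop
bysort country_id year: egen nr_females = total(female == 1)
bysort country_id year: egen nr_obser   = total(female == 1 | female == 0)
generate per_female2 = nr_females / nr_obser

* Percentage rather than proportion
generate pct_female = 100 * per_female

list country_id year female per_female pct_female, sepby(year)
```

You may want one row per country-year rather than a new variable on every observation. In that case, `collapse (mean) per_female = female, by(country_id year)` performs the same aggregation. It replaces the data in memory, though, so use `preserve`/`restore` or save the data first.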
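Regarding c): the sketch below uses a hypothetical three-row dataset with one missing answer. It shows why an explicit condition on `female` matters when counting. Stata sorts missing values (`.`) above every number, so range conditions such as `female >= 0` silently include them. Conditions built from `==`, `inlist()` or `!missing()` do not.

```stata
* Hypothetical toy data: one country-year with a missing answer
clear
input byte country_id int year byte female
1 1990 1
1 1990 0
1 1990 .
end

* Missing is larger than any number, so this counts all 3 rows,
* including the one with a missing answer
count if female >= 0

* Two equivalent ways to keep only valid 0/1 answers (2 rows each)
count if female == 1 | female == 0
count if inlist(female, 0, 1)

* !missing() keeps every non-missing value; it matches the two lines
* above only if female takes no values other than 0 and 1
count if !missing(female)

* egen mean() ignores the missing row: 1 woman out of 2 valid answers,
* and the result is stored on all rows of the group
bysort country_id year: egen per_female = mean(female)
```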

Any suggestions, advice or comments are very helpful and appreciated!

Thank you in advance for your answers!

Wen Jun Jie