st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer


From   tshmak <tshmak@hku.hk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
Date   Thu, 6 Jun 2013 11:12:44 +0800

Perhaps something like: 

bysort country_id Year : egen per_female = mean(Female)

??? 

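This would work if Female were coded 0, 1, or missing: egen's mean() ignores missing values, so the result is the proportion of females among the non-missing observations in each country-year.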
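
To make that concrete, here is a minimal sketch on toy data shaped like the example in the question. The lowercase variable names (country, year, female) follow the script quoted below and are an assumption about the real dataset; country_id is built here with egen group() purely for illustration, and pct_female and tag are hypothetical helper names.

```stata
* Toy data mirroring the example in the question (values are illustrative)
clear
input str12 country year female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1992 1
"Netherlands" 1992 0
end

* Numeric country identifier, as assumed by the original script
egen country_id = group(country)

* The mean of a 0/1 indicator within a group is the share of ones
* among the non-missing values in that group
bysort country_id year: egen per_female = mean(female)

* Optional: express the share as a percentage (hypothetical variable name)
gen pct_female = 100 * per_female

* Flag one row per country-year so the result can be listed compactly
egen tag = tag(country_id year)
list country year per_female pct_female if tag
```

On these eight rows the shares should come out as 2/3 for 1990, 1 for 1991, and 1/2 for 1992. No explicit loop is needed because bysort does the grouping.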

Tim

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of J. J. W.
Sent: 06 June 2013 10:58
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer

Dear all,

I have a small problem, which I have solved, but I was wondering:

- What is the usual way to do this?
- Can this be implemented more efficiently?

Suppose I have

Country Year Female

Netherlands 1990 1
Netherlands 1990 0
Netherlands 1990 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1991 1
Netherlands 1992 1
Netherlands 1992 0
...

I would now like to calculate the number of females as a
percentage of the total, for every country and every year.
I've devised a script for it, presented below:

/* Initialise the result as missing; it is filled in per country-year below */
gen per_female = .

/* Getting the maximum and minimum indices for countries */
su country_id, meanonly

/* For all different countries */
forvalues i = `r(min)'/`r(max)' {

    su year if country_id == `i', meanonly
    /* For all different years */
    forvalues j = `r(min)'/`r(max)' {
        /* Number of females in this country-year */
        count if country_id == `i' & female == 1 & year == `j'
        local nr_females = r(N)
        /* Number of non-missing observations in this country-year */
        count if country_id == `i' & year == `j' & (female == 1 | female == 0)
        local nr_obser = r(N)
        /* Store the share of females for this country-year */
        replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
    }
}

It basically works; however, there are some problems.

a) I do not believe this is an efficient computation, since in a
LOT of cases no replacements are made at all. How can I make this
more efficient?

b) Is my way "the way to go"? I believe this is more like programming,
and I am wondering how this can be done more easily in Stata (even
though my method is relatively easy and straightforward).

c) At the moment you see that I did this: "(female == 1 | female == 0)".
This ensures that I only count the observations that I have data for
and excludes the ones with missing values (female == .).
Is this correct? Should I handle missing data in this way?
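
As a rough sketch touching on points a) to c), the loop's numerator and denominator can be reproduced with egen functions, which also makes the treatment of missing values explicit. This assumes the variables country_id, year, and female from the script above already exist; n_valid, n_female, and per_female_check are hypothetical names chosen for illustration.

```stata
* Assumes country_id, year and female (coded 0/1/missing) already exist

* Stop early if female takes any value other than 0, 1 or missing
assert inlist(female, 0, 1) | missing(female)

* Denominator: egen count() counts only non-missing values,
* which matches the (female == 1 | female == 0) condition
bysort country_id year: egen n_valid = count(female)

* Numerator: egen total() treats missing values as zero,
* so it equals the number of females in the group
bysort country_id year: egen n_female = total(female)

* Share of females among observations with known sex;
* missing when a country-year has no non-missing values
gen per_female_check = n_female / n_valid

* If one row per country-year is wanted instead, collapse is an option
preserve
collapse (mean) per_female = female, by(country_id year)
list
restore
```

Only country-year combinations that actually occur in the data are processed, so nothing is spent on the empty combinations that the nested forvalues loops visit.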

Any suggestions, advice or comments are very helpful and appreciated!

Thank you for your answer!

Wen Jun Jie
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
