
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer


From   "J. J. W." <[email protected]>
To   [email protected]
Subject   Re: st: RE: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
Date   Thu, 6 Jun 2013 05:29:54 +0200

Dear Tim,

I want to thank you for your help. The bysort prefix is indeed amazing.
I had never heard of it, as I have only just started using Stata, but
it is exactly what I wanted.

Yours sincerely,

Wen Jun Jie

2013/6/6 tshmak <[email protected]>:
> <>
> Perhaps something like:
>
> bysort country_id Year : egen per_female = mean(Female)
>
> ???
>
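> This works if Female is coded 0, 1, or missing: egen's mean() ignores
> missing values, so the result is the proportion of females among the
> non-missing observations in each country-year.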
>
> Tim
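
A minimal sketch of the suggestion above, run on the example rows from the original question. The toy data and the helper variable first are illustrative assumptions, and country_id is built here with egen group() because the example lists only a country name. The mean of a 0/1 variable is a proportion, so it is multiplied by 100 to get a percentage.

```stata
* Toy data modelled on the example in the question (illustrative only).
clear
input str11 Country Year Female
"Netherlands" 1990 1
"Netherlands" 1990 0
"Netherlands" 1990 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1991 1
"Netherlands" 1992 1
"Netherlands" 1992 0
end

* Numeric country identifier; the name country_id follows the thread.
egen country_id = group(Country)

* Within each country-year, the mean of a 0/1 variable is the share of 1s.
bysort country_id Year: egen per_female = mean(Female)

* Convert the proportion to a percentage.
replace per_female = 100 * per_female

* Show one line per country-year (first is a hypothetical helper variable).
egen first = tag(country_id Year)
list Country Year per_female if first, noobs
```

No explicit loop over countries and years is needed: the by prefix repeats the egen call once per group, and every observation in a group receives that group's value.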
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of J. J. W.
> Sent: 06 June 2013 10:58
> To: [email protected]
> Subject: st: Re: Efficiently looping through countries and years counting and computing the percentage of people whom selected a specific answer
>
> Dear all,
>
> I have a small problem, which I have solved, but I was wondering:
>
> - What is the usual way to do this?
> - Can this be implemented more efficiently?
>
> Suppose I have
>
> Country Year Female
>
> Netherlands 1990 1
> Netherlands 1990 0
> Netherlands 1990 1
> Netherlands 1991 1
> Netherlands 1991 1
> Netherlands 1991 1
> Netherlands 1992 1
> Netherlands 1992 0
> ...
>
> Now I would like to calculate the number of females as a
> percentage of the total, for every country and every year.
> I have devised a script for it, presented below:
>
> gen per_female = 0
>
> /* Getting the maximum and minimum indices for countries */
> su country_id, meanonly
>
> /* For all different countries */
> forvalues i = `r(min)'/`r(max)' {
>
>     su year if country_id == `i', meanonly
>     /* For all different years */
>     forvalues j = `r(min)'/`r(max)' {
>         /* Number of females in this country-year */
>         count if country_id == `i' & female == 1 & year == `j'
>         local nr_females = r(N)
>         /* Number of non-missing observations in this country-year */
>         count if country_id == `i' & year == `j' & (female == 1 | female == 0)
>         local nr_obser = r(N)
>         replace per_female = `nr_females'/`nr_obser' if country_id == `i' & year == `j'
>     }
> }
>
> It basically works; however, there are some problems.
>
> a) I do not believe this is an efficient computation, since there are a
> LOT of cases in which there are no replacements at all. How can I make
> this more efficient?
>
> b) Is my way "the way to go"? I believe this is more like programming,
> and I am wondering how this can be done more easily in Stata (even
> though my method is relatively easy and straightforward).
>
> c) At the moment you see that I did this: "(female== 1 | female== 0)".
> This ensures that I only count the observations that I have
> and excludes the ones with missing values (female == .).
> Is this correct? Should I handle missing data in this way?
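
On point c), a small sketch of how missing values enter the calculation, assuming Female is coded 0, 1, or missing. The data and the variable names n_female, n_valid, and per_check are hypothetical. egen's mean() skips missing values, total() treats them as zero, and count() counts only non-missing values, so the two routes below should agree.

```stata
* Illustrative data: one country-year with one missing value for Female.
clear
input country_id Year Female
1 1990 1
1 1990 0
1 1990 .
1 1990 1
end

* mean() ignores the missing value, so the denominator is 3, not 4.
bysort country_id Year: egen per_female = mean(Female)

* The same share built explicitly from counts of non-missing values.
bysort country_id Year: egen n_female = total(Female)
bysort country_id Year: egen n_valid  = count(Female)
gen per_check = n_female / n_valid

list, noobs
```

The observation with a missing Female still receives the group's per_female value; add "if !missing(Female)" to the egen call if that row should stay missing.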
>
> Any suggestions, advice or comments are very helpful and appreciated!
>
> Thank you for your answer!
>
> Wen Jun Jie
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>

