Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Marie-Luise Schmitz" <querida-ise@gmx.de> |
To | statalist@hsphsun2.harvard.edu |
Subject | Aw: Re: Re: st: sum over variables for determinate observations |
Date | Mon, 27 Jan 2014 14:04:10 +0100 (CET) |
Yes Nick, you are right, I explained myself badly, sorry. I used:

-----------------------
sort province_name ateco_section
collapse (sum) numero_contribuenti_2005-ricercatori_med_2006, by(province_name province_code_107 license_number ateco_section ateco_section_description)
-----------------------

where 'numero_contribuenti_2005' is the first and 'ricercatori_med_2006' the last numeric variable in the data set.

The only remaining problem is that missing values I coded as .a appear as zeros in the collapsed data, although it would be desirable to keep them as missing.

Sent: Monday, 27 January 2014 at 13:19
From: "Nick Cox" <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: Re: st: sum over variables for determinate observations

What you mean by "did not work" is not explained here, but once you -keep- just one observation for each group, scope for accurate calculations of totals of any other variable is lost.

-collapse- is, it seems, what you need here, obviating the need for a loop. It was suggested earlier in this thread, and it's not clear why you are not using it.

Nick
njcoxstata@gmail.com

On 27 January 2014 12:12, Marie-Luise Schmitz <querida-ise@gmx.de> wrote:
> Dear Roberto,
>
> thank you for your suggestion. I used:
>
> bysort province_name ateco_section: egen numero_contribuenti_2005_test = total(numero_contribuenti_2005)
> by province_name ateco_section: keep if _n == 1
> replace numero_contribuenti_2005_test=.a if numero_contribuenti_2005==.a
>
> to do the task for one variable and it worked perfectly. But the data set contains 93 numeric variables. I tried a foreach loop but this did not work. Any suggestion how to do this for many variables?
>
>
> Sent: Sunday, 26 January 2014 at 19:01
> From: "Roberto Ferrer" <refp16@gmail.com>
> To: "Stata Help" <statalist@hsphsun2.harvard.edu>
> Subject: Re: st: sum over variables for determinate observations
>
> Alternatives are:
>
> /*
> Use -egen, total()-, to compute totals and keep an arbitrary observation
> (here the first one).
> */
>
> bysort provname atecosec: egen snumcontrib = total(numcontrib)
> by provname atecosec: keep if _n == 1
>
>
> /*
> Use -sum- to compute a cumulative sum and keep the last observation
> */
>
> bysort provname atecosec: gen snumcontrib = sum(numcontrib)
> by provname atecosec: keep if _n == _N
>
> The Stata Journal (2002)
> 2, Number 1, pp. 86–102
> Speaking Stata: How to move step by: step
> Nicholas J. Cox
>
> is a helpful reference.
>
> On Sun, Jan 26, 2014 at 1:13 PM, Roberto Ferrer <refp16@gmail.com> wrote:
>> You're right, -collapse- works:
>>
>> *----------- begin code --------------
>>
>> clear all
>> set more off
>>
>> input ///
>> str20 provname provcode str2 lic str1 atecosec str1 atecosec2002 numcontrib
>> AGRIGENTO 84 AG A A 100
>> AGRIGENTO 84 AG A B 50
>> AGRIGENTO 84 AG B C 12
>> AGRIGENTO 84 AG C D 79
>> AGRIGENTO 84 AG O P 34
>> AGRIGENTO 84 AG P Q 0
>> AGRIGENTO 84 AG Z Z 1
>> ALESSANDRIA 6 AL A A 29
>> ALESSANDRIA 6 AL A B 12
>> ALESSANDRIA 6 AL B C 0
>> ALESSANDRIA 6 AL C D 5
>> end
>>
>> list, sepby(provname)
>>
>> collapse (sum) numcontrib, by(provname atecosec)
>>
>> list, sepby(provname)
>>
>> *------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 11:06 AM, Marie-Luise Schmitz
>> <querida-ise@gmx.de> wrote:
>>> Dear Stata Users,
>>>
>>> I have a data set that looks like this:
>>>
>>> province_name province_code_107 license_number ateco_section ateco_section2002 numero_contribuenti...
>>> AGRIGENTO 84 AG A A 100
>>> AGRIGENTO 84 AG A B 50
>>> AGRIGENTO 84 AG B C 12
>>> AGRIGENTO 84 AG C D 79
>>> AGRIGENTO 84 AG O P 34
>>> AGRIGENTO 84 AG P Q 0
>>> AGRIGENTO 84 AG Z Z 1
>>> ALESSANDRIA 6 AL A A 29
>>> ALESSANDRIA 6 AL A B 12
>>> ALESSANDRIA 6 AL B C 0
>>> ALESSANDRIA 6 AL C D 5
>>>
>>> It contains numerous numeric variables following the variable numero_contribuenti.
>>> The variable ateco_section is a redefined version of the variable ateco_section2002 and shows sectors of economic activity. For instance, A = agriculture, B = fishery, etc.
>>> In the redefined variable ateco_section, sectors A and B are summarized by A.
>>> However, the problem is that I want only one entry for sector A for each province; that is, for numeric variables such as numero_contribuenti I want the sum of the previous A and B, hence:
>>>
>>> province_name province_code_107 license_number ateco_section numero_contribuenti .........
>>> AGRIGENTO 84 AG A 150
>>> AGRIGENTO 84 AG B 12
>>>
>>>
>>> I want to apply that to each province.
>>> I guess this problem may be solved with collapse (sum) but I am totally lost.
>>> Any help is highly appreciated.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
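
On the remaining problem raised in the most recent message of this thread (values coded .a turning into zeros after -collapse (sum)-): -collapse (sum)- treats missing values as zero, so a group whose values are all missing gets a total of 0. One possible workaround, sketched below, is to collapse a count of nonmissing values alongside each sum and then set the sum back to .a wherever that count is zero. This is only a sketch under stated assumptions: the toy data, the variable names x1 and x2, and the n_ prefix for the helper variables are hypothetical, and the rule "missing only if every value in the group is missing" is one reading of what is wanted, not something the thread confirms.

```stata
* Hypothetical toy data: x1 and x2 stand in for the 93 numeric variables;
* .a is the user-defined missing code
clear
input str12 provname str1 atecosec x1 x2
"AGRIGENTO"   "A" 100  .a
"AGRIGENTO"   "A"  50  .a
"AGRIGENTO"   "B"  12   3
"ALESSANDRIA" "A"  .a   4
"ALESSANDRIA" "A"  12  .a
end

* Expand the variable range once, then build the -collapse- specification:
* a sum for every variable plus a count of its nonmissing values
unab vars : x1-x2
local sums
local counts
foreach v of local vars {
    local sums   `sums' `v'
    * n_`v' is a hypothetical helper name; it must stay within 32 characters
    local counts `counts' n_`v'=`v'
}

collapse (sum) `sums' (count) `counts', by(provname atecosec)

* Restore .a where a group had no nonmissing value at all
foreach v of local vars {
    replace `v' = .a if n_`v' == 0
    drop n_`v'
}

list, sepby(provname)
```

With the real data, the range in -unab- would be numero_contribuenti_2005-ricercatori_med_2006 and the by() list would be the one already used in the -collapse- call above. Groups that mix missing and nonmissing values keep the sum of the nonmissing ones; if any missing value should instead make the total missing, the condition in the final loop would have to compare the count with the group size.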