Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From |
"Marie-Luise Schmitz" <[email protected]> |

To |
[email protected] |

Subject |
Aw: Re: Re: st: sum over variables for determinate observations |

Date |
Mon, 27 Jan 2014 14:04:10 +0100 (CET) |

Yes Nick, you are right. I explained myself badly, sorry. I used:

-----------------------
sort province_name ateco_section
collapse (sum) numero_contribuenti_2005-ricercatori_med_2006, by(province_name province_code_107 license_number ateco_section ateco_section_description)
-----------------------

where 'numero_contribuenti_2005' is the first and 'ricercatori_med_2006' the last numeric variable in the data set.

The only remaining problem is that missing values I defined as .a appear as zeros in the collapsed data, although I would like to keep them as missing.

Sent: Monday, 27 January 2014, 13:19
From: "Nick Cox" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Re: st: sum over variables for determinate observations

What you mean by "did not work" is not explained here, but once you -keep- just one observation for each group, scope for accurate calculations of totals of any other variable is lost.

-collapse- is, it seems, what you need here, obviating the need for a loop. It was suggested earlier in this thread, and it's not clear why you are not using it.

Nick
[email protected]

On 27 January 2014 12:12, Marie-Luise Schmitz <[email protected]> wrote:
> Dear Roberto,
>
> thank you for your suggestion. I used:
>
> bysort province_name ateco_section: egen numero_contribuenti_2005_test = total(numero_contribuenti_2005)
> by province_name ateco_section: keep if _n == 1
> replace numero_contribuenti_2005_test=.a if numero_contribuenti_2005==.a
>
> to do the task for one variable and it worked perfectly. But the data set contains 93 numeric variables. I tried to write a foreach loop but this did not work. Any suggestions on how to do this for many variables?
>
>
> Sent: Sunday, 26 January 2014, 19:01
> From: "Roberto Ferrer" <[email protected]>
> To: "Stata Help" <[email protected]>
> Subject: Re: st: sum over variables for determinate observations
>
> Alternatives are:
>
> /*
> Use -egen, total()- to compute totals and keep an arbitrary observation
> (here the first one).
> */
>
> bysort provname atecosec: egen snumcontrib = total(numcontrib)
> by provname atecosec: keep if _n == 1
>
>
> /*
> Use -sum()- to compute a cumulative sum and keep the last observation
> */
>
> bysort provname atecosec: gen snumcontrib = sum(numcontrib)
> by provname atecosec: keep if _n == _N
>
> The Stata Journal (2002)
> 2, Number 1, pp. 86–102
> Speaking Stata: How to move step by: step
> Nicholas J. Cox
>
> is a helpful reference.
>
> On Sun, Jan 26, 2014 at 1:13 PM, Roberto Ferrer <[email protected]> wrote:
>> You're right, -collapse- works:
>>
>> *----------- begin code --------------
>>
>> clear all
>> set more off
>>
>> input str20 provname provcode str2 lic str1 atecosec str1 atecosec2002 numcontrib
>> AGRIGENTO 84 AG A A 100
>> AGRIGENTO 84 AG A B 50
>> AGRIGENTO 84 AG B C 12
>> AGRIGENTO 84 AG C D 79
>> AGRIGENTO 84 AG O P 34
>> AGRIGENTO 84 AG P Q 0
>> AGRIGENTO 84 AG Z Z 1
>> ALESSANDRIA 6 AL A A 29
>> ALESSANDRIA 6 AL A B 12
>> ALESSANDRIA 6 AL B C 0
>> ALESSANDRIA 6 AL C D 5
>> end
>>
>> list, sepby(provname)
>>
>> collapse (sum) numcontrib, by(provname atecosec)
>>
>> list, sepby(provname)
>>
>> *------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 11:06 AM, Marie-Luise Schmitz
>> <[email protected]> wrote:
>>> Dear Stata Users,
>>>
>>> I have a data set that looks like this:
>>>
>>> province_name  province_code_107  license_number  ateco_section  ateco_section2002  numero_contribuenti...
>>> AGRIGENTO      84                 AG              A              A                  100
>>> AGRIGENTO      84                 AG              A              B                  50
>>> AGRIGENTO      84                 AG              B              C                  12
>>> AGRIGENTO      84                 AG              C              D                  79
>>> AGRIGENTO      84                 AG              O              P                  34
>>> AGRIGENTO      84                 AG              P              Q                  0
>>> AGRIGENTO      84                 AG              Z              Z                  1
>>> ALESSANDRIA    6                  AL              A              A                  29
>>> ALESSANDRIA    6                  AL              A              B                  12
>>> ALESSANDRIA    6                  AL              B              C                  0
>>> ALESSANDRIA    6                  AL              C              D                  5
>>>
>>> It contains numerous numeric variables following the variable numero_contribuenti.
>>> The variable ateco_section is a redefined version of the variable ateco_section2002 and shows sectors of economic activity. For instance, A = agriculture, B = fishery, etc.
>>> In the redefined variable ateco_section, sectors A and B are summarized by A.
>>> However, I want only one entry for sector A for each province; that is, for numeric variables such as numero_contribuenti I want the sum of the previous A and B, hence:
>>>
>>> province_name  province_code_107  license_number  ateco_section  numero_contribuenti  ...
>>> AGRIGENTO      84                 AG              A              150
>>> AGRIGENTO      84                 AG              B              12
>>>
>>>
>>> I want to apply that to each province.
>>> I guess this problem may be solved with collapse (sum) but I am totally lost.
>>> Any help is highly appreciated.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
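On the remaining problem in the message above (values coded .a turning into zeros): -collapse (sum)- treats missing values as zero, so a group whose values are all missing gets a total of 0. What follows is only a minimal sketch of one possible workaround, not a method confirmed anywhere in this thread. It assumes that a group total should be .a only when every observation in the group is missing for that variable, which may not be the rule wanted here. The toy data, the variables x1 and x2, and the n_ prefix are hypothetical; with real data, x1-x2 would be replaced by the actual variable range, and each name needs room for the prefix within Stata's 32-character limit.

```stata
* Toy data (hypothetical): two numeric variables with some .a values
clear
input str12 province_name str1 ateco_section x1 x2
"AGRIGENTO"   "A" 100 .a
"AGRIGENTO"   "A"  50 .a
"AGRIGENTO"   "B"  12  3
"ALESSANDRIA" "A"  .a  4
"ALESSANDRIA" "A"  12 .a
end

* Expand the variable range once, before -collapse- changes the data
unab vars : x1-x2

* Build the (count) part: n_<var> holds the number of nonmissing values
local counts
foreach v of local vars {
    local counts `counts' n_`v'=`v'
}

* Sums and nonmissing counts per group in one pass
collapse (sum) `vars' (count) `counts', by(province_name ateco_section)

* Assumption: a total is set back to .a only if the whole group was missing
foreach v of local vars {
    replace `v' = .a if n_`v' == 0
    drop n_`v'
}

list, sepby(province_name)
```

Groups with at least one nonmissing value keep the ordinary sum of those values; only groups with no nonmissing value are recoded to .a.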

**Follow-Ups**:

- **Re: Re: Re: st: sum over variables for determinate observations** *From:* Roberto Ferrer <[email protected]>

**References**:

- **st: sum over variables for determinate observations** *From:* "Marie-Luise Schmitz" <[email protected]>
- **Re: st: sum over variables for determinate observations** *From:* Roberto Ferrer <[email protected]>
- **Re: st: sum over variables for determinate observations** *From:* Roberto Ferrer <[email protected]>
- **Aw: Re: st: sum over variables for determinate observations** *From:* "Marie-Luise Schmitz" <[email protected]>
- **Re: Re: st: sum over variables for determinate observations** *From:* Nick Cox <[email protected]>
