Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Aw: Re: Re: st: sum over variables for determinate observations


From   "Marie-Luise Schmitz" <[email protected]>
To   [email protected]
Subject   Aw: Re: Re: st: sum over variables for determinate observations
Date   Mon, 27 Jan 2014 14:04:10 +0100 (CET)

Yes Nick, you are right. I explained myself badly, sorry.
 
I used:
-----------------------

sort province_name ateco_section
collapse (sum) numero_contribuenti_2005-ricercatori_med_2006, by(province_name province_code_107 license_number ateco_section ateco_section_description)

-----------------------
where 'numero_contribuenti_2005' is the first and 'ricercatori_med_2006' the last numeric variable in the data set, so the hyphenated range covers every variable stored between them.
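Because the hyphenated range depends on the order in which variables are stored, one alternative is to build the list of numeric variables explicitly. The following is only a sketch on toy data: the short names (provname, provcode, atecosec, atecosec2002, numcontrib) follow Roberto's example, 'ricercatori' is a made-up second variable, and all values are invented for illustration.

```stata
* Sketch on toy data: sum every numeric variable that is not a by-variable.
* 'ricercatori' and all values are hypothetical.
clear
input str12 provname provcode str1 atecosec str1 atecosec2002 numcontrib ricercatori
AGRIGENTO 84 A A 100 3
AGRIGENTO 84 A B 50 1
AGRIGENTO 84 B C 12 0
ALESSANDRIA 6 A A 29 2
ALESSANDRIA 6 A B 12 4
end

* Grouping variables (provcode is numeric but must not be summed)
local byvars provname provcode atecosec

* -ds- leaves the names of all numeric variables in r(varlist)
ds, has(type numeric)
local numvars `r(varlist)'

* Remove the by-variables from the list of variables to sum
local sumvars : list numvars - byvars
display "`sumvars'"

collapse (sum) `sumvars', by(`byvars')
list, sepby(provname)
```

The string variable atecosec2002 is neither summed nor listed in by(), so -collapse- drops it, which is what is wanted once sectors A and B have been merged.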
 
The only remaining problem is that the missing values I coded as .a appear as zeros in the collapsed data, whereas I would like them to stay missing.
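The zeros arise because -collapse (sum)- returns 0 for a group in which every value is missing. One possible workaround, sketched here on toy data, is to collapse a (count) of nonmissing values alongside each sum and then put .a back wherever that count is 0. The variable 'ricercatori', the n_ prefix, and all values are assumptions made for this illustration. Groups that mix missing and nonmissing values keep the total of the nonmissing ones.

```stata
* Sketch on toy data: restore .a where a whole group was missing.
* 'ricercatori', the n_ prefix and all values are hypothetical.
clear
input str12 provname str1 atecosec numcontrib ricercatori
AGRIGENTO A 100 3
AGRIGENTO A 50 .a
AGRIGENTO B .a .a
ALESSANDRIA A 29 1
ALESSANDRIA A .a 2
end

local sumvars numcontrib ricercatori

* Build the (count) part of the collapse list: n_var=var for each variable
local counts
foreach v of local sumvars {
    local counts `counts' n_`v'=`v'
}

* (count) gives the number of nonmissing values in each group
collapse (sum) `sumvars' (count) `counts', by(provname atecosec)

* A total of 0 built from no nonmissing values is reset to .a
foreach v of local sumvars {
    replace `v' = .a if n_`v' == 0
    drop n_`v'
}

list, sepby(provname)
```

With long variable names the n_ prefix must still leave the name within Stata's 32-character limit.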
 
 
 

Sent: Monday, 27 January 2014 at 13:19
From: "Nick Cox" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: Re: st: sum over variables for determinate observations
What you mean by "did not work" is not explained here, but once you
-keep- just one observation for each group, scope for accurate
calculations of totals of any other variable is lost.

-collapse- is, it seems, what you need here, obviating the need for a
loop. It was suggested earlier in this thread, and it's not clear why
you are not using it.

Nick
[email protected]
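
To illustrate the point about -keep-: if the -egen- route is preferred to -collapse-, all totals have to be computed before the data are reduced to one observation per group. A minimal sketch on toy data follows; 'ricercatori', the tot_ prefix, and all values are assumptions for illustration only.

```stata
* Sketch on toy data: compute every total first, then keep one row per group.
* 'ricercatori', the tot_ prefix and all values are hypothetical.
clear
input str12 provname str1 atecosec numcontrib ricercatori
AGRIGENTO A 100 3
AGRIGENTO A 50 1
AGRIGENTO B 12 0
ALESSANDRIA A 29 2
ALESSANDRIA A 12 4
end

* Loop over the variables while all observations are still present
foreach v of varlist numcontrib ricercatori {
    bysort provname atecosec: egen double tot_`v' = total(`v')
}

* Only now reduce to one observation per group (data are already sorted)
by provname atecosec: keep if _n == 1
drop numcontrib ricercatori

list, sepby(provname)
```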

On 27 January 2014 12:12, Marie-Luise Schmitz <[email protected]> wrote:
> Dear Roberto,
>
> thank you for your suggestion. I used:
>
> bysort province_name ateco_section: egen numero_contribuenti_2005_test = total(numero_contribuenti_2005)
> by province_name ateco_section: keep if _n == 1
> replace numero_contribuenti_2005_test=.a if numero_contribuenti_2005==.a
>
> to do the task for one variable, and it worked perfectly. But the data set contains 93 numeric variables. I tried a foreach loop, but this did not work. Any suggestions on how to do this for many variables?
>
>
>
> Sent: Sunday, 26 January 2014 at 19:01
> From: "Roberto Ferrer" <[email protected]>
> To: "Stata Help" <[email protected]>
> Subject: Re: st: sum over variables for determinate observations
> Alternatives are:
>
> /*
> Use -egen, total()-, to compute totals and keep an arbitrary observation
> (here the first one).
> */
>
> bysort provname atecosec: egen snumcontrib = total(numcontrib)
> by provname atecosec: keep if _n == 1
>
>
> /*
> Use -sum- to compute a cumulative sum and keep the last observation
> */
>
> bysort provname atecosec: gen snumcontrib = sum(numcontrib)
> by provname atecosec: keep if _n == _N
>
> Cox, N. J. 2002. Speaking Stata: How to move step by: step.
> The Stata Journal 2(1): 86–102.
>
> is a helpful reference.
>
> On Sun, Jan 26, 2014 at 1:13 PM, Roberto Ferrer <[email protected]> wrote:
>> You're right, -collapse- works:
>>
>> *----------- begin code --------------
>>
>> clear all
>> set more off
>>
>> input ///
>> str20 provname provcode str2 lic str1 atecosec str1 atecosec2002 numcontrib
>> AGRIGENTO 84 AG A A 100
>> AGRIGENTO 84 AG A B 50
>> AGRIGENTO 84 AG B C 12
>> AGRIGENTO 84 AG C D 79
>> AGRIGENTO 84 AG O P 34
>> AGRIGENTO 84 AG P Q 0
>> AGRIGENTO 84 AG Z Z 1
>> ALESSANDRIA 6 AL A A 29
>> ALESSANDRIA 6 AL A B 12
>> ALESSANDRIA 6 AL B C 0
>> ALESSANDRIA 6 AL C D 5
>> end
>>
>> list, sepby(provname)
>>
>> collapse (sum) numcontrib, by(provname atecosec)
>>
>> list, sepby(provname)
>>
>> *------------------- end code ------------------------
>>
>> On Sun, Jan 26, 2014 at 11:06 AM, Marie-Luise Schmitz
>> <[email protected]> wrote:
>>> Dear Stata Users,
>>>
>>> I have a data set that looks like this:
>>>
>>> province_name province_code_107 license_number ateco_section ateco_section2002 numero_contribuenti...
>>> AGRIGENTO 84 AG A A 100
>>> AGRIGENTO 84 AG A B 50
>>> AGRIGENTO 84 AG B C 12
>>> AGRIGENTO 84 AG C D 79
>>> AGRIGENTO 84 AG O P 34
>>> AGRIGENTO 84 AG P Q 0
>>> AGRIGENTO 84 AG Z Z 1
>>> ALESSANDRIA 6 AL A A 29
>>> ALESSANDRIA 6 AL A B 12
>>> ALESSANDRIA 6 AL B C 0
>>> ALESSANDRIA 6 AL C D 5
>>>
>>> It contains numerous numeric variables following the variable numero_contribuenti.
>>> The variable ateco_section is a redefined version of the variable ateco_section2002 and shows sectors of economic activity. For instance, A = agriculture, B = fishery, etc.
>>> In the redefined variable ateco_section, the former sectors A and B are combined into A.
>>> The problem is that I want only one entry for sector A in each province. That is, for numeric variables such as numero_contribuenti I want the sum of the previous A and B, hence:
>>>
>>> province_name province_code_107 license_number ateco_section numero_contribuenti .........
>>> AGRIGENTO 84 AG A 150
>>> AGRIGENTO 84 AG B 12
>>>
>>>
>>> I want to apply that to each province.
>>> I guess this problem may be solved with collapse (sum) but I am totally lost.
>>> Any help is highly appreciated.

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/


