Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Re: Re: Re: st: sum over variables for determinate observations


From   Roberto Ferrer <[email protected]>
To   Stata Help <[email protected]>
Subject   Re: Re: Re: Re: st: sum over variables for determinate observations
Date   Wed, 29 Jan 2014 13:01:22 -0430

[Now I'm not so sure -mvencode-, -mvdecode- serves you.]

The approach you take only solves the specific case in which you have
two missings per group. This may be enough for you now, but when
cheap, a general solution is always better than a specific one.

The reason -collapse- converts your missings to zero is because all
values are missing for that specific group. -collapse- ignores those
missings and reports zero as the result. One general strategy would be
to count, by group, the missings. Count also the number of
observations (by group) and compare the two. This way, you check if
all observations for a group are missing. If this is the case, then
you found a group to which you should apply -replace-. Below some
code. Compare that with your solution.

*---------------- begin code ----------------------

clear all
set more off

* --------------- input data ----------------------
input ///
str20 provname    provcode    str2 lic    str1 atecosec   str1
atecosec2002    numcontrib
AGRIGENTO              84                       AG           A
        A                     100
AGRIGENTO              84                       AG           A
        B                      50
AGRIGENTO              84                       AG           B
        C                      12
AGRIGENTO              84                       AG           C
        D                      79
AGRIGENTO              84                       AG           O
        P                      34
AGRIGENTO              84                       AG           P
        Q                       0
AGRIGENTO              84                       AG           Z
        Z                       1
ALESSANDRIA            6                        AL           A
        A                      29
ALESSANDRIA            6                        AL           A
        B                      12
ALESSANDRIA            6                        AL           B
        C                       0
ALESSANDRIA            6                        AL           C
        D                       5
end

replace numcontrib = .a if provname == "ALESSANDRIA"

list, sepby(provname)


*----------------- your solution -----------------

/*
generate dnumcontrib = 1 if numcontrib == .a

list, sepby(provname)

collapse (sum) *numcontrib, by(provname atecosec)
replace numcontrib = .a if dnumcontrib == 2

list, sepby(provname)
*/

*--------------- alternative solution ------------

bysort provname atecosec: gen countmiss = sum(numcontrib == .a)
by provname atecosec: gen countall = _n

collapse (sum) numcontrib (last) countmiss countall, by(provname atecosec)
replace numcontrib = .a if countmiss == countall
drop count*

list, sepby(provname)

*---------------- end code ------------------------


On Wed, Jan 29, 2014 at 6:07 AM, Marie-Luise Schmitz <[email protected]> wrote:
>
> Dear Robert,
>
> actually I think I succeded in solving the problem of missings becoming zeroes through collapse but a confirmation would encourage me.
>
> For each of the variables going to be affected by collapse I created a dummy taking the value 1 when the numerical counterpart is a missing. I then applied collapse knowing that values of the dummy equal to 2 indicate missings of the summarized original variable. In a final step, I replace the collapsed numerical variables back to missings if their dummy counterpart is equal to 2.
>
> ---- begin code -----
> order  province_name ateco_section, first
> foreach var of varlist numero_contribuenti_2005-ricercatori_med_2006 {
> generate d_`var'=1 if `var'==.a
> }
> sort province_name ateco_section
> collapse(sum) numero_contribuenti_2005-d_ricercatori_med_2006, by(province_name province_code_107 license_number ateco_section)
> foreach var of varlist numero_contribuenti_2005-ricercatori_med_2006{
> replace `var'=.a if d_`var'==2
> }
> drop d_*
> ---- end code -----
>
>
> Many thanks
> Marie-Luise
>
>
>
>
> Gesendet: Dienstag, 28. Januar 2014 um 19:12 Uhr
> Von: "Roberto Ferrer" <[email protected]>
> An: "Stata Help" <[email protected]>
> Betreff: Re: Re: Re: st: sum over variables for determinate observations
> Marie-Luise,
>
> Maybe -mvencode- and -mvdecode- can help. Check the corresponding help files.
> Below one example:
>
> *---------------- begin code -----------------
>
> clear all
> set more off
>
> sysuse auto
> keep price mpg
> replace price = .
>
> list
>
> mvencode price, mv(0)
> collapse (sum) price mpg
> mvdecode price, mv(0)
>
> list
>
> *----------- end code -----------------------
>
> On Mon, Jan 27, 2014 at 8:34 AM, Marie-Luise Schmitz <[email protected]> wrote:
>> Yes Nick, you are right, I badly explained myself, sorry.
>>
>> I used:
>> -----------------------
>>
>> sort province_name ateco_section
>> collapse(sum) numero_contribuenti_2005-ricercatori_med_2006, by(province_name province_code_107 license_number ateco_section ateco_section_description)
>>
>> -----------------------
>> where 'numero_contribuenti_2005' is the first and 'ricercatori_med_2006' the last numerical variable in the data set.
>>
>> The only remaining problem is that missings I defined as .a appear as zeroes in the collapsed data although it would be desirable to keep them defined as missings.
>>
>>
>>
>>
>> Gesendet: Montag, 27. Januar 2014 um 13:19 Uhr
>> Von: "Nick Cox" <[email protected]>
>> An: "[email protected]" <[email protected]>
>> Betreff: Re: Re: st: sum over variables for determinate observations
>> What you mean by "did not work" is not explained here, but once you
>> -keep- just one observation for each group, scope for accurate
>> calculations of totals of any other variable is lost.
>>
>> -collapse- is, it seems, what you need here, obviating the need for a
>> loop. It was suggested earlier in this thread, and it's not clear why
>> you are not using it.
>>
>> Nick
>> [email protected]
>>
>> On 27 January 2014 12:12, Marie-Luise Schmitz <[email protected]> wrote:
>>> Dear Roberto,
>>>
>>> thank you for your suggestion. I used:
>>>
>>> bysort province_name ateco_section: egen numero_contribuenti_2005_test = total(numero_contribuenti_2005)
>>> by province_name ateco_section: keep if _n == 1
>>> replace numero_contribuenti_2005_test=.a if numero_contribuenti_2005==.a
>>>
>>> to do the task for one variable and it perfectly worked out. But the data set contains 93 numeric variables. I tried to do a foreach loop but this did not work. Any suggestion how to do this for many variables?
>>>
>>>
>>>
>>> Gesendet: Sonntag, 26. Januar 2014 um 19:01 Uhr
>>> Von: "Roberto Ferrer" <[email protected]>
>>> An: "Stata Help" <[email protected]>
>>> Betreff: Re: st: sum over variables for determinate observations
>>> Alternatives are:
>>>
>>> /*
>>> Use -egen, total()-, to compute totals and keep an arbitrary observation
>>> (here the first one).
>>> */
>>>
>>> bysort provname atecosec: egen snumcontrib = total(numcontrib)
>>> by provname atecosec: keep if _n == 1
>>>
>>>
>>> /*
>>> Use -sum- to compute a cumulative sum and keep the last observation
>>> */
>>>
>>> bysort provname atecosec: gen snumcontrib = sum(numcontrib)
>>> by provname atecosec: keep if _n == _N
>>>
>>> The Stata Journal (2002)
>>> 2, Number 1, pp. 86-102
>>> Speaking Stata: How to move step by: step
>>> Nicholas J. Cox
>>>
>>> is a helpful reference.
>>>
>>> On Sun, Jan 26, 2014 at 1:13 PM, Roberto Ferrer <[email protected]> wrote:
>>>> You're right, -collapse- works:
>>>>
>>>> *----------- begin code --------------
>>>>
>>>> clear all
>>>> set more off
>>>>
>>>> input ///
>>>> str20 provname provcode str2 lic str1 atecosec str1
>>>> atecosec2002 numcontrib
>>>> AGRIGENTO 84 AG A
>>>> A 100
>>>> AGRIGENTO 84 AG A
>>>> B 50
>>>> AGRIGENTO 84 AG B
>>>> C 12
>>>> AGRIGENTO 84 AG C
>>>> D 79
>>>> AGRIGENTO 84 AG O
>>>> P 34
>>>> AGRIGENTO 84 AG P
>>>> Q 0
>>>> AGRIGENTO 84 AG Z
>>>> Z 1
>>>> ALESSANDRIA 6 AL A
>>>> A 29
>>>> ALESSANDRIA 6 AL A
>>>> B 12
>>>> ALESSANDRIA 6 AL B
>>>> C 0
>>>> ALESSANDRIA 6 AL C
>>>> D 5
>>>> end
>>>>
>>>> list, sepby(provname)
>>>>
>>>> collapse (sum) numcontrib, by(provname atecosec)
>>>>
>>>> list, sepby(provname)
>>>>
>>>> *------------------- end code ------------------------
>>>>
>>>> On Sun, Jan 26, 2014 at 11:06 AM, Marie-Luise Schmitz
>>>> <[email protected]> wrote:
>>>>> Dear Stata Users,
>>>>>
>>>>> I have a data set that looks like this:
>>>>>
>>>>> province_name province_code_107 license_number ateco_section ateco_section2002 numero_contribuenti...
>>>>> AGRIGENTO 84 AG A A 100
>>>>> AGRIGENTO 84 AG A B 50
>>>>> AGRIGENTO 84 AG B C 12
>>>>> AGRIGENTO 84 AG C D 79
>>>>> AGRIGENTO 84 AG O P 34
>>>>> AGRIGENTO 84 AG P Q 0
>>>>> AGRIGENTO 84 AG Z Z 1
>>>>> ALESSANDRIA 6 AL A A 29
>>>>> ALESSANDRIA 6 AL A B 12
>>>>> ALESSANDRIA 6 AL B C 0
>>>>> ALESSANDRIA 6 AL C D 5
>>>>>
>>>>> It contains numerous numeric variables following the variable numero_contribuenti.
>>>>> The variable ateco_section is a redefined version of the variable ateco_section2002 and shows sectors of economic activity. For instance, A = agriculture, B = fishery, etc.
>>>>> In the redefined variable ateco_section, sectors A and B are summarzied by A.
>>>>> However, the problem is that I want only one entry for sector A for each province that is, for numeric variables as numero_contribuenti I want the sum of previous A and B, hence:
>>>>>
>>>>> province_name province_code_107 license_number ateco_section numero_contribuenti .........
>>>>> AGRIGENTO 84 AG A 150
>>>>> AGRIGENTO 84 AG B 12
>>>>>
>>>>>
>>>>> I want to apply that to each province.
>>>>> I guess this problem may be solved with collapse (sum) but I am totally lost.
>>>>> Any help is highly appreciated.
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/[http://www.stata.com/support/faqs/resources/statalist-faq/][http://www.stata.com/support/faqs/resources/statalist-faq/[http://www.stata.com/support/faqs/resources/statalist-faq/]]
>> * http://www.ats.ucla.edu/stat/stata/[http://www.ats.ucla.edu/stat/stata/][http://www.ats.ucla.edu/stat/stata/[http://www.ats.ucla.edu/stat/stata/]]
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search[http://www.stata.com/help.cgi?search]
>> * http://www.stata.com/support/faqs/resources/statalist-faq/[http://www.stata.com/support/faqs/resources/statalist-faq/]
>> * http://www.ats.ucla.edu/stat/stata/[http://www.ats.ucla.edu/stat/stata/]
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search[http://www.stata.com/help.cgi?search]
> * http://www.stata.com/support/faqs/resources/statalist-faq/[http://www.stata.com/support/faqs/resources/statalist-faq/]
> * http://www.ats.ucla.edu/stat/stata/[http://www.ats.ucla.edu/stat/stata/]
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index