Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Sergiy Radyakin <serjradyakin@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Compressing a panel dataset |
Date | Tue, 23 Jul 2013 15:41:26 -0400 |
Dear Lukas,

You need to formalize this: "Sometimes I am interested in keeping only the 0 in such a case, sometimes the 1, and I certainly need to keep the info if data is missing at all." Under what conditions do you want the resulting value to be 0? Under what conditions should it be 1? Under what conditions should it be missing? Once we know the answers to these questions, we can modify the -collapse- statement or write something else more suitable.

Also, what is the universe of values of vi? Is it only 0/1/., or are other values possible?

Suppose you want the 1s, and the value -9999 is not in the universe:

    replace v1 = -9999 if missing(v1)
    collapse (max) v1, by(id)
    replace v1 = . if v1 == -9999

Do not write

    collapse (max) v1 if !missing(v1)

This might drop observations that you want to retain (a household whose v1 is missing in every row, for example, would disappear from the result).

Best,
Sergiy

On Tue, Jul 23, 2013 at 3:09 PM, Lukas Borkowski <570722@soas.ac.uk> wrote:
> Sergiy,
>
> thank you for your help. However, I encountered a problem. My dataset is unfortunately not as simple as I described in my earlier email. Initially, I didn't think it would make a big difference, but it does. There are a few cases where one dummy variable, say v1, has different values within the household. Sometimes I am interested in keeping only the 0 in such a case, sometimes the 1, and I certainly need to keep the info if data is missing at all. So it looks like:
>
> id v1 v2 v3
> 1  1  .  2
> 1  0  7  .
> 1  .  .  .
> 2  1  .  1
> 2  1  7  .
> 2  .  .  .
> ...
>
> If I run -collapse (min) v1, by(id)- I can get rid of the missing values and keep the 0 for household 1. But say I were interested in the 1: what could I do? Running -collapse (max) v1, by(id)- takes on the missing value.
>
> Do you have an idea?
>
> Best,
>
> Lukas
>
> #
> Lukas Borkowski
> University of London, School of Oriental and African Studies (SOAS)
>
> On 23.07.2013, at 16:43, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
>
>> collapse (min) v1 (min) v2 (min) v3, by(id)
>>
>> id v1 v2 v3
>> 1  9  7  2
>> 2  7  7  1
>>
>> Best, Sergiy
>>
>> On Tue, Jul 23, 2013 at 10:25 AM, Lukas Borkowski <570722@soas.ac.uk> wrote:
>>> Dear list,
>>>
>>> I am using Stata 12 and am currently cleaning up a dataset that will become a panel dataset. The quality of the dataset is quite poor (it originates from a survey), and I face multiple (endless) situations where values for the same questions are recorded in different variables. I would now like to eliminate duplicates and retain only one row for each household. My dataset looks somewhat like this:
>>>
>>> id v1 v2 v3
>>> 1  .  .  2
>>> 1  .  7  .
>>> 1  9  .  .
>>> 2  .  .  1
>>> 2  .  7  .
>>> 2  7  .  .
>>> ...
>>>
>>> I would like to retain only one row for each household. Is there a command for this? I have tried different things but have not found any solution.
>>>
>>> Do you have any suggestions for what I could do?
>>>
>>> Thank you very much for your help!
>>>
>>> Best,
>>>
>>> Lukas
>>>
>>> #
>>> Lukas Borkowski
>>> University of London, School of Oriental and African Studies (SOAS)
>>>
>>> M: 570722@soas.ac.uk
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
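For reference, here is a minimal Stata sketch of Sergiy's first suggestion applied to the six rows shown in Lukas's original question (the rows elided with "..." are not reproduced). The -input- block and the final -list- are added for illustration only; -collapse- computes each statistic over the nonmissing values of a variable within each id, which is why the scattered answers end up on a single row.

```stata
* Rebuild only the six rows shown in the original question;
* the rows elided there with "..." are not reproduced
clear
input id v1 v2 v3
1 . . 2
1 . 7 .
1 9 . .
2 . . 1
2 . 7 .
2 7 . .
end

* One row per household: (min) is taken over the nonmissing
* values of each variable within id, so the answers merge
collapse (min) v1 (min) v2 (min) v3, by(id)
list
```

Because each household here has at most one nonmissing value per variable, (max) would produce the same rows as (min); the choice only matters once values conflict, as in the later messages.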
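A second sketch runs the sentinel recipe from Sergiy's later reply on the v1 column of Lukas's second example. Household 3 is a hypothetical addition, not part of the thread, whose v1 is missing in every row; it shows what the warning against -collapse (max) v1 if !missing(v1)- guards against. The scalar name n_households_if is also made up for this sketch, and -9999 is assumed never to occur as a genuine value of v1.

```stata
* v1 from Lukas's second message; household 3 is a HYPOTHETICAL
* addition whose v1 is missing in every row
clear
input id v1
1 1
1 0
1 .
2 1
2 1
2 .
3 .
3 .
end

* Pitfall: the -if- restriction removes household 3 entirely
preserve
collapse (max) v1 if !missing(v1), by(id)
* hypothetical scalar: number of households left by the -if- version
scalar n_households_if = _N
restore

* Sentinel recipe (assumes -9999 is never a genuine value of v1)
replace v1 = -9999 if missing(v1)
collapse (max) v1, by(id)
replace v1 = . if v1 == -9999
list
```

With the sentinel, household 3 survives with v1 missing. If you wanted the 0 instead and switched to (min), the sentinel would have to be larger than every genuine value (for example 9999, again assuming it never occurs); otherwise (min) would return -9999 for any household that has a missing row.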