Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: "Sebastian Soika" <Sebastian_Soika@web.de>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Creating household id for groups of persons
Date: Thu, 7 Jul 2011 14:24:04 +0200 (CEST)

You're not mistaken; now I see the mistake too. I have now used your -group_id- solution, and it is also very fast. Thank you! The reason Austin's solution didn't suit my data is that I have over 500,000 contracts, and running his "forv i=1/`r(max)'" loop on data of that size would take hours.

-----Original Message-----
From: "Robert Picard" <picard@netbox.com>
Sent: 06.07.2011 19:01:26
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Creating household id for groups of persons

>Unless I'm mistaken, Fernando's solution will not always group
>households correctly. In the simple example below, there are 3
>contracts with 4 different members of the same household. Such cases
>require more than one pass over the data (contract 13 groups ids 2 and
>4, and only then do contracts 11 and 12 group 1, 2, 3 and 4 together).
>
>* --------------------- begin example ---------------------
>clear all
>input contract id
> 11 1
> 11 2
> 12 3
> 12 4
> 13 2
> 13 4
>end
>
>tempfile f
>qui save "`f'"
>
>* implement Fernando's approach
>egen cid = group(contract)
>bysort id: egen mincid = min(cid)
>bysort contract: egen hid = min(mincid)
>list , noobs clean
>
>* redo using -group_id-
>use "`f'", clear
>clonevar hid = id
>group_id hid, match(contract)
>list , noobs clean
>
>* --------------------- end example -----------------------
>
>On Wed, Jul 6, 2011 at 11:47 AM, Hans Meier <mr.hans.meier@web.de> wrote:
>> Hello Austin and Robert,
>>
>> Thank you for your solutions.
>> I'm sure they would work, but I have a very large dataset, so Austin's solution would take hours, and for Robert's solution I would have to use SSC.
>>
>> But another Stata user sent me this solution:
>>
>> From: "Fernando Rios Avila" <f.rios.a@gmail.com>
>> Sent: 06.07.2011 15:18:00
>> To: "'Hans Meier'" <mr.hans.meier@web.de>
>> Subject: RE: st: Creating household id for groups of persons
>>
>>>Hi Hans,
>>>I was playing around with a very small sample similar to yours, and came up with this small code.
>>>Here hid3 would be the household id code.
>>>
>>> egen hid1 = group(contract)
>>> bysort id: egen hid2 = min(hid1)
>>> bysort contract: egen hid3 = min(hid2)
>>>
>>>Hope this is what you were looking for.
>>>Best
>>
>> It works perfectly, and very fast.
>>
>> Thank you Fernando!
>>
>> -----Original Message-----
>> From: "Robert Picard" <picard@netbox.com>
>> Sent: 06.07.2011 16:50:42
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Creating household id for groups of persons
>>
>>>Or get -group_id- from SSC. Using Austin's data:
>>>
>>>* --------------------- begin example ---------------------
>>>clear all
>>>input contract id
>>> 123 1
>>> 123 2
>>> 123 3
>>> 456 4
>>> 456 5
>>> 678 1
>>> 456 3
>>> 789 6
>>> 789 7
>>> 456 8
>>>end
>>>
>>>clonevar gid = id
>>>group_id gid, match(contract)
>>>
>>>list , noobs clean
>>>
>>>* --------------------- end example -----------------------
>>>
>>>On Wed, Jul 6, 2011 at 10:29 AM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Hans Meier <mr.hans.meier@web.de>:
>>>>
>>>> Maybe this is what you want?
>>>>
>>>> clear all
>>>> input contract id
>>>> 123 1
>>>> 123 2
>>>> 123 3
>>>> 456 4
>>>> 456 5
>>>> 678 1
>>>> 456 3
>>>> 789 6
>>>> 789 7
>>>> 456 8
>>>> end
>>>> g long obs=_n
>>>> egen long i=group(id)
>>>> la var i "Person id from 1 to M"
>>>> egen long gp=group(contract)
>>>> la var gp "Contract id from 1 to G"
>>>> bys i (gp):g long ct=sum(gp!=gp[_n-1])
>>>> la var ct "n distinct contract by id"
>>>> sort i ct
>>>> su i, mean
>>>> forv i=1/`r(max)' {
>>>>     su ct if i==`i', mean
>>>>     if r(max)==1 continue
>>>>     loc max=r(max)
>>>>     su gp if ct==1&i==`i', mean
>>>>     loc g1=r(max)
>>>>     forv j=2/`max' {
>>>>         su gp if ct==`j'&i==`i', mean
>>>>         replace gp=`g1' if gp==r(max)
>>>>     }
>>>> }
>>>> sort obs
>>>> drop obs ct i
>>>> l, noo clean
>>>>
>>>> On Wed, Jul 6, 2011 at 8:45 AM, Hans Meier <mr.hans.meier@web.de> wrote:
>>>>> Yes, now you got my question right.
>>>>> I don't know who lives in which household, and I also don't have further information about this.
>>>>>
>>>>> But I assume that if people share an insurance contract, they are somehow connected, so I define them as one household.
>>>>> (I look only at non-life insurance, no pension funds etc.)
>>>>>
>>>>> In my example, I define the persons from contract "123" (ids "1", "2", "3") as one household, let's say household A, and those in contract "456" (ids "4", "5") as another household B.
>>>>> Now, in contract "678", the id "1" tells me that this is the same person who is also in contract "123", so I want this contract to be put in household A.
>>>>>
>>>>> To your question:
>>>>> Unfortunately, I have a very large dataset, so I can't tell whether each household has one contract that covers all of its members.
>>>>> To err on the side of caution, I would rather assume I don't have such complete contracts.
>>>>
>>>> *
>>>> * For searches and help try:
>>>> * http://www.stata.com/help.cgi?search
>>>> * http://www.stata.com/support/statalist/faq
>>>> * http://www.ats.ucla.edu/stat/stata/
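This is for readers who cannot install -group_id- from SSC. It is a minimal sketch of a multi-pass approach in plain Stata and assumes one row per contract/person pair, as in the examples above. It uses a different technique (iterative minimum-label propagation) and is not how -group_id- is implemented. The variable names h1, h2 and household are hypothetical.

Fernando's -egen- lines amount to exactly one pass of this propagation. The loop repeats that pass until no label changes, which is what chains like Robert's contracts 11, 12 and 13 require.

```stata
* Sketch: iterative minimum-label propagation, no SSC package needed.
* Assumption: one row per contract/person pair; h1, h2 and household
* are hypothetical variable names chosen for illustration.
clear all
input contract id
11 1
11 2
12 3
12 4
13 2
13 4
end

* start with one label per contract
egen long hid = group(contract)

local changed = 1
while `changed' > 0 {
    * h1: smallest label over all contracts of each person
    bysort id: egen long h1 = min(hid)
    * h2: smallest h1 over all persons in each contract
    bysort contract: egen long h2 = min(h1)
    * number of rows whose label changes in this pass
    quietly count if h2 != hid
    local changed = r(N)
    quietly replace hid = h2
    drop h1 h2
}

* renumber households consecutively 1, 2, 3, ...
egen long household = group(hid)
sort contract id
list, noobs clean
```

On Robert's data, all six rows should end up in household 1. A single pass, by contrast, leaves contract 12 on its own.

With Austin's data, contracts 123, 456 and 678 (linked through ids 1 and 3) should form one household and contract 789 another.

Each pass sorts the data twice, and the number of passes grows with the longest chain of linked contracts. Its speed on 500,000 contracts has not been checked here.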
