Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Creating household id for groups of persons

From	"Sebastian Soika" <[email protected]>
To	[email protected]
Subject	Re: st: Creating household id for groups of persons
Date	Thu, 7 Jul 2011 14:24:04 +0200 (CEST)

You're not mistaken; now I see the mistake too.

I have now used your -group_id- solution, and it is also very fast. Thank you!

Austin's solution didn't suit my data because I have over 500,000 contracts, and running the "forv i=1/`r(max)'" loop over them would take hours.
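Robert's counterexample below shows why Fernando's two `egen` steps are not enough in general. Linking persons through shared contracts is a connected-components problem, and one pass of taking minimums only reaches contracts one step away. A minimal sketch, assuming the same `contract` and `id` variables used in the thread, is to repeat the two minimum steps until no household id changes. This is not Robert's -group_id- command (the thread's recommended solution). It is an illustrative alternative that uses only built-in commands, and the variable names `hid`, `old` and `household` are hypothetical.

```stata
* Sketch: repeat min-propagation until household ids stop changing
clear all
input contract id
 11 1
 11 2
 12 3
 12 4
 13 2
 13 4
end

* start with one provisional household id per contract
egen long hid = group(contract)

local changed = 1
while `changed' {
    clonevar old = hid
    * each person takes the smallest id among all of his or her contracts
    bysort id (hid): replace hid = hid[1]
    * each contract takes the smallest id among all of its persons
    bysort contract (hid): replace hid = hid[1]
    * stop once a full pass leaves every id unchanged
    count if hid != old
    local changed = (r(N) > 0)
    drop old
}

* renumber households consecutively from 1
egen long household = group(hid)
list, noobs sepby(household)
```

On Robert's example this should put all six observations in household 1, which agrees with his -group_id- result. Each pass costs two sorts, and the number of passes grows with the longest chain of shared contracts, so on very large data -group_id- may still be the faster choice.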



-----Original Message-----
From: "Robert Picard" <[email protected]>
Sent: 06.07.2011 19:01:26
To: [email protected]
Subject: Re: st: Creating household id for groups of persons

>Unless I'm mistaken, Fernando's solution will not always group
>households correctly. In the simple example below, there are 3
>contracts with 4 different members of the same household. Such cases
>require more than one pass over the data (contract 13 links ids 2 and
>4, and only then can contracts 11 and 12 bring 1, 2, 3 and 4 together).
>
>* --------------------- begin example ---------------------
>clear all
>input contract id
> 11 1
> 11 2
> 12 3
> 12 4
> 13 2
> 13 4
>end
>
>tempfile f
>qui save "`f'"
>
>* implement Fernando's approach
>egen cid = group(contract)
>bysort id: egen mincid = min(cid)
>bysort contract: egen hid = min(mincid)
>list , noobs clean
>
>* redo using -group_id-
>use "`f'", clear
>clonevar hid = id
>group_id hid, match(contract)
>list , noobs clean
>
>* --------------------- end example -----------------------
>
>
>
>On Wed, Jul 6, 2011 at 11:47 AM, Hans Meier <[email protected]> wrote:
>> Hello Austin and Robert,
>>
>> thank you for your solutions.
>> I'm sure they would work, but I have a very large dataset, so Austin's solution would take hours, and for Robert's solution I would have to install a package from SSC.
>>
>> But another Stata user sent me this solution:
>>
>> From: "Fernando Rios Avila" <[email protected]>
>> Sent: 06.07.2011 15:18:00
>> To: "'Hans Meier'" <[email protected]>
>> Subject: RE: st: Creating household id for groups of persons
>>
>>>Hi Hans,
>>>I was playing around with a very small sample similar to yours, and came up with this short piece of code.
>>>Here hid3 would be the household id.
>>>
>>> egen hid1 = group(contract)               // one number per contract
>>> bysort id: egen hid2 = min(hid1)          // smallest contract number per person
>>> bysort contract: egen hid3 = min(hid2)    // smallest such number per contract
>>>
>>>I hope this is what you were looking for.
>>>Best
>>
>>
>> It works perfectly, and very fast.
>>
>> Thank you Fernando!
>>
>>
>>
>> -----Original Message-----
>> From: "Robert Picard" <[email protected]>
>> Sent: 06.07.2011 16:50:42
>> To: [email protected]
>> Subject: Re: st: Creating household id for groups of persons
>>
>>>Or get -group_id- from SSC. Using Austin's data:
>>>
>>>* --------------------- begin example ---------------------
>>>clear all
>>>input contract id
>>> 123 1
>>> 123 2
>>> 123 3
>>> 456 4
>>> 456 5
>>> 678 1
>>> 456 3
>>> 789 6
>>> 789 7
>>> 456 8
>>>end
>>>
>>>clonevar gid = id
>>>group_id gid, match(contract)
>>>
>>>list , noobs clean
>>>
>>>* --------------------- end example -----------------------
>>>
>>>
>>>On Wed, Jul 6, 2011 at 10:29 AM, Austin Nichols <[email protected]> wrote:
>>>> Hans Meier <[email protected]>:
>>>>
>>>> Maybe this is what you want?
>>>>
>>>> clear all
>>>> input contract id
>>>>  123  1
>>>>  123  2
>>>>  123  3
>>>>  456  4
>>>>  456  5
>>>>  678  1
>>>>  456  3
>>>>  789  6
>>>>  789  7
>>>>  456  8
>>>> end
>>>> g long obs=_n
>>>> egen long i=group(id)
>>>> la var i "Person id from 1 to M"
>>>> egen long gp=group(contract)
>>>> la var gp "Contract id from 1 to G"
>>>> bys i (gp):g long ct=sum(gp!=gp[_n-1])
>>>> la var ct "n distinct contract by id"
>>>> sort i ct
>>>> su i, mean
>>>> forv i=1/`r(max)' {
>>>>  su ct if i==`i', mean
>>>>  if r(max)==1 continue
>>>>  loc max=r(max)
>>>>  su gp if ct==1&i==`i', mean
>>>>  loc g1=r(max)
>>>>  forv j=2/`max' {
>>>>  su gp if ct==`j'&i==`i', mean
>>>>  replace gp=`g1' if gp==r(max)
>>>>  }
>>>>  }
>>>> sort obs
>>>> drop obs ct i
>>>> l, noo clean
>>>>
>>>>
>>>>
>>>> On Wed, Jul 6, 2011 at 8:45 AM, Hans Meier <[email protected]> wrote:
>>>>> Yes, now you got my question right.
>>>>> I don't know who lives in which household, and I don't have any further information about this.
>>>>>
>>>>> But I assume that if people have an insurance contract together, they are somehow connected, and I define them as one household.
>>>>> (I look only at non-life insurance, no pension funds etc.)
>>>>>
>>>>> In my example, I define the persons in contract "123" (ids "1", "2", "3") as one household, say household A, and those in contract "456" (ids "4", "5") as another household, B.
>>>>> Now, in contract "678", the id "1" tells me that this is the same person who is also in the contract "123", so I want this contract to be put in household A.
>>>>>
>>>>> To your question:
>>>>> Unfortunately, I have a very large dataset, so I can't tell whether every household has one contract that covers all of its members.
>>>>> To err on the side of caution, I would rather assume I don't have such complete contracts.
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
>>>
>>
>>
>> ___________________________________________________________
>> Schon gehört? WEB.DE hat einen genialen Phishing-Filter in die
>> Toolbar eingebaut! http://produkte.web.de/go/toolbar
>>
>>
>



