
From: "Rafael Osorio" <rafael.osorio@undp-povertycentre.org>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: RE: Identifier from three variables
Date: Mon, 30 Jul 2007 13:53:50 -0300

As long as you have confidence in what you are doing, it doesn't matter how you do it. There are always many ways of getting the same result when programming. I agree with Nick that Stata offers better ways of creating the identifier. However, the old method I recalled will do the trick in any programming language. OK, this is a Stata list, but if you are new to it, it may be useful to understand the basic algorithm for performing the task.

The e-mail I sent had been stuck in my outgoing server since last week. I apologize for it reaching the list so late.

Rafael G. Osorio

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of n j cox
Sent: Monday, July 30, 2007 12:22 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Identifier from three variables

This isn't another way of doing it. It is a variation on a way already discussed in this thread by Svend Juul and indeed by myself (#3 in the post to which this post replies).

Generating a numeric identifier by this method has no real advantages that I can see and at least five disadvantages compared with concatenation of strings:

1. You need to check the limits of each identifier, as Rafael points out, because Stata will accept the result as legal even when it is wrong.

2. If you forget #1 or make a mistake on #1, the resulting problems may not be obvious.

3. Here the default variable type is a -float-. Using that to hold (very large) integers could run into precision problems. Rafael is careful to recommend using a -double- if needed, but the problem is that everyone has to remember to do that.

4. Concatenation can make use of separators, as in

   egen id = concat(house family individual), p("_")

5. The reverse engineering from composite to individual identifiers is easier with -split- (provided you follow #4) than it usually is with composite integers.
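A minimal Stata sketch of points 3 to 5, assuming a toy dataset shaped like the example in the original question. The variable names house, family and individual come from the thread; the data values and the new names id_sep, id_pad and part* are made up for illustration. It concatenates with a separator, builds the zero-padded form the question asked for through the format() option of egen's concat(), splits the separated version back into numeric parts, and shows the float rounding behind point 3.

```stata
* Toy data shaped like the example in the question (values illustrative)
clear
input house family individual
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
end

* Point 4: concatenate with a separator, e.g. "1_1_1"
egen id_sep = concat(house family individual), punct("_")

* Zero-padded form from the question, e.g. "010101":
* each numeric component is converted with the %02.0f format
egen id_pad = concat(house family individual), format(%02.0f)

* Point 5: reverse engineering is one -split- on the separator;
* destring turns part1, part2, part3 back into numeric variables
split id_sep, parse("_") generate(part) destring

* Point 3: a -float- holds integers exactly only up to 16,777,216,
* so a large composite id is silently rounded
display %12.0f float(123456789)   // shows 123456792, not 123456789
```

The separator is what makes -split- safe here: without it, "11111" could not be split unambiguously once any component reaches two digits.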
Rafael's method, which is quite often used, is relatively simple and can be unproblematic, but it is not a good general method in my view.

Nick
n.j.cox@durham.ac.uk

Rafael Osorio

Another way of doing it, supposing each identifier is <= 99:

   gen id = house*10000 + family*100 + individual

By doing it this way you can tell which house and family a specific person belongs to. If your data are not sorted by the identifiers, sorting by this new identifier gives the same order as sorting by all three identifiers. If your final id is large:

   gen double id...

If your identifiers are greater than 99, adjust the multipliers.

Nick Cox

1. Don't do that. Use -egen, group()- with the -label- option.

2. See also the FAQ "Creating group identifiers" (3/01): How do I create individual identifiers numbered from 1 upwards?
   http://www.stata.com/support/faqs/data/group.html

3. If you really must, look into -egen, concat()-, including its options.

Nick
n.j.cox@durham.ac.uk

Nádia N. Simões

> How can I generate an identifier from three variables?
> In my data I have a column for the house, one for the family
> and one for the individual.
>
> Example:
>
> house  family  individual
>   1      1        1
>   1      1        2
>   1      1        3
>   2      1        1
>   2      1        2
>  ...
>
> and I would like to know how I can create an identifier per
> individual such as:
>
> individual
> 010101
> 010102
> 010103
> 020101
> ...

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
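A minimal Stata sketch of the two approaches discussed in this thread, on the same kind of toy data (the values and the names id_num, id_grp, house2, family2 and indiv2 are illustrative only). It assumes every component lies between 0 and 99 and checks that with -assert- first, as Nick's point 1 urges. Each component then gets its own power of 100, so house is multiplied by 10000; the result is stored as a -double- and displayed with %06.0f to give the zero-padded look from the question. Finally it builds Nick's recommended -egen, group()- identifier.

```stata
* Toy data shaped like the example in the question (values illustrative)
clear
input house family individual
1 1 1
1 1 2
1 1 3
2 1 1
2 1 2
end

* Assumption behind the arithmetic: every component fits in two digits
assert inrange(house, 0, 99) & inrange(family, 0, 99) & inrange(individual, 0, 99)

* Why each slot needs its own power of 100: with 1000 for house,
* (house, family, individual) = (1, 10, 0) and (2, 0, 0) collide
display 1*1000 + 10*100 + 0     // 2000
display 2*1000 +  0*100 + 0     // 2000

* Composite numeric id: slots 100^2, 100^1, 100^0
gen double id_num = house*10000 + family*100 + individual
format id_num %06.0f            // 10101 is displayed as 010101

* Reverse engineering needs floor() and mod() rather than -split-
gen house2  = floor(id_num/10000)
gen family2 = mod(floor(id_num/100), 100)
gen indiv2  = mod(id_num, 100)

* Nick's recommendation: integers 1, 2, ... with value labels
* that show the original combination of the three variables
egen id_grp = group(house family individual), label
```

The -group()- identifier carries no meaning in its digits, which is why the -label- option matters: the value labels keep the original combination visible without any limits to check.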

**References**: st: RE: RE: Identifier from three variables (From: n j cox <n.j.cox@durham.ac.uk>)

