
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.



RE: st: Unique identifier from a string name


From   Barry Quinn <b.quinn@qub.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Unique identifier from a string name
Date   Thu, 24 Nov 2011 16:58:23 +0000

Brendan, thanks a lot for the further comments.

Barry Quinn
Lecturer in Finance

Queens University Belfast              Tel: 028 9097 4824
Riddel Hall                                       www.qub.ac.uk/mgt
185 Stramillis Road                         www.barryquinn.com
Belfast                                             b.quinn@qub.ac.uk
Northern Ireland                             @niperino
BT9 5EE                      


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Brendan Halpin
Sent: 24 November 2011 16:19
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Unique identifier from a string name

If you really need a deterministic mapping between string and integer, it might be worth simplifying the strings as much as possible first (e.g. remove spaces and punctuation, and convert to lowercase). Then map each symbol in the much reduced set (perhaps only 26) to a single integer and proceed as I suggested before, but with 26 (or however many symbols remain) as the multiplier instead of 256.
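
A minimal sketch of that idea, written in Python rather than Stata purely for illustration. The function names `normalise` and `string_to_id` are hypothetical, and the choice to code the letters as 1 to 26 (rather than 0 to 25) is an assumption made here so that strings such as "a" and "aa" do not collapse to the same integer:

```python
import string

# Reduced symbol set: the 26 lowercase ASCII letters, coded 1..26.
# (Assumption: starting at 1 instead of 0 keeps "a" and "aa" distinct.)
ALPHABET = string.ascii_lowercase
CODE = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def normalise(name):
    """Lowercase the string and drop everything that is not a-z
    (spaces, punctuation, digits, accented letters)."""
    return "".join(ch for ch in name.lower() if ch in CODE)


def string_to_id(name):
    """Deterministically map a string to an integer, using the size of
    the reduced alphabet (26) as the multiplier."""
    value = 0
    for ch in normalise(name):
        value = value * len(ALPHABET) + CODE[ch]
    return value
```

The same string always yields the same integer, whichever dataset it appears in, so `string_to_id("Ab ")` and `string_to_id("a.b")` agree. Python integers do not overflow; in a language that stores the result as a double, the value is only exact for fairly short strings (roughly 11 letters with this coding), so long names would need truncating or splitting across several variables.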

I am presuming that you have strings in different datasets that you want to match, so that encode won't work because it assigns integers on the basis of the strings currently available to it. It might be worth, though, seeing if you can create a master dataset (e.g. by appending rather than merging) and then encoding. You could then split out the original datasets and merge.
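
The append-then-encode route might look roughly like the following, again sketched in Python rather than Stata. The helper `encode_master` and the sample names are made up for illustration; the codes are assigned in sorted order over the pooled strings, which is an assumption about how the numbering should behave:

```python
def encode_master(*datasets):
    """Pool the strings from every dataset, then give each distinct
    string one integer code, assigned in sorted order starting at 1."""
    master = sorted(set().union(*datasets))
    return {name: code for code, name in enumerate(master, start=1)}


# Illustrative data only: two datasets sharing one name.
names_a = ["Gamma Bank", "Alpha Bank"]
names_b = ["Alpha Bank", "Beta Bank"]

codes = encode_master(names_a, names_b)

# "Split out" the original datasets again, now carrying the shared codes.
ids_a = [codes[n] for n in names_a]
ids_b = [codes[n] for n in names_b]
```

Because the codes come from the pooled list, "Alpha Bank" gets the same integer in both datasets, which is what makes the later merge possible. The codes are only stable as long as the master list does not change.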

Brendan
-- 
Brendan Halpin,   Department of Sociology,   University of Limerick,   Ireland
Tel: w +353-61-213147  f +353-61-202569  h +353-61-338562;  Room F1-009 x 3147
mailto:brendan.halpin@ul.ie    ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress         twitter:@ULSociology
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

