RE: st: Unique identifier from a string name


From   Barry Quinn <b.quinn@qub.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Unique identifier from a string name
Date   Thu, 24 Nov 2011 16:13:16 +0000

Thanks Nick/Maarten, that helps a lot.

Barry Quinn

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Maarten Buis
Sent: 24 November 2011 16:03
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Unique identifier from a string name

On Thu, Nov 24, 2011 at 4:10 PM, Barry Quinn wrote:
> The context of the problem is to build a panel from yearly data using firm names as the unique id with the -merge- command.

One solution is to first create a file with all firms, create the unique identifier, and then merge this identifier onto each of the yearly files.

Say you have three years stored in files called year1 year2 year3, and the firm name is stored in variable firm:

*---------- begin example ----------
// stack all files
use year1
forvalues i = 2/3 {
    append using year`i'
}

// keep only the firm names
keep firm

// we only need one observation per firm
bys firm : keep if _n == 1

// create the unique id
gen firmid = _n

// save this key in a file
save idkey, replace

// add the id to each dataset
forvalues i = 1/3 {
    use year`i'
    merge 1:1 firm using idkey

    // every firm in year`i' got an id
    assert _merge != 1

    // not all firms have to appear in year `i'
    drop if _merge == 2

    // _merge is no longer necessary
    drop _merge

    // I never overwrite the original data
    // hence a new filename
    save year`i'_id, replace
}
*------------ end example ------------
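
Once every yearly file carries the id, the files can be stacked into a long panel. The following is only a minimal sketch under stated assumptions: the two toy datasets stand in for year1_id and year2_id, the variable sales and the values in it are hypothetical, and the variable year is added by hand because the original files are not known to contain one.

```stata
*---------- begin example ----------
// toy stand-in for year1_id (hypothetical data)
clear
input str10 firm firmid sales
"Acme" 1 10
"Beta" 2 20
end
gen year = 1
tempfile y1
save `y1'

// toy stand-in for year2_id (hypothetical data)
clear
input str10 firm firmid sales
"Acme"  1 12
"Gamma" 3 5
end
gen year = 2

// stack the years into one long file
append using `y1'

// firmid and year should jointly identify an observation;
// -isid- stops with an error if they do not
isid firmid year

// declare the panel structure
xtset firmid year
*------------ end example ------------
```

With the real files, the -input- blocks would be replaced by -use year`i'_id- inside a loop; the -isid- check is the part that guards against a firm appearing twice in the same year.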

The only problem with this approach is that it assumes that the variable firm contains no typos, that there are no legitimate (or illegitimate) alternative spellings and/or abbreviations, and that the firm names remained constant. In practice that is highly unlikely, so I would carefully check the idkey file before merging.
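
One possible way to start that check is to compare the names after stripping differences in case and spacing. This is a rough sketch, not a complete solution: the toy names below are hypothetical stand-ins for the contents of idkey, firm_clean and ndup are names made up for the illustration, and the check only catches variants that differ in capitalisation or blanks, not real typos or abbreviations.

```stata
*---------- begin example ----------
// toy stand-in for the idkey file (hypothetical names)
clear
input str20 firm
"Acme Ltd"
"acme ltd"
"ACME  Ltd"
"Beta plc"
end
gen firmid = _n

// normalise: drop leading/trailing blanks, collapse
// repeated internal blanks, and convert to lower case
gen firm_clean = lower(itrim(trim(firm)))

// count how many raw spellings share one normalised name
bysort firm_clean (firmid) : gen ndup = _N

// list the groups that received more than one id
list firm firmid firm_clean if ndup > 1, sepby(firm_clean)
*------------ end example ------------
```

With the real data, the -input- block and the -gen firmid- line would be replaced by -use idkey-. Any group that shows up in the listing is a candidate for recoding to a single spelling before the ids are created.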

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany


http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

