Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: st: Identify a Panel Structure with Incoherent ID and NAME Variable
From
Robert Picard <[email protected]>
To
[email protected]
Subject
Re: st: Identify a Panel Structure with Incoherent ID and NAME Variable
Date
Sat, 27 Aug 2011 00:31:06 -0400
You can use -group_id- (available on SSC) to consolidate your ID codes
when Firm names match. Note that this will miss "General Motors
Stillwater" because the name is unique and the ID is also unique.
Hope this helps,
Robert
*----------- begin example -------------
clear
input obs Year str9 ID str36 Firm
1 1999 000846217 "Stillwater Power"
2 2000 000846217 "Stillwater Power"
3 2001 000846217 "Oklahoma Stillwater Power Co."
4 2002 000846217 "Oklahoma Stillwater Power Co."
5 2003 000846217 "Oklahoma Stillwater Power Co."
6 2004 000846217 "Stillwater Power Co."
7 2005 000846217 "Stillwater Power Co."
8 2006 000846217 "Stillwater Power Co."
9 2007 000846217 "Stillwater Power Co."
10 1999 001834488 "Cushing Farming"
11 2000 001834488 "Oklahoma Cushing Farming"
12 2001 001834488 "Oklahoma Cushing Farming"
13 2002 001834488 "Cushing Farming"
14 2003 001834488 "Cushing Farming"
15 2004 001834488 "Oklahoma Cushing Farming"
16 2005 001834488 "Oklahoma Cushing Farming"
17 2006 001834488 "Cushing Farming"
18 1999 001840116 "Perkins Electricity"
19 2000 001840116 "Perkins Electricity"
20 2001 001840116 "Perkins Electricity"
21 2002 001840116 "Perkins Electricity"
22 2003 001840116 "Oklahoma Perkins Electricity"
23 2005 001840116 "Perkins Electricity"
24 1999 177725736 "General Motors Stillwater"
25 2007 702639116 "Cushing Farming"
26 2004 70270423X "Perkins Electricity"
27 2006 70270423X "Perkins Electricity"
28 2007 70270423X "Oklahoma Perkins Electricity"
29 2000 714509511 "(Oklahoma)General Motors Stillwater"
30 2001 714509511 "(Oklahoma)General Motors Stillwater"
31 2002 714509511 "(Oklahoma)General Motors Stillwater"
32 2003 714509511 "(Oklahoma)General Motors Stillwater"
33 2004 714509511 "General Motors Stillwater Oklahoma"
34 2005 714509511 "General Motors Stillwater Oklahoma"
35 2006 714509511 "General Motors Stillwater Oklahoma"
36 2007 714509511 "GM Stillwater"
end
* type -ssc install group_id- to install
clonevar newid = ID
group_id newid, matchby(Firm)
sort newid ID Year obs
list, noobs sepby(newid)
*------------ end example --------------
On Fri, Aug 26, 2011 at 11:19 PM, 田曦 <[email protected]> wrote:
>
>
> I have a sample from a very large panel dataset looks like below. There are 4 unique firms and 9 observations per firm. Firm's name and ID may change occasionally because it was inputed by different people at different years. Firms name also change occasionally.
> ID is a string variable, may combined with numbers and letters.
>
> Neither ID or Firm variable can fully identify a true unique firm from all 9 years. For example, "Cushing Farming", "Oklahoma Cushing Farming" are the same company, but with two IDs: 001834488 and 702639116; "Perkins Electricity" and "Oklahoma Perkins Electricity" are the same company, also has IDs: 001834488 and 70270423X; "General Motors Stillwater", "(Oklahoma) General Motors Stillwater" and "GM Stillwater" refer the same company, but with only two IDs: 177725736 and 714509511.
>
> obs Year ID Firm
> 1 1999 000846217 Stillwater Power
> 2 2000 000846217 Stillwater Power
> 3 2001 000846217 Oklahoma Stillwater Power Co.
> 4 2002 000846217 Oklahoma Stillwater Power Co.
> 5 2003 000846217 Oklahoma Stillwater Power Co.
> 6 2004 000846217 Stillwater Power Co.
> 7 2005 000846217 Stillwater Power Co.
> 8 2006 000846217 Stillwater Power Co.
> 9 2007 000846217 Stillwater Power Co.
> 10 1999 001834488 Cushing Farming
> 11 2000 001834488 Oklahoma Cushing Farming
> 12 2001 001834488 Oklahoma Cushing Farming
> 13 2002 001834488 Cushing Farming
> 14 2003 001834488 Cushing Farming
> 15 2004 001834488 Oklahoma Cushing Farming
> 16 2005 001834488 Oklahoma Cushing Farming
> 17 2006 001834488 Cushing Farming
> 18 1999 001840116 Perkins Electricity
> 19 2000 001840116 Perkins Electricity
> 20 2001 001840116 Perkins Electricity
> 21 2002 001840116 Perkins Electricity
> 22 2003 001840116 Oklahoma Perkins Electricity
> 23 2005 001840116 Perkins Electricity
> 24 1999 177725736 General Motors Stillwater
> 25 2007 702639116 Cushing Farming
> 26 2004 70270423X Perkins Electricity
> 27 2006 70270423X Perkins Electricity
> 28 2007 70270423X Oklahoma Perkins Electricity
> 29 2000 714509511 (Oklahoma)General Motors Stillwater
> 30 2001 714509511 (Oklahoma)General Motors Stillwater
> 31 2002 714509511 (Oklahoma)General Motors Stillwater
> 32 2003 714509511 (Oklahoma)General Motors Stillwater
> 33 2004 714509511 General Motors Stillwater Oklahoma
> 34 2005 714509511 General Motors Stillwater Oklahoma
> 35 2006 714509511 General Motors Stillwater Oklahoma
> 36 2007 714509511 GM Stillwater
>
> I can tell they are the same firms because when I create group identifier for ID and group identifier for Firm variable, they always overlap each other at some point. I'm thinking how to utilize the overlapping point as an joint knob.
>
> Many Thanks.
>
> Xi Tian
> Department of Economics
> Oklahoma State University
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/