st: RE: Dividing a variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Dividing a variable
Date   Wed, 23 Oct 2002 20:22:37 +0100

Hoetker, Glenn
>
> Hoping someone can help me with a problem involving dividing up a
> variable. My data consists of patent numbers and inventors and looks
> like this:
>
> nmi
> wku
> Schmitt, Ty; Gandre, Jerry
> 5586003
> Sato, N. Albert; Baker, David C.; Waldron, Christie J.
> 5586324
> Swamy, N. Deepak
> 5587885
>
> I would like it to look like this:
>
> nmi
> wku
> Schmitt, Ty
> 5586003
> Gandre, Jerry
> 5586003
> Sato, N. Albert
> 5586324
> Baker, David C.
> 5586324
> Waldron, Christie J.
> 5586324
> Swamy, N. Deepak
> 5587885
>
> That is, I want to create a record containing each inventor and his or
> her associated patent number.  If Ty Schmitt had five patents, he should
> show up in five records.  The number of inventors per patent varies from
> one to many.
>
> I've looked for egen functions (and their extensions) and done some
> experimenting, but am floundering.  Any help would be very
> appreciated!

The "nmi wku" stuff I don't understand.

I am going to be optimistic and assume it is a preamble
you can strip off.

My suggestion is to use -split- from SSC and -reshape-.
For -split-,

. ssc inst split

For -reshape-, we're using the Third Law of Reshaping:

* You may need two -reshape-s to get where you want to be *.

Here's my log:

. l

                                                   whatever
  1.                             Schmitt, Ty; Gandre, Jerry
  2.                                                5586003
  3. Sato, N. Albert; Baker, David C.; Waldron, Christie J.
  4.                                                5586324
  5.                                       Swamy, N. Deepak
  6.                                                5587885

First we set up row and column identifiers for a -reshape-:

. egen id = seq(), b(2)

. egen field = seq(), t(2)

. l

                                                   whatever        id     field
  1.                             Schmitt, Ty; Gandre, Jerry         1         1
  2.                                                5586003         1         2
  3. Sato, N. Albert; Baker, David C.; Waldron, Christie J.         2         1
  4.                                                5586324         2         2
  5.                                       Swamy, N. Deepak         3         1
  6.                                                5587885         3         2
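
If -egen, seq()- looks opaque, the same identifiers can be sketched
directly from the observation number. This assumes the data are still in
their original order with exactly two observations per patent; -id2- and
-field2- are hypothetical names used only for the check and are dropped
again so that they do not interfere with the -reshape- that follows.

```stata
* id2 counts pairs of observations: 1 1 2 2 3 3 ...
gen long id2 = ceil(_n / 2)

* field2 alternates within each pair: 1 2 1 2 1 2 ...
gen byte field2 = 2 - mod(_n, 2)

* confirm agreement with the -egen- versions, then tidy up
assert id2 == id & field2 == field
drop id2 field2
```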

Now we map each pair of observations into one:

. reshape wide whatever, i(id) j(field)

. l

Observation 1

          id            1    whatev~1 Schmitt, Ty;..  whatev~2 5586003


Observation 2

          id            2    whatev~1 Sato, N. Alb..  whatev~2 5586324


Observation 3

          id            3    whatev~1 Swamy, N. De..  whatev~2 5587885

. rename whatever1 who

-split- works on some separator. Here it's a semi-colon:

. split who, p(;)
variables created as string: who1      who2      who3
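
One thing to watch: the names are separated by a semi-colon followed by
a blank, so the second and later pieces may carry a leading blank after
splitting on ";" alone. That is an assumption about how the separator is
handled, not something visible in the right-justified listings here. A
cautious sketch that trims every piece, and is harmless if there is
nothing to trim:

```stata
* who? matches who1, who2, who3 but not the original who
foreach v of varlist who? {
    replace `v' = trim(`v')
}
```

Stray blanks would otherwise make " Gandre, Jerry" and "Gandre, Jerry"
look like different inventors in any later matching or counting.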

We have the original -who- and the parts -who?-.
The original will just be in the way:

. drop who

Now the finish is in sight:

. reshape long who, i(id)

. drop if who == ""

. compress

. l

           id         _j  whatever2                    who
  1.        1          1    5586003            Schmitt, Ty
  2.        1          2    5586003          Gandre, Jerry
  3.        2          1    5586324        Sato, N. Albert
  4.        2          2    5586324        Baker, David C.
  5.        2          3    5586324   Waldron, Christie J.
  6.        3          1    5587885       Swamy, N. Deepak
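
If "nmi" and "wku" were in fact meant as variable names (inventor name
and patent number respectively, which is a guess from the original
question and not something I can confirm), a possible tidy-up is
sketched below. The -destring- step assumes every patent number is
purely numeric.

```stata
* restore the names apparently intended in the original data
rename who nmi
rename whatever2 wku

* the reshape bookkeeping variables are no longer needed
drop id _j

* hold patent numbers as numbers, not strings
destring wku, replace
```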


Nick
n.j.cox@durham.ac.uk




