Statalist The Stata Listserver



st: RE: RE: sort strings within rows


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: sort strings within rows
Date   Wed, 6 Sep 2006 11:56:01 +0100

I don't see that -reshape- is problematic here. 

Let's add some other variables to show that they don't 
complicate the problem unduly: 

. l

     +----------------------------------------------------------------+
     |           tvp1            tvp2             tvp3   prime    pet |
     |----------------------------------------------------------------|
  1. |  massagli,mark         wood,j.   dessent,harold       2   frog |
  2. |  beletz,elaine    carter,annie   curtis,barbara       3   toad |
  3. |   bradshaw,joe    brown,arnold   dunaway,lowell       5   newt |
  4. | schneider,mark   mullins,bobby    sump,lawrence       7   wolf |
     +----------------------------------------------------------------+

Create a unique identifier if one doesn't exist and then -reshape-: 

. gen id = _n

. reshape long tvp , i(id)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->      12
Number of variables                   6   ->       5
j variable (3 values)                     ->   _j
xij variables:
                         tvp1 tvp2 tvp3   ->   tvp
-----------------------------------------------------------------------------

. l

     +-----------------------------------------+
     | id   _j              tvp   prime    pet |
     |-----------------------------------------|
  1. |  1    1    massagli,mark       2   frog |
  2. |  1    2          wood,j.       2   frog |
  3. |  1    3   dessent,harold       2   frog |
  4. |  2    1    beletz,elaine       3   toad |
  5. |  2    2     carter,annie       3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1     bradshaw,joe       5   newt |
  8. |  3    2     brown,arnold       5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1   schneider,mark       7   wolf |
     |-----------------------------------------|
 11. |  4    2    mullins,bobby       7   wolf |
 12. |  4    3    sump,lawrence       7   wolf |
     +-----------------------------------------+

Get a new column number: 

. bysort id (tvp) : replace _j = _n
(5 real changes made)

. l

     +-----------------------------------------+
     | id   _j              tvp   prime    pet |
     |-----------------------------------------|
  1. |  1    1   dessent,harold       2   frog |
  2. |  1    2    massagli,mark       2   frog |
  3. |  1    3          wood,j.       2   frog |
  4. |  2    1    beletz,elaine       3   toad |
  5. |  2    2     carter,annie       3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1     bradshaw,joe       5   newt |
  8. |  3    2     brown,arnold       5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1    mullins,bobby       7   wolf |
     |-----------------------------------------|
 11. |  4    2   schneider,mark       7   wolf |
 12. |  4    3    sump,lawrence       7   wolf |
     +-----------------------------------------+

and -reshape- back: 

. reshape wide tvp , i(id) j(_j)
(note: j = 1 2 3)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       4
Number of variables                   5   ->       6
j variable (3 values)                _j   ->   (dropped)
xij variables:
                                    tvp   ->   tvp1 tvp2 tvp3
-----------------------------------------------------------------------------

and we are where you want to be: 

. l

     +----------------------------------------------------------------------+
     | id             tvp1             tvp2             tvp3   prime    pet |
     |----------------------------------------------------------------------|
  1. |  1   dessent,harold    massagli,mark          wood,j.       2   frog |
  2. |  2    beletz,elaine     carter,annie   curtis,barbara       3   toad |
  3. |  3     bradshaw,joe     brown,arnold   dunaway,lowell       5   newt |
  4. |  4    mullins,bobby   schneider,mark    sump,lawrence       7   wolf |
     +----------------------------------------------------------------------+

In short, 

gen id = _n
reshape long tvp , i(id)
bysort id (tvp) : replace _j = _n
reshape wide tvp , i(id) j(_j)
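
For anyone who wants to rerun this from scratch, here is a minimal 
sketch as a do-file. It re-enters the example data shown above; the 
storage types (str14, byte, str4) and the closing -assert- are my 
assumptions for illustration, not part of the original session. 

```stata
* Sketch: rebuild the example data, then sort tvp1-tvp3 within each row
clear
input str14 tvp1 str14 tvp2 str14 tvp3 byte prime str4 pet
"massagli,mark"  "wood,j."       "dessent,harold" 2 "frog"
"beletz,elaine"  "carter,annie"  "curtis,barbara" 3 "toad"
"bradshaw,joe"   "brown,arnold"  "dunaway,lowell" 5 "newt"
"schneider,mark" "mullins,bobby" "sump,lawrence"  7 "wolf"
end

gen id = _n                          // unique row identifier
reshape long tvp, i(id)              // one observation per name; creates _j
bysort id (tvp) : replace _j = _n    // renumber names alphabetically within id
reshape wide tvp, i(id) j(_j)        // back to one observation per id

* Check (added for illustration): names ascend from left to right in every row
assert tvp1 <= tvp2 & tvp2 <= tvp3
list
```

The other variables (prime, pet) are constant within id, which is why 
they survive both calls to -reshape- untouched. 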

See also the FAQ 

http://www.stata.com/support/faqs/data/reshape3.html
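
One caveat that the example data do not exercise: if some rows hold 
fewer names than others, the empty strings sort before any nonempty 
name and so would land in tvp1. A sketch of one way to push blanks to 
the end instead, assuming that is what is wanted. The two-row dataset 
and the variable name isblank are made up for illustration. 

```stata
* Sketch: sort names within rows, keeping empty strings last
clear
input str14 tvp1 str14 tvp2 str14 tvp3
"massagli,mark"  "wood,j." "dessent,harold"
"schneider,mark" ""        "mullins,bobby"
end

gen id = _n
reshape long tvp, i(id)
gen byte isblank = missing(tvp)              // 1 if tvp is empty, 0 otherwise
bysort id (isblank tvp) : replace _j = _n    // nonblank names first, alphabetically
drop isblank                                 // varies within id, so drop before reshaping wide
reshape wide tvp, i(id) j(_j)

* Check (added for illustration): the blank ends up in the last column of row 2
assert tvp3 == "" in 2
list
```
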

Nick 
n.j.cox@durham.ac.uk 

Scott Merryman
 
> Shouldn't "dessent,harold" come before "massagli,mark" ?
> 
> Here is one way that uses Mata (and Ben Jann's -moremata-)
> 
> clear
> 
> input str20 x1 str20 x2 str20 x3
> "massagli,mark" "wood,j." "dessent,harold"  
> "beletz,elaine" "carter,annie" "curtis,barbara"  
> "bradshaw,joe" "brown,arnold" "dunaway,lowell"  
> "schneider,mark" "mullins,bobby" "sump,lawrence"  
> end
> 
> tempfile foo
> mata
> C= J(3,1,"")
> A = st_sdata(.,.)'
> for (i = 1; i <=cols(A); i++) {
>     	A = sort(A,i)
>     	C = C,A[.,i]
> }
> C=C[.,(2::cols(A)+1)]'
> mm_outsheet("`foo'",C, mode="r")
> end
> set trace off
> insheet using `foo', clear tab
> l

Caleb Southworth
 
> > I would like to sort data within a set of variables (within a row). The
> > set of variables describe officer names, surname first. For example, some
> > of the data are
> > 
> >       |           tvp1             tvp2             tvp3 |
> >       |--------------------------------------------------|
> >  466. |  massagli,mark          wood,j.   dessent,harold |
> >  476. |  beletz,elaine     carter,annie   curtis,barbara |
> >  484. |   bradshaw,joe     brown,arnold   dunaway,lowell |
> >  497. | schneider,mark    mullins,bobby    sump,lawrence |
> > 
> > The desired outcome is
> > 
> >       |           tvp1             tvp2             tvp3 |
> >       |--------------------------------------------------|
> >  466. |  massagli,mark   dessent,harold          wood,j. |
> >  476. |  beletz,elaine     carter,annie   curtis,barbara |
> >  484. |   bradshaw,joe     brown,arnold   dunaway,lowell |
> >  497. |  mullins,bobby   schneider,mark    sump,lawrence |
> > 
> > such that the id number remains the same and the data are shuffled within
> > the tvp# variables. This will permit an assessment of change in officers
> > between years without looping over all the variables of a similar type
> > (i.e. vice presidents or presidents).
> > 
> > I considered using reshape, but at least in my hands that leads to a very
> > long kludge.



