I don't see that -reshape- is problematic here.
Let's add some other variables to show that they don't
complicate the problem unduly:
. l
     +----------------------------------------------------------------+
     | tvp1             tvp2            tvp3             prime   pet  |
     |----------------------------------------------------------------|
  1. | massagli,mark    wood,j.         dessent,harold       2   frog |
  2. | beletz,elaine    carter,annie    curtis,barbara       3   toad |
  3. | bradshaw,joe     brown,arnold    dunaway,lowell       5   newt |
  4. | schneider,mark   mullins,bobby   sump,lawrence        7   wolf |
     +----------------------------------------------------------------+
Create a unique identifier if one doesn't exist, and then -reshape-:
. gen id = _n
. reshape long tvp , i(id)
(note: j = 1 2 3)
Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        4   ->      12
Number of variables                   6   ->       5
j variable (3 values)                     ->   _j
xij variables:
                         tvp1 tvp2 tvp3   ->   tvp
-----------------------------------------------------------------------------
. l
     +-----------------------------------------+
     | id   _j   tvp              prime   pet  |
     |-----------------------------------------|
  1. |  1    1   massagli,mark        2   frog |
  2. |  1    2   wood,j.              2   frog |
  3. |  1    3   dessent,harold       2   frog |
  4. |  2    1   beletz,elaine        3   toad |
  5. |  2    2   carter,annie         3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1   bradshaw,joe         5   newt |
  8. |  3    2   brown,arnold         5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1   schneider,mark       7   wolf |
     |-----------------------------------------|
 11. |  4    2   mullins,bobby        7   wolf |
 12. |  4    3   sump,lawrence        7   wolf |
     +-----------------------------------------+
Get a new column number: sort the names alphabetically within each id
and renumber _j in that order:
. bysort id (tvp) : replace _j = _n
(5 real changes made)
. l
     +-----------------------------------------+
     | id   _j   tvp              prime   pet  |
     |-----------------------------------------|
  1. |  1    1   dessent,harold       2   frog |
  2. |  1    2   massagli,mark        2   frog |
  3. |  1    3   wood,j.              2   frog |
  4. |  2    1   beletz,elaine        3   toad |
  5. |  2    2   carter,annie         3   toad |
     |-----------------------------------------|
  6. |  2    3   curtis,barbara       3   toad |
  7. |  3    1   bradshaw,joe         5   newt |
  8. |  3    2   brown,arnold         5   newt |
  9. |  3    3   dunaway,lowell       5   newt |
 10. |  4    1   mullins,bobby        7   wolf |
     |-----------------------------------------|
 11. |  4    2   schneider,mark       7   wolf |
 12. |  4    3   sump,lawrence        7   wolf |
     +-----------------------------------------+
and -reshape- back:
. reshape wide tvp , i(id) j(_j)
(note: j = 1 2 3)
Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       12   ->       4
Number of variables                   5   ->       6
j variable (3 values)                _j   ->   (dropped)
xij variables:
                                    tvp   ->   tvp1 tvp2 tvp3
-----------------------------------------------------------------------------
and we are where you want:
. l
     +----------------------------------------------------------------------+
     | id   tvp1             tvp2             tvp3             prime   pet  |
     |----------------------------------------------------------------------|
  1. |  1   dessent,harold   massagli,mark    wood,j.              2   frog |
  2. |  2   beletz,elaine    carter,annie     curtis,barbara       3   toad |
  3. |  3   bradshaw,joe     brown,arnold     dunaway,lowell       5   newt |
  4. |  4   mullins,bobby    schneider,mark   sump,lawrence        7   wolf |
     +----------------------------------------------------------------------+
In short,
gen id = _n
reshape long tvp , i(id)
bysort id (tvp) : replace _j = _n
reshape wide tvp , i(id) j(_j)
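
For anyone who wants to reproduce this, here is a self-contained sketch of the same four commands. The -input- block simply retypes the example data from the listing above, and the final -assert- is an added check of mine, not part of the solution itself.

```stata
clear
* retype the example data shown in the first listing
input str20 tvp1 str20 tvp2 str20 tvp3 byte prime str4 pet
"massagli,mark"  "wood,j."       "dessent,harold" 2 "frog"
"beletz,elaine"  "carter,annie"  "curtis,barbara" 3 "toad"
"bradshaw,joe"   "brown,arnold"  "dunaway,lowell" 5 "newt"
"schneider,mark" "mullins,bobby" "sump,lawrence"  7 "wolf"
end

* one identifier per original row
gen id = _n
* stack tvp1-tvp3 into a single variable; _j records the old column
reshape long tvp, i(id)
* within each id, renumber _j in alphabetical order of tvp
bysort id (tvp) : replace _j = _n
* spread back out: tvp1 now holds the first name alphabetically, etc.
reshape wide tvp, i(id) j(_j)

* added check: names should be in ascending order across each row
assert tvp1 <= tvp2 & tvp2 <= tvp3
```

-assert- says nothing if the condition holds in every observation and stops with an error otherwise.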
See also the FAQ
http://www.stata.com/support/faqs/data/reshape3.html
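
One caveat, in case some rows have fewer than three names: empty strings sort before any name, so with the commands as given a blank would end up in tvp1. A possible variation, sketched here on made-up data (the two rows below are for illustration only), is to drop the blanks while the data are long. Note the assumption that every row has at least one name; a row with none would be dropped entirely, along with its other variables.

```stata
clear
* illustrative data only: the first row has a blank in the middle
input str20 tvp1 str20 tvp2 str20 tvp3
"wood,j."      ""              "dessent,harold"
"carter,annie" "beletz,elaine" "curtis,barbara"
end

gen id = _n
reshape long tvp, i(id)
* remove blanks so that they do not sort to the front
drop if missing(tvp)
bysort id (tvp) : replace _j = _n
* ids with fewer than three names get "" in the trailing variables
reshape wide tvp, i(id) j(_j)
list
```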
Nick
Scott Merryman
> Shouldn't "dessent,harold" come before "massagli,mark" ?
>
> Here is one way that uses Mata (and Ben Jann's -moremata-)
>
> clear
>
> input str20 x1 str20 x2 str20 x3
> "massagli,mark" "wood,j." "dessent,harold"
> "beletz,elaine" "carter,annie" "curtis,barbara"
> "bradshaw,joe" "brown,arnold" "dunaway,lowell"
> "schneider,mark" "mullins,bobby" "sump,lawrence"
> end
>
> tempfile foo
> mata
> C= J(3,1,"")
> A = st_sdata(.,.)'
> for (i = 1; i <=cols(A); i++) {
> A = sort(A,i)
> C = C,A[.,i]
> }
> C=C[.,(2::cols(A)+1)]'
> mm_outsheet("`foo'",C, mode="r")
> end
> set trace off
> insheet using `foo', clear tab
> l
Caleb Southworth
> > I would like to sort data within a set of variables (within a row). The
> > set of variables describe officer names, surname first. For example, some
> > of the data are
> >
> >      | tvp1             tvp2             tvp3           |
> >      |--------------------------------------------------|
> > 466. | massagli,mark    wood,j.          dessent,harold |
> > 476. | beletz,elaine    carter,annie     curtis,barbara |
> > 484. | bradshaw,joe     brown,arnold     dunaway,lowell |
> > 497. | schneider,mark   mullins,bobby    sump,lawrence  |
> >
> >
> > The desired outcome is
> >
> >      | tvp1             tvp2             tvp3           |
> >      |--------------------------------------------------|
> > 466. | massagli,mark    dessent,harold   wood,j.        |
> > 476. | beletz,elaine    carter,annie     curtis,barbara |
> > 484. | bradshaw,joe     brown,arnold     dunaway,lowell |
> > 497. | mullins,bobby    schneider,mark   sump,lawrence  |
> >
> > such that the id number remains the same and the data are shuffled within
> > the tvp# variables. This will permit an assessment of change in officers
> > between years without looping over all the variables of a similar type
> > (i.e. vice presidents or presidents).
> >
> > I considered using reshape, but at least in my hands that leads to a very
> > long kludge.