# st: reshape with j split

From: David Airey
To: statalist@hsphsun2.harvard.edu
Subject: st: reshape with j split
Date: Fri, 12 Dec 2003 14:19:10 -0600

I have a reshape question. I find this one of the hardest commands to remember how to use.

I cannot find a help example that exactly parallels my situation. I have an identifier that is split between variables. This situation is common in ANOVA where treatment cells may be identified by more than one factor.

I have data like:

. list, sep(6)

         +----------------------------------------+
         | s1level   s1s2de~y   animal   s2peak~e |
         |----------------------------------------|
      1. |       0         50   1_1_0F     773.75 |
      2. |       0        100   1_1_0F    1001.63 |
      3. |      75         50   1_1_0F      472.5 |
      4. |      75        100   1_1_0F    927.875 |
      5. |      85         50   1_1_0F    611.375 |
      6. |      85        100   1_1_0F    654.375 |
         |----------------------------------------|
      7. |       0         50   1_1_1F    1116.88 |
      8. |       0        100   1_1_1F    1101.38 |
      9. |      75         50   1_1_1F    544.875 |
     10. |      75        100   1_1_1F    567.875 |
     11. |      85         50   1_1_1F    443.875 |
     12. |      85        100   1_1_1F        466 |
         |----------------------------------------|
     13. |       0         50   1_1_2F      309.5 |
     14. |       0        100   1_1_2F    336.286 |
     15. |      75         50   1_1_2F    442.625 |
etc.

where the first two variables, s1level and s1s2delay, define the 6 treatment conditions under which s2peakvalue was measured for each animal. I would like to reshape these data to calculate a ratio from the conditions within each animal. I would like to get a data set that looks like:

animal s2peak0_50 s2peak0_100 s2peak75_50 s2peak75_100 s2peak85_50 s2peak85_100

in order to calculate the ratio of each of variables 4-7 to the average of variables 2 and 3. I can do this directly in the long form with the following code:

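* step counts 0-5 within each block of 6 rows (one block per animal)
egen step = seq(), from(0) to(5) block(1)
* percent change from the mean of the first two rows of the block (the s1level == 0 rows)
gen ppi2 = ((s2peak[_n-step]+s2peak[_n-step+1])/2 - s2peak[_n])/((s2peak[_n-step]+s2peak[_n-step+1])/2)*100
drop if s1level == 0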

         +----------------------------------------------------+
         | s1level   s1s2de~y   animal   s2peak~e        ppi2 |
         |----------------------------------------------------|
      1. |      75         50   1_1_0F      472.5    46.77181 |
      2. |      75        100   1_1_0F    927.875   -4.527213 |
      3. |      85         50   1_1_0F    611.375    31.12723 |
      4. |      85        100   1_1_0F    654.375    26.28318 |
         |----------------------------------------------------|
      5. |      75         50   1_1_1F    544.875    50.87344 |
      6. |      75        100   1_1_1F    567.875    48.79973 |
      7. |      85         50   1_1_1F    443.875    59.97971 |
      8. |      85        100   1_1_1F        466     57.9849 |
         |----------------------------------------------------|
      9. |      75         50   1_1_2F    442.625   -37.08108 |
     10. |      75        100   1_1_2F        265    17.92943 |
     11. |      85         50   1_1_2F      264.5    18.08428 |
     12. |      85        100   1_1_2F    192.375    40.42141 |
         |----------------------------------------------------|
     13. |      75         50   1_1_3F    448.875    50.06605 |
     14. |      75        100   1_1_3F    462.143     48.5901 |
     15. |      85         50   1_1_3F    576.875    35.82702 |
etc.
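For comparison, the same percentage could be computed in long form without relying on row positions. The following is only a sketch: it assumes the original long data with the variable names shown in the first listing (animal, s1level, s1s2delay, s2peakvalue), and `base` is a made-up variable name.

```stata
* Sketch only: start from the original long data, before ppi2 exists.
* base is a hypothetical name for the per-animal baseline.

* Mean of the s1level == 0 rows within each animal. cond() sets all other
* rows to missing, and egen's mean() ignores missing values.
bysort animal: egen base = mean(cond(s1level == 0, s2peakvalue, .))

* Percent change from that baseline, computed for the non-baseline rows only.
generate ppi2 = (base - s2peakvalue) / base * 100 if s1level != 0
```

Because the baseline is found by condition rather than by `_n` offsets, this version should not depend on the sort order or on every animal having exactly six rows. A `drop if s1level == 0` afterwards would leave the same rows as in the listing above.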

However, I wonder whether reshaping to wide and then back to long would be more reliable. As long as no data are missing, I currently have no problems. Before going wide, must I say something like,

. egen treatment = group(s1level s1s2delay), label
. drop s1level s1s2delay

         +------------------------------+
         | animal   s2peak~e   treatm~t |
         |------------------------------|
      1. | 1_1_0F     773.75       0 50 |
      2. | 1_1_0F    1001.63      0 100 |
      3. | 1_1_0F      472.5      75 50 |
      4. | 1_1_0F    927.875     75 100 |
      5. | 1_1_0F    611.375      85 50 |
      6. | 1_1_0F    654.375     85 100 |
         |------------------------------|
      7. | 1_1_1F    1116.88       0 50 |
      8. | 1_1_1F    1101.38      0 100 |
      9. | 1_1_1F    544.875      75 50 |
     10. | 1_1_1F    567.875     75 100 |
         +------------------------------+
etc.

and only then,

. reshape wide s2peakvalue, i(animal) j(treatment)

         animal   s2peak~1   s2peak~2   s2peak~3   s2peak~4   s2peak~5   s2peak~6
      1. 1_1_0F     773.75    1001.63      472.5    927.875    611.375    654.375
      2. 1_1_1F    1116.88    1101.38    544.875    567.875    443.875        466
      3. 1_1_2F      309.5    336.286    442.625        265      264.5    192.375
etc.

but then I lose my way back to the proper long format for ANOVA as well as the factor labels, etc.
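One possible way to keep the factor information through the round trip is to make j a string built from both factors, so that the wide variable names carry the levels. The following is a sketch under assumptions, not a tested solution: it starts from the original long data (before `treatment` is created and before `s1level` and `s1s2delay` are dropped), and `cell` and `base` are made-up names.

```stata
* Sketch only: cell and base are hypothetical variable names.

* Build a string identifier such as "0_50" or "85_100" from the two factors.
generate cell = string(s1level) + "_" + string(s1s2delay)
drop s1level s1s2delay

* The string option lets j() take non-numeric values; the wide variables
* are named s2peakvalue0_50, s2peakvalue0_100, s2peakvalue75_50, and so on.
reshape wide s2peakvalue, i(animal) j(cell) string

* Baseline: mean of the two s1level == 0 cells for each animal.
generate base = (s2peakvalue0_50 + s2peakvalue0_100) / 2

* reshape remembers the previous layout, so this returns to long form;
* base is constant within animal and is carried along to every row.
reshape long

* Recover the two numeric factors from the string identifier.
split cell, parse("_") destring
rename cell1 s1level
rename cell2 s1s2delay

generate ppi2 = (base - s2peakvalue) / base * 100 if s1level != 0
```

After `reshape long` the rows are sorted by animal and by the string `cell`, so the order within an animal may differ from the original listing. Any value labels on the factors would need to be reattached after `split`.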

-Dave
