Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: Reshape problem


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Reshape problem
Date   Thu, 4 May 2006 20:57:21 +0100

I agree with Radu that this is a problem requiring 
a transpose of the data. 

As Radu hints, -xpose- requires all variables
to be numeric. But -sxpose- on SSC is a rough-and-ready
transpose for string variables. 

So I had a go: 

. sxpose, clear force

. l

     +-------------------------------------------------------------------------------------------------------+
  1. | _var1 | _var2 |    _var3 |     _var4 |        _var5 |        _var6 |           _var7 |          _var8 |
     |     2 |     3 |        4 |         5 |            6 |            7 |               8 |              9 |
     |-------------------------------------------------------------------------------------------------------|
     |                             _var9                  |                          _var10                  |
     |                                10                  |                              11                  |
     +-------------------------------------------------------------------------------------------------------+

     +-------------------------------------------------------------------------------------------------------+
  2. | _var1 | _var2 |    _var3 |     _var4 |        _var5 |        _var6 |           _var7 |          _var8 |
     |   Sex |   Age | Left eye | Right eye | Lower eyelid | Upper eyelid | Lateral canthus | Medial canthus |
     |-------------------------------------------------------------------------------------------------------|
     |                             _var9                  |                          _var10                  |
     |                  Recurrent lesion                  |                  Primary lesion                  |
     +-------------------------------------------------------------------------------------------------------+

     +-------------------------------------------------------------------------------------------------------+
  3. | _var1 | _var2 |    _var3 |     _var4 |        _var5 |        _var6 |           _var7 |          _var8 |
     |     M |    47 |          |         Y |            Y |              |                 |              Y |
     |-------------------------------------------------------------------------------------------------------|
     |                             _var9                  |                          _var10                  |
     |                                                    |                               Y                  |
     +-------------------------------------------------------------------------------------------------------+

     +-------------------------------------------------------------------------------------------------------+
  4. | _var1 | _var2 |    _var3 |     _var4 |        _var5 |        _var6 |           _var7 |          _var8 |
     |     M |    66 |          |         Y |            Y |              |                 |              Y |
     |-------------------------------------------------------------------------------------------------------|
     |                             _var9                  |                          _var10                  |
     |                                                    |                               Y                  |
     +-------------------------------------------------------------------------------------------------------+

     +-------------------------------------------------------------------------------------------------------+
  5. | _var1 | _var2 |    _var3 |     _var4 |        _var5 |        _var6 |           _var7 |          _var8 |
     |     M |    56 |          |         Y |            Y |              |                 |                |
     |-------------------------------------------------------------------------------------------------------|
     |                             _var9                  |                          _var10                  |
     |                                                    |                               Y                  |
     +-------------------------------------------------------------------------------------------------------+

. drop in 1
(1 observation deleted)

. foreach v of var _var* { 
  2.         label var `v' "`= `v'[1]'" 
  3. }

. drop in 1
(1 observation deleted)

. destring, replace 
_var1 contains non-numeric characters; no replace
_var2 has all characters numeric; replaced as byte
_var3 has all characters numeric; replaced as byte
(3 missing values generated)
_var4 contains non-numeric characters; no replace
_var5 contains non-numeric characters; no replace
_var6 has all characters numeric; replaced as byte
(3 missing values generated)
_var7 has all characters numeric; replaced as byte
(3 missing values generated)
_var8 contains non-numeric characters; no replace
_var9 has all characters numeric; replaced as byte
(3 missing values generated)
_var10 contains non-numeric characters; no replace

. d

Contains data from spiedel.dta
  obs:             3                          
 vars:            10                          4 May 2006 20:39
 size:           183 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
_var1           str3   %9s                    Sex
_var2           byte   %10.0g                 Age
_var3           byte   %10.0g                 Left eye
_var4           str9   %9s                    Right eye
_var5           str12  %12s                   Lower eyelid
_var6           byte   %10.0g                 Upper eyelid
_var7           byte   %10.0g                 Lateral canthus
_var8           str14  %14s                   Medial canthus
_var9           byte   %10.0g                 Recurrent lesion
_var10          str14  %14s                   Primary lesion
-------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved

. l

     +--------------------------------------------------------------------------------+
     | _var1   _var2   _var3   _var4   _var5   _var6   _var7   _var8   _var9   _var10 |
     |--------------------------------------------------------------------------------|
  1. |     M      47       .       Y       Y       .       .       Y       .        Y |
  2. |     M      66       .       Y       Y       .       .       Y       .        Y |
  3. |     M      56       .       Y       Y       .       .               .        Y |
     +--------------------------------------------------------------------------------+


Naturally, I can't see what problems lurk in the rest of the data, but so 
far, so good. 

Nick 
n.j.cox@durham.ac.uk 

Radu Ban
 
> i don't know if this is the best way, but you could use the 
> -xpose- command.
> 
> but before that you need to make all you variables numerical. 
> for example:
> 
> /*coding males as 1 females as 0, Y as 1 no(missing) as 0 */
> /*coding age as numeric age*/
> 
> forval i = 2/256 {
> gen real_v`i' = 1 if v`i' == "M" | v`i' == "Y"
> replace real_v`i' = 0 if v`i' == "F" | v`i' == ""
> replace real_v`i' = real(v`i') in 2
> }
> 
> /*now you no longer really need id and v1, as long as you make note of
> what each row means*/
> 
> drop id v1
> 
> /*now you can transpose*/
> xpose, clear
> 
> /*now you rename to get the variables name back*/
> rename v1 sex
> rename v2 age
> ...
> 
> /*now you label values*/
> label define yesno 1 yes 2 no
> label define gender 1 male 0 female
> ...
> 
> hope this helps. there are probably more elegant solutions though.
 
Thomas Speidel 

> > I have a dataset that I need to re-organize in a long format.
> > Here is a sample:
> >
> > +--------------------------------------+
> > | id                 v1   v2   v3   v4 |
> > |--------------------------------------|
> > |  2                Sex    M    M    M |
> > |  3                Age   47   66   56 |
> > |  4           Left eye                |
> > |  5          Right eye    Y    Y    Y |
> > |  6       Lower eyelid    Y    Y    Y |
> > |--------------------------------------|
> > |  7       Upper eyelid                |
> > |  8    Lateral canthus                |
> > |  9     Medial canthus    Y    Y      |
> > | 10   Recurrent lesion                |
> > | 11     Primary lesion    Y    Y    Y |
> > +--------------------------------------+
> >
> > There are 255 observations (v2-v256) occupying the columns. 
>  I need each
> > observation to be a distinct row.  Almost all of the 
> entries are string,
> > (missing means "NO"). I am having difficulties using the 
> reshape command
> > to achieve my goal, possibly because of the strings.  Any 
> suggestion on
> > how to approach this?  Should I create a loop to encode all 
> variables?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index