Statalist


st: RE: Problem with -reshape- and value labels


From   "Newson, Roger B" <[email protected]>
To   <[email protected]>
Subject   st: RE: Problem with -reshape- and value labels
Date   Wed, 11 Jun 2008 19:48:34 +0100

A possible solution might involve using the descsave package
(downloadable from SSC using the ssc command) to save the specifications
of variable attributes (including value labels) in a do-file before the
first of your reshape commands, and to execute this do-file after the
last of your reshape commands. Before the first of your reshape
commands, you might type

tempfile df0
descsave resp*, do(`"`df0'"', replace)

to create a do-file in the temporary file specified by `"`df0'"'. Then,
after the last of your reshape commands, you might type

run `"`df0'"'

and the variables resp1-resp6 will have the variable labels, formats,
value labels and storage types that they had in the original dataset,
following the execution of this do-file.
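
If you would rather not install a package, something along the
following lines, using only official Stata commands, might also work.
This is a rough sketch and not a tested solution: the local macro
names vl_resp1, vl_resp2, ... are made up for illustration, it
restores only the value labels (not the variable labels, formats or
storage types that descsave handles), and it assumes that everything
is run in a single do-file, so that the local macros are still defined
after the last reshape.

```stata
* Before the first reshape: remember which value label each variable has.
* (vl_`v' is a hypothetical macro name; very long variable names could
* exceed Stata's limit on the length of local macro names.)
foreach v of varlist resp* {
    local vl_`v' : value label `v'
}

reshape long resp, i(seq) j(item)
* ... do stuff to resp here ...
reshape wide

* After the last reshape: reattach the remembered value labels.
* Variables that originally had no value label are left alone.
foreach v of varlist resp* {
    if "`vl_`v''" != "" {
        label values `v' `vl_`v''
    }
}
```

This relies on the value label definitions themselves (boolean and
other in your example) still being in memory after the reshape
commands, so that only the attachments need restoring.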

I hope this helps.

Roger


Roger B Newson
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
UNITED KINGDOM
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Email: [email protected] 
Web page: www.imperial.ac.uk/nhli/r.newson/
Departmental Web page:
http://www1.imperial.ac.uk/medicine/about/divisions/nhli/respiration/popgenetics/reph/

Opinions expressed are those of the author, not of the institution.

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Clyde
Schechter
Sent: 11 June 2008 19:13
To: [email protected]
Subject: st: Problem with -reshape- and value labels

I am having a problem whereby I start out with a data set that has a 
number of variables with some different value labels.  The 
variables' names share a common prefix, and when I reshape the data 
to long format, it seems that the value label assigned to the _last_ 
of the variables is carried to the new variable that equals the 
common prefix.  For example:

. des

Contains data
   obs:            10
  vars:             7
  size:           160 (99.9% of memory free)
------------------------------------------------------------------------
               storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       boolean    1 resp
resp2           byte   %8.0g       boolean    2 resp
resp3           byte   %8.0g       boolean    3 resp
resp4           byte   %8.0g       boolean    4 resp
resp5           byte   %8.0g       boolean    5 resp
resp6           byte   %8.0g       other      6 resp
------------------------------------------------------------------------
Sorted by:  seq

. reshape long resp, i(seq) j(item)
(note: j = 1 2 3 4 5 6)

Data                               wide   ->   long
------------------------------------------------------------------------
Number of obs.                       10   ->      60
Number of variables                   7   ->       3
j variable (6 values)                     ->   item
xij variables:
                   resp1 resp2 ... resp6   ->   resp
------------------------------------------------------------------------

. des

Contains data
   obs:            60
  vars:             3
  size:           720 (99.9% of memory free)
------------------------------------------------------------------------
               storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
seq             int    %8.0g
item            byte   %9.0g
resp            byte   %8.0g       other
------------------------------------------------------------------------
Sorted by:  seq  item
      Note:  dataset has changed since last saved

But the real problem arises further on:

<snip> do stuff to resp variable

<end snip>

. reshape wide
(note: j = 1 2 3 4 5 6)

Data                               long   ->   wide
------------------------------------------------------------------------
Number of obs.                       60   ->      10
Number of variables                   3   ->       7
j variable (6 values)              item   ->   (dropped)
xij variables:
                                    resp   ->   resp1 resp2 ... resp6
------------------------------------------------------------------------

. des

Contains data
   obs:            10
  vars:             7
  size:           160 (99.9% of memory free)
------------------------------------------------------------------------
               storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       other      1 resp
resp2           byte   %8.0g       other      2 resp
resp3           byte   %8.0g       other      3 resp
resp4           byte   %8.0g       other      4 resp
resp5           byte   %8.0g       other      5 resp
resp6           byte   %8.0g       other      6 resp
------------------------------------------------------------------------
Sorted by:  seq

Notice now that the value label "other" has been spread on to all of 
the variables resp1-resp5 that originally had value label "boolean."

This then raises problems because I later attempt to select a group 
of variables for some further analyses with:

ds, has(vallabel boolean)

which now comes up empty.

I can't get around this by just moving the resp6 variable earlier in 
the data set: its unique value label gets singled out for the 
long-format prefix-named variable regardless of where it physically 
is in the data set.  In fact, the workaround seems to be to rename 
one of the "boolean" labeled variables to have a name that is 
alphabetically last.

That would keep the "boolean" label from getting wiped out, but then 
it results in all the variables being so labeled when I reshape back 
to wide, so the -ds- command then traps variables that should be 
excluded from further analysis.  Is there any way to have -reshape- 
restore the original labels?

(Evidently I can just relabel them by hand in this example, but the 
real data set I'm working with has several dozen such variables, so 
this starts to get impractical.)

I checked the -reshape- section of the manual and I find no mention 
of anything about how value labels are handled.

Any help would be appreciated.  Thanks in advance.

Clyde Schechter
Albert Einstein College of Medicine
Bronx, New York, USA

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



