Statalist


st: Problem with -reshape- and value labels


From   Clyde Schechter <cschecht@aecom.yu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Problem with -reshape- and value labels
Date   Wed, 11 Jun 2008 14:12:52 -0400

I am having a problem with a data set in which a number of variables carry different value labels. The variables' names share a common prefix, and when I reshape the data to long format, it seems that the value label attached to the _last_ of the variables is carried over to the new variable named by the common prefix. For example:

. des

Contains data
obs: 10
vars: 7
size: 160 (99.9% of memory free)
-----------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       boolean    1 resp
resp2           byte   %8.0g       boolean    2 resp
resp3           byte   %8.0g       boolean    3 resp
resp4           byte   %8.0g       boolean    4 resp
resp5           byte   %8.0g       boolean    5 resp
resp6           byte   %8.0g       other      6 resp
-----------------------------------------------------------------------------------------------------------
Sorted by: seq

. reshape long resp, i(seq) j(item)
(note: j = 1 2 3 4 5 6)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       10   ->      60
Number of variables                   7   ->       3
j variable (6 values)                     ->   item
xij variables:
                  resp1 resp2 ... resp6   ->   resp
-----------------------------------------------------------------------------

. des

Contains data
obs: 60
vars: 3
size: 720 (99.9% of memory free)
-----------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------------------------
seq             int    %8.0g
item            byte   %9.0g
resp            byte   %8.0g       other
-----------------------------------------------------------------------------------------------------------
Sorted by: seq item
Note: dataset has changed since last saved

But the real problem arises further on:

<snip> do stuff to resp variable

<end snip>

. reshape wide
(note: j = 1 2 3 4 5 6)

Data                               long   ->   wide
-----------------------------------------------------------------------------
Number of obs.                       60   ->      10
Number of variables                   3   ->       7
j variable (6 values)              item   ->   (dropped)
xij variables:
                                   resp   ->   resp1 resp2 ... resp6
-----------------------------------------------------------------------------

. des

Contains data
obs: 10
vars: 7
size: 160 (99.9% of memory free)
-----------------------------------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------------------------------------------------
seq             int    %8.0g
resp1           byte   %8.0g       other      1 resp
resp2           byte   %8.0g       other      2 resp
resp3           byte   %8.0g       other      3 resp
resp4           byte   %8.0g       other      4 resp
resp5           byte   %8.0g       other      5 resp
resp6           byte   %8.0g       other      6 resp
-----------------------------------------------------------------------------------------------------------
Sorted by: seq

Notice that the value label "other" has now been spread onto all of the variables resp1-resp5, which originally had the value label "boolean."

This then raises problems because I later attempt to select a group of variables for some further analyses with:

ds, has(vallabel boolean)

which now comes up empty.

I can't get around this by just moving the resp6 variable earlier in the data set: its unique value label gets picked for the long-format prefix-named variable regardless of where it physically sits in the data set. In fact, the only workaround seems to be to rename one of the "boolean"-labeled variables so that its name is alphabetically last.

That would keep the "boolean" label from getting wiped out, but then all of the variables end up with that label when I reshape back to wide, so the -ds- command then traps variables that should be excluded from further analysis. Is there any way to have -reshape- restore the original labels?

(Evidently I can just relabel them by hand in this example, but the real data set I'm working with has several dozen such variables, so this starts to get impractical.)
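
One possible way to automate the relabeling, offered only as an untested sketch: record each variable's value label name in a local macro before the first -reshape-, then reattach the recorded names after reshaping back to wide. The macro names (respvars, k, lab1, lab2, ...) are made up for illustration. The sketch assumes that everything runs in a single do-file (so the locals survive), that every resp* variable had a value label to begin with, and that the label definitions themselves are still in memory after the round trip.

```stata
* Before reshaping: remember which value label each resp* variable carries.
* respvars, k, and lab1, lab2, ... are hypothetical names for this sketch.
unab respvars : resp*
local k = 0
foreach v of local respvars {
    local ++k
    local lab`k' : value label `v'
}

reshape long resp, i(seq) j(item)

* ... do stuff to resp variable ...

reshape wide

* After reshaping back: reattach the recorded value label to each variable.
local k = 0
foreach v of local respvars {
    local ++k
    if "`lab`k''" != "" {
        label values `v' `lab`k''
    }
}

* This should then list the variables that originally carried "boolean".
ds, has(vallabel boolean)
```

Counter-based macro names are used rather than names built from the variable names, so that long variable names cannot run into the limit on macro-name length.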

I checked the -reshape- section of the manual and found no mention of how value labels are handled.

Any help would be appreciated. Thanks in advance.

Clyde Schechter
Albert Einstein College of Medicine
Bronx, New York, USA
