Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: merging variables with labels


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: merging variables with labels
Date   Thu, 13 Jan 2011 10:10:13 +0000

Another answer to Ben's longer question is to check out -multencode-
from SSC, which can -encode- several string variables together using a
common set of value labels.

There is a much more direct answer to his shorter question. See

SJ-4-4  dm0009  . . . . . . .  Stata tip 14: Using value labels in expressions
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  K. Higbee
        Q4/04   SJ 4(4):488--489                                 (no commands)
        tips for using value labels in expressions

http://www.stata-journal.com/sjpdf.html?articlenum=dm0009

Nick

On Thu, Jan 13, 2011 at 6:15 AM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:

> Ben describes a situation where two variables are obtained as a result of
> the encoding corresponding string variables. Stata automatically generated
> two systems of values labeling, which both overlap in their codes, even
> though the labels are different.
>
> When values of either of these two variables are put into the third one,
> neither of the labeling system applies, and the situation really reminds the
> classical "apples and oranges" basket. I will proceed under the assumption
> that the mixture problem is ruled out and the resulting variable makes sense.
>
> Stata's language is used in the following code, Mata can be more convenient
> and efficient.
>
> Here is an example:
>
> . clear
> . input str30 xs str30 ys price
>                                 xs                              ys      price
>  1.         "North" "Red" 50
>  2.         "South" "Green" 30
>  3.         "East" "Blue" 48
>  4.         "West" "Yellow" 24
>  5.         "North" "Green" 36
>  6.         "North" "Blue" 85
>  7.         "South" "Red" 75
>  8. end
> . encode xs, generate(x)
> . encode ys, generate(y)
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . list, clean
>          xs       ys   price       x        y   zs   z
>  1.   North      Red      50   North      Red    x   2
>  2.   South    Green      30   South    Green    y   2
>  3.    East     Blue      48    East     Blue    x   1
>  4.    West   Yellow      24    West   Yellow    y   4
>  5.   North    Green      36   North    Green    y   2
>  6.   North     Blue      85   North     Blue    x   2
>  7.   South      Red      75   South      Red    x   3
>
> Here zs shows where the value is coming from and z the value of the
> new variable itself. When the value is coming from variable x the label
> from the x's labeling system must be used, and similarly for y.
>
> The problem stems from the fact that the two labeling systems were
> created independently from one another, and thus may not be used
> in such an operation.
>
> If the original string data is still available, Ben may change the above
> program to make the two encodings dependent on each other, by
> forcing Stata to use the same labeling system for both of them.
> The encode command provides a convenient option -label- exactly
> for this purpose.
>
> . clear
> . input str30 xs str30 ys price
>                                 xs                              ys      price
>  1.         "North" "Red" 50
>  2.         "South" "Green" 30
>  3.         "East" "Blue" 48
>  4.         "West" "Yellow" 24
>  5.         "North" "Green" 36
>  6.         "North" "Blue" 85
>  7.         "South" "Red" 75
>  8. end
> . encode xs, generate(x)
> . encode ys, generate(y) label(x)
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . label values z x
> . list, clean
>          xs       ys   price       x        y   zs        z
>  1.   North      Red      50   North      Red    x    North
>  2.   South    Green      30   South    Green    y    Green
>  3.    East     Blue      48    East     Blue    x     East
>  4.    West   Yellow      24    West   Yellow    y   Yellow
>  5.   North    Green      36   North    Green    y    Green
>  6.   North     Blue      85   North     Blue    x    North
>  7.   South      Red      75   South      Red    x    South
>
>
> See that in the z variable the directions names are used when the
> values are taken from the variable x and colors names are used when
> the values are taken from the variable y.
>
> We further confirm that labels are applicable to this variable:
> . label list
> x:
>           1 East
>           2 North
>           3 South
>           4 West
>           5 Blue
>           6 Green
>           7 Red
>           8 Yellow
>
> If the original string information is not available anymore, Ben can
> adjust the values and the labels of one variable in such a way that they
> do not overlap with values of the other variable.
>
> For values, often adding a constant, e.g. 10000 is sufficient.
> It's slightly more difficult with the labeling system. But here is an example:
>
> . clear
> . input str30 xs str30 ys price
>                                 xs                              ys      price
>  1.         "North" "Red" 50
>  2.         "South" "Green" 30
>  3.         "East" "Blue" 48
>  4.         "West" "Yellow" 24
>  5.         "North" "Green" 36
>  6.         "North" "Blue" 85
>  7.         "South" "Red" 75
>  8. end
> . encode xs, generate(x)
> . encode ys, generate(y)
> . list, clean
>          xs       ys   price       x        y
>  1.   North      Red      50   North      Red
>  2.   South    Green      30   South    Green
>  3.    East     Blue      48    East     Blue
>  4.    West   Yellow      24    West   Yellow
>  5.   North    Green      36   North    Green
>  6.   North     Blue      85   North     Blue
>  7.   South      Red      75   South      Red
> . preserve
> . uselabel x y, clear
> . assert trunc==0
> . sum value if lname=="x"
>    Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>       value |         4         2.5    1.290994          1          4
> . local shift = r(max)+1
> . replace value=value+`shift' if lname=="y"
> (4 real changes made)
> . drop lname trunc
> . forvalues i=1/`=_N' {
>  2.   label define zlab `=value[`i']' `"`=label[`i']'"', modify
>  3. }
> . tempfile lbl
> . label save zlab using `lbl'
> file C:\Users\Username\AppData\Local\Temp\ST_0000000q.tmp saved
> . restore
> . replace y=y+`shift'
> (7 real changes made)
> . label drop y // old labels not applicable anymore
> . do `lbl'
> . label define zlab 1 `"East"', modify
> . label define zlab 2 `"North"', modify
> . label define zlab 3 `"South"', modify
> . label define zlab 4 `"West"', modify
> . label define zlab 6 `"Blue"', modify
> . label define zlab 7 `"Green"', modify
> . label define zlab 8 `"Red"', modify
> . label define zlab 9 `"Yellow"', modify
> end of do-file
> . generate zs = cond(price>45,"x","y")
> . generate z = cond(price>45,x,y)
> . label values z zlab
> . list, clean
>          xs       ys   price       x   y   zs        z
>  1.   North      Red      50   North   8    x    North
>  2.   South    Green      30   South   7    y    Green
>  3.    East     Blue      48    East   6    x     East
>  4.    West   Yellow      24    West   9    y   Yellow
>  5.   North    Green      36   North   7    y    Green
>  6.   North     Blue      85   North   6    x    North
>  7.   South      Red      75   South   8    x    South
> . label list
> zlab:
>           1 East
>           2 North
>           3 South
>           4 West
>           6 Blue
>           7 Green
>           8 Red
>           9 Yellow
> x:
>           1 East
>           2 North
>           3 South
>           4 West
> end of do-file
>
> Note that the resulting listing is the same as with the first case.
> As always with Stata, there are other (and more efficient) ways of
> getting the same result.
>
> Finally to answer your question regarding obtaining the value based on
> the label.
>
> If you know the code, the label is obtained instantly:
> . sysuse auto
> . di `"`:label origin 1'"'
> Foreign
> . di `"`:label (foreign) 1'"'
> Foreign
>
> See help for extended macro functions.
>
> If you need an inverse transformation, i.e. obtaining the values by
> labels, you may do:
> . sysuse auto
> . uselabel origin
> . sum value if label=="Foreign", meanonly
> . di r(mean)
> 1
>
> If you don't want to destroy the data, then loop through all labeled
> values, and use
> `:label labelname labeledvalue' to check if it matches the label you
> are looking for.
>
> I would personally prefer Mata, which simplifies the matter a lot:
>
> .mata st_local("v",strofreal(st_vlsearch("origin", "Foreign")))
> .display `v'
> 1
>
> Hope this helps.

> On Wed, Jan 12, 2011 at 8:24 PM, Ben Hoen <bhoen@lbl.gov> wrote:

>> I have two variables (e.g. x and y) with value labels (e.g. xlab and ylab).
>> Both x and y were created via .encode (they were derived from string
>> variables) so they overlap in their underlying values but the value labels
>> are unique between x and y and within x and y.
>>
>> I would like to create a new variable z which would take on the value labels
>> of x (xlab) for some cases and y (ylab) for other cases based on an
>> expression.  I do not care what the underlying values are, only that the
>> value labels are correctly assigned?
>>
>> Suggestions?
>>
>> Somewhat relatedly, is there a quick way on the command line to display the
>> underlying value of a variable if one only has the label?  I imagine
>> something like: .display y if ylab="something"
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index