Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: RE: using information from value label to generate new variables

From: Nick Cox; To: Nick Cox, "'statalist@hsphsun2.harvard.edu'"; Subject: st: RE: using information from value label to generate new variables; Date: Thu, 7 Jun 2012 19:12:46 +0100

```
Sorry; please ignore this. It's based on reading only part of Evelyn's question. I will send a better answer soon.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: Nick Cox
Sent: 07 June 2012 19:12
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: using information from value label to generate new variables

My main reaction is that the -egen- function -anymatch()- is there to create these indicator variables.

This example creates a sandpit for me to play in and shows what I mean.

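clear
set obs 10
gen id = _n
* value label shared by all five answer variables
label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan
* five fake answers per respondent: random codes 1 to 10
forval j = 1/5 {
    gen v`j' = ceil(10 * runiform())
    label val v`j' country
}

l

* one 0/1 indicator per country, labelled with the country name
forval k = 1/10 {
    egen is`k' = anymatch(v*), val(`k')
    label var is`k' "`: label country `k''"
}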

d is*
l is*

This is what one run looks like (yours may differ because of different random numbers):

. clear

. set obs 10
obs was 0, now 10

. gen id = _n

. label def country 1 Albania 2 Belgium 3 "Czech Republic" 4 Denmark 5 Estonia 6 Finland 7 Greece 8 Haiti 9 Iceland 10 Japan

. forval j = 1/5 {
  2.         gen v`j' = ceil(10 * runiform())
  3.         label val v`j' country
  4. }

.
. l

     +-------------------------------------------------------------+
     | id               v1        v2        v3        v4        v5 |
     |-------------------------------------------------------------|
  1. |  1            Haiti   Finland    Greece     Haiti   Denmark |
  2. |  2           Greece    Greece    Greece     Haiti   Finland |
  3. |  3          Denmark   Belgium   Estonia   Finland    Greece |
  4. |  4          Albania   Belgium   Belgium   Denmark    Greece |
  5. |  5          Iceland    Greece   Finland   Belgium   Finland |
     |-------------------------------------------------------------|
  6. |  6           Greece   Finland     Japan   Finland   Estonia |
  7. |  7   Czech Republic   Iceland   Iceland   Albania     Japan |
  8. |  8           Greece   Albania    Greece   Denmark     Japan |
  9. |  9          Denmark   Estonia   Albania   Estonia     Japan |
 10. | 10          Iceland   Iceland   Finland   Iceland     Japan |
     +-------------------------------------------------------------+

.
. forval k = 1/10 {
  2.         egen is`k' = anymatch(v*), val(`k')
  3.         label var is`k' "`: label country `k''"
  4. }

.
. d is*

              storage  display     value
variable name   type   format      label      variable label
---------------------------------------------------------------------------------------------------------------------------------------------
is1             byte   %8.0g                  Albania
is2             byte   %8.0g                  Belgium
is3             byte   %8.0g                  Czech Republic
is4             byte   %8.0g                  Denmark
is5             byte   %8.0g                  Estonia
is6             byte   %8.0g                  Finland
is7             byte   %8.0g                  Greece
is8             byte   %8.0g                  Haiti
is9             byte   %8.0g                  Iceland
is10            byte   %8.0g                  Japan

. l is*

     +------------------------------------------------------------+
     | is1   is2   is3   is4   is5   is6   is7   is8   is9   is10 |
     |------------------------------------------------------------|
  1. |   0     0     0     1     0     1     1     1     0      0 |
  2. |   0     0     0     0     0     1     1     1     0      0 |
  3. |   0     1     0     1     1     1     1     0     0      0 |
  4. |   1     1     0     1     0     0     1     0     0      0 |
  5. |   0     1     0     0     0     1     1     0     1      0 |
     |------------------------------------------------------------|
  6. |   0     0     0     0     1     1     1     0     0      1 |
  7. |   1     0     1     0     0     0     0     0     1      1 |
  8. |   1     0     0     1     0     0     1     0     0      1 |
  9. |   1     0     0     1     1     0     0     0     0      1 |
 10. |   0     0     0     0     0     1     0     0     1      1 |
     +------------------------------------------------------------+

Nick
n.j.cox@durham.ac.uk

Evelyn Ersanilli

I have a cross-sectional survey dataset.
For question "a", people were asked to list up to 9 countries.
All variables a1-a9 (numeric, up to 3 digits) have the same value label, "locations".
Because it is also attached to other variables, the value label locations holds not only the 3-digit country codes but also 5-digit regional codes.

For each country (e.g. France, Germany, Zimbabwe) that was mentioned, I would like to generate a variable that is 1 if the respondent named that country in any of their up to 9 replies, 0 if they did not (many people gave fewer than 9 replies), and missing if the respondent didn't answer a1-a9.
These variables should have the names of the countries.

Building on online examples I've gotten close to what I want, but I have problems correctly and efficiently determining the list of variables to generate.
I first tried to get the values and labels from the first answer (a1). However, this risks omitting countries that were named only in a2, a3, etc.
In my second attempt I therefore tried to extract the values and labels from the value label 'locations' using labellist and r().
The problem with Attempt II is that r() only saves up to (244?) characters, which is fewer than all the values together, and I haven't found out how to increase the storage capacity.
Ideally I would also limit the extraction of labels and values to the 1- to 3-digit country codes, leaving out the 5-digit regional codes.

Any alternative suggestions would be welcome.

Here is my syntax:

*-------------------Attempt I-----------
//Step 1: extract labels
levelsof a1, local(a1_levels)
foreach val of local a1_levels {
    local c`val' : label locations `val'
}
macro list

//Step 2: generate dummies
foreach X of local a1_levels {
    egen var`X' = anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}

//Step 3: label and rename
local variablelist "var"
foreach variable of local variablelist {
    foreach value of local a1_levels {
        label variable `variable'`value' "`c`value''"
        local stringy = strtoname("`c`value''")		//needed because some country names contain spaces or other characters not allowed in variable names
        rename `variable'`value' `stringy'
    }
}
*-----------------------------------------
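* Sketch (hypothetical, not part of the original thread): to avoid missing
* countries named only in a2-a9, collect the codes from every answer
* variable and take their union, instead of using the levels of a1 alone.
* The toy data below stand in for the survey; use a1-a9 in the real loop.
* The resulting local could replace a1_levels in Steps 1-3 above.
clear
input a1 a2 a3
10 20 .
30 . .
. . .
10 40 20
end
local allcodes
foreach v of varlist a1-a3 {
    * levelsof skips missing values by default
    quietly levelsof `v', local(codes)
    * macro list union: each code is kept only once
    local allcodes : list allcodes | codes
}
* put the codes back into numeric order
numlist "`allcodes'", sort
local allcodes `r(numlist)'
display "`allcodes'"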

*-------------------Attempt II-----------
//Step 1: extract labels
quietly: labellist locations
local loc_levels = r(locations_values)
foreach val of local loc_levels {   /* loop over all values in local loc_levels */
    local c`val' : label locations `val'  /* create macro that contains label for each value */
}
macro list
//etc
*-----------------------------------------
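* Sketch (an assumption, not part of the original thread). The 244-character
* ceiling is probably not a limit of r() itself: in Stata 12 and earlier a
* string expression, which is what the "=" in local loc_levels = r(...)
* creates, is truncated at 244 characters. Assigning the macro without "="
* (the local name followed directly by the macro-quoted r() result) avoids
* that. Another route, shown here, skips the user-written labellist: the
* official -uselabel- command turns a value label into a dataset with
* variables value and label, so the 5-digit regional codes can be dropped
* with an if condition. The label contents below are hypothetical.
clear
label define locations 4 "Afghanistan" 250 "France" 276 "Germany" 716 "Zimbabwe" 15001 "Some region"
set obs 1
gen a1 = 250
label values a1 locations
preserve
uselabel locations, clear
* keep only the 1- to 3-digit (country) codes, not the 5-digit regions
quietly levelsof value if value < 1000, local(loc_levels)
restore
* look up the country name for each code, as in Step 1
foreach val of local loc_levels {
    local c`val' : label locations `val'
}
display "`loc_levels'"
display "`c250'"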

For step 2 I've also tried:
*-----------------------------------------
foreach X of numlist 2/935 {
    egen var`X' = anymatch(a1 a2 a3 a4 a5 a6 a7 a8 a9), values(`X')
}
*-----------------------------------------
But that generates far too many variables, because many of the values between 2 and 935 have no country code associated with them.
I could of course just look up all the values that were assigned a label in locations, but where's the fun in that?
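* Sketch (hypothetical, not part of the original thread): keep the numlist
* loop, but skip codes that have no text in the value label (the strict
* option of the label macro function returns an empty string for those)
* and codes that nobody named. Each indicator is set to missing when all
* answers are missing and is named after the country with strtoname().
* The toy data stand in for a1-a9; use a1-a9 in the real loop.
clear
label define locations 250 "France" 276 "Germany" 716 "Zimbabwe" 15001 "Some region"
input a1 a2
250 716
276 .
. .
end
label values a1 locations
label values a2 locations
* number of non-missing answers per respondent
egen nanswers = rownonmiss(a1 a2)
foreach X of numlist 2/935 {
    local lab : label locations `X', strict
    * no label text: not a country code, so skip it
    if `"`lab'"' == "" continue
    local name = strtoname(`"`lab'"')
    egen `name' = anymatch(a1 a2), values(`X')
    * drop the indicator if nobody named this country
    quietly count if `name' == 1
    if r(N) == 0 {
        drop `name'
        continue
    }
    * missing, not 0, for respondents who gave no answer at all
    quietly replace `name' = . if nanswers == 0
    label variable `name' `"`lab'"'
}
list, noobs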

Kind regards

Evelyn

Departmental Lecturer
Oxford Department of International Development (QEH)
International Migration Institute
University of Oxford
Oxford OX1 3TB
United Kingdom
Tel: +44 (0)1865 281717

http://www.eumagine.org
http://www.migration.ox.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```