Statalist



st: Re: Data management, years of schooling


From   "Joseph Coveney" <[email protected]>
To   "Statalist" <[email protected]>
Subject   st: Re: Data management, years of schooling
Date   Wed, 4 Feb 2009 12:55:40 +0900

There are probably better ways, but something like the code below should do it.
(Note that I'd normally prefer something more like

generate byte education_yrs = mod(hi_educ, 10) + ///
 7 * inrange(hi_educ, 21, 24) + ///
 11 * inrange(hi_educ, 31, 35)

because it would be easier to maintain--more self-documenting--but there's an
outside chance that it is somewhat slower in execution, perhaps even
noticeably so if you've got a very large amount of data.)

Joseph Coveney

. clear *

. set more off

. input hhid hi_educ years

         hhid    hi_educ      years
 1. 1       11      1
 2. 2       21      8
 3. 3       17      7
 4. 4       16      6
 5. 5       24      11
 6. 6       31      12
 7. 7       32      13
 8. 8       13      3
 9. 9       22      9
10. end

. generate byte education_yrs = mod(hi_educ, 10) + ///
  7 * floor(hi_educ / 20) + ///
  4 * floor(hi_educ / 30)

. list, noobs separator(0)

 +-----------------------------------+
 | hhid   hi_educ   years   educat~s |
 |-----------------------------------|
 |    1        11       1          1 |
 |    2        21       8          8 |
 |    3        17       7          7 |
 |    4        16       6          6 |
 |    5        24      11         11 |
 |    6        31      12         12 |
 |    7        32      13         13 |
 |    8        13       3          3 |
 |    9        22       9          9 |
 +-----------------------------------+

. exit
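
A quick way to confirm that both formulas reproduce the requested values is to compute each one and assert agreement with the years column. This is only a minimal sketch: it assumes the nine example observations above are still in memory and that it is run from a do-file (so that the /// continuations work), and the variable names yrs_floor and yrs_range are made up for illustration.

```stata
* Floor-based formula from the log above (yrs_floor is a hypothetical name)
generate byte yrs_floor = mod(hi_educ, 10) + ///
    7 * floor(hi_educ / 20) + ///
    4 * floor(hi_educ / 30)

* inrange()-based alternative (yrs_range is a hypothetical name)
generate byte yrs_range = mod(hi_educ, 10) + ///
    7 * inrange(hi_educ, 21, 24) + ///
    11 * inrange(hi_educ, 31, 35)

* assert stops with an error if any observation disagrees
assert yrs_floor == years
assert yrs_range == years
```

Outside the three documented ranges the two formulas can differ. For example, a code of 25 gives 12 under the floor() version and 5 under the inrange() version, so neither result should be relied on for undocumented codes.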


Ronnie Babigumira wrote:


I have an interesting data management problem. My data look like this
[see below]
Here hi_educ is the highest level of education for the household. From this I
would like to extract the number of years of schooling.

Now, for values from 11 to 17, the years of schooling is the last digit;
for values between 21 and 24, it is 7 + the last digit;
for values between 31 and 35, it is 11 + the last digit.

What I would like to end up with is something like this

hhid hi_educ years
1 11 1
2 21 8
3 17 7
4 16 6
5 24 11
6 31 12
7 32 13
8 13 3
9 22 9

I am stuck here
gen str3 test = ""
replace test  = substr(string(hi_educ), -1,.) if inrange(hi_educ,11,17)

I would appreciate any help
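
One possible way to finish the string-based attempt above is sketched here. It assumes hi_educ is numeric and is meant to be run from a do-file; years_str is a hypothetical variable name, and observations with codes outside the three ranges described in the question are left missing.

```stata
* Take the last digit with string functions and convert it back with real(),
* then add the offset for each documented range (years_str is hypothetical)
generate byte years_str = real(substr(string(hi_educ), -1, 1)) ///
    + 7 * inrange(hi_educ, 21, 24) ///
    + 11 * inrange(hi_educ, 31, 35) ///
    if inrange(hi_educ, 11, 17) | inrange(hi_educ, 21, 24) | inrange(hi_educ, 31, 35)
```

The if qualifier is a design choice rather than a requirement: it keeps unexpected codes visible as missing values instead of silently assigning them a number of years.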

