Statalist



st: RE: Re: Data management, years of schooling


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: Data management, years of schooling
Date   Wed, 4 Feb 2009 17:06:17 -0000

Svend Juul also made a suggestion. The code will be a little messy
however this is done. Yet another possibility is 

gen education_yrs = ///
		cond(hi_edu > 30, 11 + mod(hi_edu, 10), ///
		cond(hi_edu > 20, 7 + mod(hi_edu, 10), ///
			mod(hi_edu, 10))) 

That is one line, syntactically. It may be parsed using this pseudocode
(which quite accidentally is rather Mata-like) 

	if (hi_edu > 30) y = 11 + mod(hi_edu, 10) 
	else if (hi_edu > 20) y = 7 + mod(hi_edu, 10) 
	else y = mod(hi_edu, 10) 
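
As a check on the nested cond() call, here is a minimal do-file sketch that
applies it to the nine example observations from the original question and
asserts agreement with the hand-computed years. It assumes the variable is
named hi_educ, as in the data listing elsewhere in this thread, and that it
is run from a do-file, since /// continuation is not understood at the
interactive prompt.

```stata
clear
input hhid hi_educ years
1 11 1
2 21 8
3 17 7
4 16 6
5 24 11
6 31 12
7 32 13
8 13 3
9 22 9
end

* nested cond(): the first true condition, read left to right, decides
generate byte education_yrs = ///
    cond(hi_educ > 30, 11 + mod(hi_educ, 10), ///
    cond(hi_educ > 20,  7 + mod(hi_educ, 10), ///
                            mod(hi_educ, 10)))

* stops with an error if any observation disagrees
assert education_yrs == years
```

assert exits with an error if any observation disagrees, so a silent run
means all nine rows match.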

People can agree to disagree here, as the differences are ones of style.
I like all the solutions I've seen, ugly ducklings one and all. 

Nick 
n.j.cox@durham.ac.uk 

Joseph Coveney

There are probably better ways, but something like that below should do it.
(Note that I'd normally prefer something more like

generate byte education_yrs = mod(hi_edu, 10) + ///
  7 * inrange(hi_edu, 21, 24) + ///
  11 * inrange(hi_edu, 31, 35)

because it would be easier to maintain--more self-documenting--but there's
an outside chance that it is somewhat slower in execution, perhaps even
noticeably so if you've got a very large amount of data.)
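
To see that the inrange() version and the floor() arithmetic agree, a small
do-file sketch can build every code in the three ranges given in the original
question (11-17, 21-24, 31-35) and compare the two results. The variable
names by_inrange and by_floor are made up for illustration, and the variable
is taken to be named hi_educ as in the data listing.

```stata
clear
set obs 35
generate byte hi_educ = _n

* keep only the codes the question describes
keep if inrange(hi_educ, 11, 17) | inrange(hi_educ, 21, 24) | inrange(hi_educ, 31, 35)

* version with explicit ranges
generate byte by_inrange = mod(hi_educ, 10) + ///
    7 * inrange(hi_educ, 21, 24) + ///
    11 * inrange(hi_educ, 31, 35)

* version with floor() arithmetic: 7 from 20 upward, 4 more from 30 upward
generate byte by_floor = mod(hi_educ, 10) + ///
    7 * floor(hi_educ / 20) + ///
    4 * floor(hi_educ / 30)

* stops with an error if the two versions differ anywhere
assert by_inrange == by_floor
```

The two expressions agree only on the documented codes. For a stray code
such as 25 they would differ (5 from the inrange() version, 12 from the
floor() version), so the choice also decides how unexpected values are
treated.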

. clear *

. set more off

. input hhid hi_educ years

          hhid    hi_educ      years
  1. 1       11      1
  2. 2       21      8
  3. 3       17      7
  4. 4       16      6
  5. 5       24      11
  6. 6       31      12
  7. 7       32      13
  8. 8       13      3
  9. 9       22      9
 10. end

. generate byte education_yrs = mod(hi_educ, 10) + ///
>   7 * floor(hi_educ / 20) + ///
>   4 * floor(hi_educ / 30)

. list, noobs separator(0)

  +-----------------------------------+
  | hhid   hi_educ   years   educat~s |
  |-----------------------------------|
  |    1        11       1          1 |
  |    2        21       8          8 |
  |    3        17       7          7 |
  |    4        16       6          6 |
  |    5        24      11         11 |
  |    6        31      12         12 |
  |    7        32      13         13 |
  |    8        13       3          3 |
  |    9        22       9          9 |
  +-----------------------------------+

. exit


Ronnie Babigumira wrote:


> I have an interesting data management problem. My data look like this
> [see below]
> Where hi_educ is the highest level of education for household. From this I
> would like to extract the number of years of schooling.
>
> Now, for values up to 17, the years of schooling is the last digit
> for values between 21 and 24, it is 7 + the last digit
> for values between 31 and 35 it is 11 + the last digit
>
> What I would like to end up with is something like this
>
> hhid hi_educ years
> 1 11 1
> 2 21 8
> 3 17 7
> 4 16 6
> 5 24 11
> 6 31 12
> 7 32 13
> 8 13 3
> 9 22 9
>
> I am stuck here
> gen str3 test = ""
> replace test  = substr(string(hi_educ), -1,.) if inrange(hi_educ,11,17)
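
One way the string-based attempt above might be completed, sketched under the
assumption that hi_educ is numeric as in the listing: take the last character
with substr(), convert it back to a number with real(), and add the offset
for each range. The names last_digit and years_str are made up for
illustration, and only three of the example rows are entered to keep it
short.

```stata
clear
input hhid hi_educ years
3 17 7
5 24 11
6 31 12
end

* last character of the code, converted back to a number
generate byte last_digit = real(substr(string(hi_educ), -1, 1))

* add 7 for codes 21-24 and 11 for codes 31-35
generate byte years_str = last_digit + ///
    7 * inrange(hi_educ, 21, 24) + ///
    11 * inrange(hi_educ, 31, 35)

* the string route gives the same last digit as mod()
assert last_digit == mod(hi_educ, 10)
assert years_str == years
```

Since the last digit is also mod(hi_educ, 10), the round trip through a
string is not needed; the first assert confirms that the two agree on these
rows.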



