Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Grouping income variables- RECODE COMMAND

- From: Nick Cox
- To: "statalist@hsphsun2.harvard.edu"
- Subject: Re: st: Grouping income variables- RECODE COMMAND
- Date: Sun, 2 Feb 2014 08:58:56 +0000

```
Your -recode- mapped 1,...,12 to 1,...,12, which makes precisely no
progress with the main problem. As I understand what you want, you
need something more like

recode hinctnt 1=40 2=70 3=130 ...

Nick
njcoxstata@gmail.com

On 1 February 2014 19:43, Antonio Rodriguez Andres
<Antonio.Andres@emu.edu.tr> wrote:
> Nick
>
> You are right. But if I type the following code:
>
> recode hinctnt (1=1 "1st interval") (2=2 "2nd interval") (3=3 "3rd interval") (4=4 "4th interval") (5=5 "5th interval") (6=6 "6th interval") (7=7 "7th interval") (8=8 "8th interval") (9=9 "9th interval") (10=10 "10th interval") (11=11 "11th interval") (12=12 "12th interval") (.=.m "Missing") (77=.r "Refusal") (88=.d "Don't Know") (99=.s "Not answer"), gen (ihinctnt)
>
> I generated a new variable, ihinctnt. Then I tabulated it and computed summary statistics. But these are not incomes. I should specify the upper and lower limits for each interval. How can I do that?
>
>
> . tab ihinctnt, missing
>
>     RECODE of |
>       hinctnt |
>  (Household's |
>     total net |
>   income, all |
>      sources) |      Freq.     Percent        Cum.
> --------------+-----------------------------------
>  1st interval |      1,663        3.87        3.87
>  2nd interval |      1,561        3.63        7.50
>  3rd interval |      2,262        5.26       12.76
>  4th interval |      3,676        8.55       21.31
>  5th interval |      3,545        8.24       29.55
>  6th interval |      3,293        7.66       37.21
>  7th interval |      3,010        7.00       44.21
>  8th interval |      2,871        6.68       50.89
>  9th interval |      4,707       10.95       61.83
> 10th interval |      2,058        4.79       66.62
> 11th interval |        644        1.50       68.12
> 12th interval |        428        1.00       69.11
>    Don't Know |      3,540        8.23       77.34
>       Missing |      5,037       11.71       89.06
>       Refusal |      4,525       10.52       99.58
>    Not answer |        180        0.42      100.00
> --------------+-----------------------------------
>         Total |     43,000      100.00
>
> . summ ihinctnt
>
> Variable        Obs        Mean    Std. Dev.       Min  Max
>
> ihinctnt      29718    6.156504     2.75604          1  12
>
> . summ ihinctnt,d
>
>            RECODE of hinctnt (Household's total net income,
>                              all sources)
> -------------------------------------------------------------
>       Percentiles      Smallest
>  1%            1              1
>  5%            1              1
> 10%            2              1       Obs               29718
> 25%            4              1       Sum of Wgt.       29718
>
> 50%            6                      Mean           6.156504
>                         Largest       Std. Dev.       2.75604
> 75%            9             12
> 90%           10             12       Variance       7.595757
> 95%           10             12       Skewness       -.080652
> 99%           12             12       Kurtosis       2.098037
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Saturday, February 01, 2014 9:17 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Grouping income variables- RECODE COMMAND
>
> The numeric values of -hinctnt- don't exceed 99. They are evidently numeric codes, not incomes. So why are you surprised at your results?
> You have to -recode- your data before you can classify them, and that means using the -recode- command.
> Nick
> njcoxstata@gmail.com
>
>
> On 1 February 2014 18:14, Antonio Rodriguez Andres <Antonio.Andres@emu.edu.tr> wrote:
>> Here you can see the basic description of the income variable
>>
>> tab hinctnt
>>
>> Household's |
>>   total net |
>> income, all |
>>     sources |      Freq.     Percent        Cum.
>> ------------+-----------------------------------
>>           J |      1,663        4.38        4.38
>>           R |      1,561        4.11        8.49
>>           C |      2,262        5.96       14.45
>>           M |      3,676        9.68       24.13
>>           F |      3,545        9.34       33.47
>>           S |      3,293        8.67       42.15
>>           K |      3,010        7.93       50.08
>>           P |      2,871        7.56       57.64
>>           D |      4,707       12.40       70.04
>>           H |      2,058        5.42       75.46
>>           U |        644        1.70       77.15
>>           N |        428        1.13       78.28
>>     Refusal |      4,525       11.92       90.20
>>  Don't know |      3,540        9.32       99.53
>>   No answer |        180        0.47      100.00
>> ------------+-----------------------------------
>>       Total |     37,963      100.00
>>
>>
>> sum hinctnt, d
>>
>>           Household's total net income, all sources
>> -------------------------------------------------------------
>>       Percentiles      Smallest
>>  1%            1              1
>>  5%            2              1
>> 10%            3              1       Obs               37963
>> 25%            5              1       Sum of Wgt.       37963
>>
>> 50%            7                      Mean           22.67271
>>                         Largest       Std. Dev.      31.57352
>> 75%           10             99
>> 90%           77             99       Variance       996.8872
>> 95%           88             99       Skewness       1.378759
>> 99%           88             99       Kurtosis       2.984444
>>
>> .
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Saturday, February 01, 2014 7:52 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: Grouping income variables- RECODE COMMAND
>>
>> Your code shows you using the -recode()- function, which is quite different from the -recode- command. In Stata, functions and commands are different!
>>
>> I think that to comment helpfully we need to see more about your
>> -hinctnt-, for example, the results of
>>
>> . su hinctnt, detail
>>
>> Your categories are not disjoint as (e.g.) the definitions [70, 120] and [120, 230] leave ambiguous what happens with 120. Alternatively, your notation here confuses the meaning of [ ] and ( ).
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 1 February 2014 17:29, Antonio Rodriguez Andres <Antonio.Andres@emu.edu.tr> wrote:
>>> Dear Stata users,
>>>
>>> I have to group the income variable into different intervals. In the
>>> original dataset, the household income variable is grouped into 12
>>> categories:
>>>
>>> J <40
>>> R [40,70]
>>> C [70, 120]
>>> M [120, 230]
>>> F [230, 350]
>>> S
>>> K
>>> P
>>> D
>>> H
>>> U [1730, 2310)
>>> N > 2310
>>>
>>> I want to group the J and R categories into one category (<70 euros)
>>> and create dummy variables for all income groups. I used the recode
>>> command, but it does not work. Here is the Stata output:
>>>
>>> . gen hinc_gr=recode(hinctnt, 70, 120, 230, 350, 460, 580, 690, 1150, 1730, 2310)
>>> (13282 missing values generated)
>>>
>>> . tab hinc_gr
>>>
>>>     hinc_gr |      Freq.     Percent        Cum.
>>> ------------+-----------------------------------
>>>          70 |     29,718      100.00      100.00
>>> ------------+-----------------------------------
>>>       Total |     29,718      100.00
>>>

```
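Nick's point that the -recode()- function and the -recode- command are different things can be checked on toy data. The sketch below is a hypothetical illustration, not code posted in the thread: it invents a variable that, like -hinctnt-, holds only the category codes 1 to 12, then applies the euro cutpoints from Antonio's -generate- call. Because the function returns the first cutpoint that its argument does not exceed, every code maps to 70, which matches the single-row table in the original post.

```stata
* Toy data (hypothetical): like -hinctnt-, the variable holds only the
* category codes 1-12, not euro amounts.
clear
set obs 12
generate hinctnt = _n

* The recode() FUNCTION returns the first cutpoint that hinctnt does not
* exceed. Every code 1-12 is <= 70, so every observation becomes 70.
generate hinc_gr = recode(hinctnt, 70, 120, 230, 350, 460, 580, 690, 1150, 1730, 2310)
tabulate hinc_gr
```

Euro cutpoints would only make sense if the variable held euro amounts; with category codes, the codes themselves have to be mapped.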
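Antonio's stated goal was to merge J and R into one group below 70 euros and to create dummy variables for all income groups. One way to do that with the -recode- command is sketched here, under assumptions: as the matching frequencies in the two tables suggest, code 1 is J, 2 is R, and so on up to 12 for N, while 77, 88 and 99 are the refusal, don't-know and no-answer codes. The toy data and the names -hinc_gr- and -inc_d- are hypothetical, and the labels use the category letters because the thread does not give bounds for every category. The `///` continuations assume the code is run from a do-file.

```stata
* Toy data (hypothetical): a few codes from -hinctnt-, including the
* non-response codes 77 (refusal), 88 (don't know) and 99 (no answer).
clear
input hinctnt
1
2
3
4
12
77
88
99
end

* Merge codes 1 (J) and 2 (R) into one group, shift the remaining codes
* down by one, and send the non-response codes to extended missing values.
recode hinctnt (1 2 = 1 "J+R, below 70") (3 = 2 "C") (4 = 3 "M") ///
    (5 = 4 "F") (6 = 5 "S") (7 = 6 "K") (8 = 7 "P") (9 = 8 "D")   ///
    (10 = 9 "H") (11 = 10 "U") (12 = 11 "N")                        ///
    (77 = .r "Refusal") (88 = .d "Don't know") (99 = .s "No answer"), ///
    generate(hinc_gr)

* One 0/1 indicator per observed group: inc_d1, inc_d2, ...
tabulate hinc_gr, generate(inc_d)
```

In estimation commands, factor-variable notation such as `i.hinc_gr` (Stata 11 and later) builds the indicators on the fly, so the `tabulate, generate()` step is only needed if the dummies are wanted as variables in their own right.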
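Antonio also asked how to specify the upper and lower limit for each interval. Because -hinctnt- holds codes, one approach, in the spirit of Nick's advice to recode codes to euro values, is to map each code to its bounds in two new variables. This is a hypothetical sketch: it uses only the bounds quoted in the original post, leaves the bounds of S, K, P, D and H (codes 6 to 10) missing because the thread does not state them, treats J as having no lower bound and N as having no upper bound, and takes the midpoint as just one possible representative value. The variable names are invented, and the ambiguity Nick noted at shared endpoints such as 70 and 120 is not resolved here.

```stata
* Toy data (hypothetical): one observation per code 1-12 (1 = J, ..., 12 = N).
clear
set obs 12
generate hinctnt = _n

* Lower bounds quoted in the thread; J (code 1) has none, and the bounds of
* S, K, P, D, H (codes 6-10) were not given, so they stay missing.
recode hinctnt (2 = 40) (3 = 70) (4 = 120) (5 = 230) (11 = 1730) ///
    (12 = 2310) (else = .), generate(inc_lo)

* Upper bounds quoted in the thread; N (code 12) is open-ended.
recode hinctnt (1 = 40) (2 = 70) (3 = 120) (4 = 230) (5 = 350) ///
    (11 = 2310) (else = .), generate(inc_hi)

* Midpoint as one possible representative value; missing whenever either
* bound is missing.
generate inc_mid = (inc_lo + inc_hi) / 2
list hinctnt inc_lo inc_hi inc_mid
```

Pairs of bounds like these are also the form that -intreg- expects, with a missing lower or upper bound read as an open end.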