
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: basic question


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: basic question
Date   Wed, 24 Aug 2011 08:25:19 +0100

The tone of this reply suggests that Nadine is still puzzled about
something, but precisely what is unclear to me.

However, a -float-/-double- difference would only make a difference to
the precision of very large totals. It wouldn't affect whether totals
were returned as 0 or missing when all components were missing. That
depends on the -missing- option of -egen, rowtotal()-.
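
To make the distinction concrete, here is a minimal sketch on a made-up
three-observation dataset. The variable names inc1, inc2, inc3 and their
values are hypothetical and are not taken from the survey discussed in this
thread. It contrasts -egen, rowtotal()- with and without -missing-, shows
what the + operator does with missings, and shows where -float- precision
starts to matter.

```stata
* Hypothetical toy data: three income variables with some missings
clear
input inc1 inc2 inc3
1 2 3
1 . 1
. . .
end

* Default: missings are treated as 0, so an all-missing row totals 0
egen tot_default = rowtotal(inc1 inc2 inc3)

* With -missing-: a row whose components are all missing totals missing
egen tot_missing = rowtotal(inc1 inc2 inc3), missing

* The + operator: any missing component makes the whole sum missing
generate tot_plus = inc1 + inc2 + inc3

list

* Storage type matters only for precision: a -float- holds integers
* exactly only up to 2^24 = 16777216, so 16777217 is rounded
display %12.0f float(16777217)
```

In this sketch only the third observation differs between tot_default (0)
and tot_missing (missing), and that difference comes from the -missing-
option, not from the storage type.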

Nick

On Tue, Aug 23, 2011 at 3:46 AM, Nadine Brooks <[email protected]> wrote:
> First of all: thanks for the help
> Second: I did not know about the rules of Statalist. I had bought
> a book to make my life easier in dealing with this program, but Stata has a
> lot of tricks (missing is bigger than zero?!). I was in a hurry to solve
> the problem, so I signed up to the list and sent the email without reading
> all the instructions. Sorry, this will not happen again.
>
> The worst part is that I have now solved the problem but have not learned
> what was wrong....
>
> I renamed the variables v9532=sal1, v9932=sal2, v1022=sal3. But before
> that I changed the type from float to double. Why? I do not know: "trial
> and error + despair".
> And after that I used egen sal= rowtotal (sal1 sal2 sal3), missing.
> By the way: egen sal= rowtotal (v9532 v9932 v1022), missing also works...
>
> Does it have anything to do with the float/double type? Or did I miss something?
>
> Nadine
>
>
>
> 2011/8/22 Phil Clayton <[email protected]>:
>> Nadine replied privately off-list; this is generally discouraged in the Statalist FAQ so I am re-posting her message below.
>>
>> I'm a little unclear on what you actually want. If you want people with no income in any of the 3 variables to have a missing value for sal, the easiest option would be to add the -missing- option to the -egen- command:
>> egen sal=rowtotal(v9535 v9982 v1022), missing
>>
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0
>> will not do what you want because, in Stata, missing is the highest number you can have. So missing is greater than zero. As an alternative you could try:
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0 & v9535<.
>> or
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0 & !missing(v9535)
>>
>> Neither of these solutions is as good as -egen sal=rowtotal(v9535 v9982 v1022), missing- because they assume that v9982 and v1022 will definitely be missing if v9535 is missing. In a perfect world your dataset would be clean enough that this would always be true, but in real life this is not always the case, so it's safer to assume that there may be income recorded in v9982 and/or v1022 even if v9535 is missing.
>>
>> Incidentally, why not rename the variables something more readable such as salary, income1, income2 and income3?
>>
>> Phil
>>
>> On 23/08/2011, at 11:40 AM, Nadine Brooks wrote:
>>
>>> Thanks Phil and Eric but even with egen I can not solve my problem.
>>>
>>> I am working with survey data on 410,241 individuals of all ages.
>>> Some of them work and others do not. The variables that I want to sum
>>> are:
>>>
>>> v9532: income from main job
>>> v9982: income from secondary job
>>> v1022: income from the third or more jobs
>>>
>>> Only 170,014 individuals work, so when I use  egen
>>> sal=rowtotal(v9535 v9982 v1022) I will have people with income equal
>>> to zero...
>>>
>>> Take a look:
>>>
>>> sum v9532 v9982 v1022
>>>
>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>> -------------+--------------------------------------------------------
>>>        v9532 |    170014    831.5625    1451.442          3     120000
>>>        v9982 |      8326    686.3957    1179.807          1      48000
>>>        v1022 |       672      957.75    1422.576          8      11000
>>>
>>> egen sal=rowtotal(v9535 v9982 v1022)
>>>
>>> . sum v9532 v9982 v1022 sal
>>>
>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>> -------------+--------------------------------------------------------
>>>        v9532 |    170014    831.5625    1451.442          3     120000
>>>        v9982 |      8326    686.3957    1179.807          1      48000
>>>        v1022 |       672      957.75    1422.576          8      11000
>>>          sal |    410241    15.88779    225.3688          0      48000
>>>
>>> Now all the individuals in my survey data have some income,
>>> even if it is zero. But I don't want that.
>>>
>>> After your advice I also tried:  egen sal=rowtotal(v9535 v9982
>>> v1022) if v9535>0
>>> because whoever has a 2nd and/or 3rd job must have the first (main). But
>>> that did not work either.
>>>
>>> Thanks, Nadine
>>
>>
>> On 23/08/2011, at 11:46 AM, Eric Booth wrote:
>>
>>> <>
>>> Nadine:
>>>
>>> You received several, similar answers about your issue.  In addition to all these and the help files for -sum()- and -egen-, take a look at Nick Cox's 2002  article "Speaking Stata: On getting functions to do the work." Stata Journal 2: 411–427. (Free due to SJ's moving pay wall at:  http://www.stata-journal.com/sjpdf.html?articlenum=pr0007 )
>>>
>>> - Eric
>>> On Aug 22, 2011, at 8:24 PM, Nadine Brooks wrote:
>>>
>>>> But it is not what is happening. Take a look:
>>>>
>>>> sum v9532 v9982 v1022
>>>>
>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>> -------------+--------------------------------------------------------
>>>>        v9532 |    170014    831.5625    1451.442          3     120000
>>>>        v9982 |      8326    686.3957    1179.807          1      48000
>>>>        v1022 |       672      957.75    1422.576          8      11000
>>>>
>>>> . gen sal= (v9532+v9982+v1022)
>>>> (409603 missing values generated)
>>>>
>>>> . sum v9532 v9982 v1022 sal
>>>>
>>>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>> -------------+--------------------------------------------------------
>>>>        v9532 |    170014    831.5625    1451.442          3     120000
>>>>        v9982 |      8326    686.3957    1179.807          1      48000
>>>>        v1022 |       672      957.75    1422.576          8      11000
>>>>          sal |       638    3999.621    4536.377         68      40000
>>>>
>>>>
>>>> My variables mean:
>>>>
>>>> v9532: income from main job
>>>> v9982: income from secondary job
>>>> v1022: income from third or more jobs
>>>>
>>>> But most of the people have only one job, so they get missing to v9982
>>>> and v1022...
>>>>
>>>> Thanks, Nadine
>>>>
>>>>
>>>> 2011/8/22 Daniel Marcelino <[email protected]>:
>>>>> Well, I don't know exactly what your variables mean.
>>>>> If you have numeric values, your result should be:
>>>>> v1 v2 v3  sal
>>>>> 1   2   3    6
>>>>> 1   .    1    2
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>> On Mon, Aug 22, 2011 at 9:11 PM, Nadine Brooks <[email protected]> wrote:
>>>>>> But there are missing values, particularly in v9102 and v1022, so I
>>>>>> think that I cannot use the operator +, can I?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2011/8/22 Daniel Marcelino <[email protected]>:
>>>>>>> try
>>>>>>>
>>>>>>> gen sal= (v9535 + v9102 + v1022)
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 22, 2011 at 8:54 PM, Nadine Brooks <[email protected]> wrote:
>>>>>>>> Hi statalist
>>>>>>>>
>>>>>>>> I am a beginner Stata user and I am having trouble generating a new
>>>>>>>> variable. I am using:
>>>>>>>> gen sal=sum (v9535,v9102,v1022)
>>>>>>>> and I am getting: v9535,v9102,v1022 invalid name
>>>>>>>> r(198);
>>>>>>>>
>>>>>>>> But the names of all the variables are correct, so what am I doing wrong?
>>>>>>>>


