Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: basic question

Date: Wed, 24 Aug 2011 08:25:19 +0100

The tone of this reply suggests that Nadine is still puzzled about something, but precisely what is unclear to me. However, a -float-/-double- difference would only make a difference to the precision of very large totals. It wouldn't affect whether totals were returned as 0 or missing when all components were missing. That depends on the -missing- option of -egen, rowtotal()-.

Nick

On Tue, Aug 23, 2011 at 3:46 AM, Nadine Brooks <nb.statalist@gmail.com> wrote:
> First of all: thanks for the help.
> Second: I did not know about the rules of Statalist. I had bought a
> book to make dealing with this program easier, but Stata has a lot of
> tricks (missing is bigger than zero?!). I was in a hurry to solve the
> problem, so I signed up to the list and sent the email without reading
> all the instructions. Sorry, this will not happen again.
>
> The worst part is that now I have solved the problem but I have not
> learned what was wrong....
>
> I renamed the variables v9532=sal1, v9932=sal2, v1022=sal3. But before
> that I changed the type from float to double. Why? I do not know:
> "trial and error + despair".
> And after that I used egen sal= rowtotal (sal1 sal2 sal3), missing.
> By the way: egen sal= rowtotal (v9532 v9932 v1022), missing also works...
>
> Does it have anything to do with the float/double type? Or did I miss
> something?
>
> Nadine
>
> 2011/8/22 Phil Clayton <philclayton@internode.on.net>:
>> Nadine replied privately off-list; this is generally discouraged in the Statalist FAQ so I am re-posting her message below.
>>
>> I'm a little unclear on what you actually want. If you want people with no income in any of the 3 variables to have a missing value for sal, the easiest option would be to add the -missing- option to the -egen- command:
>> egen sal=rowtotal(v9535 v9982 v1022), missing
>>
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0
>> will not do what you want because, in Stata, missing is the highest number you can have. So missing is greater than zero. As an alternative you could try:
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0 & v9535<.
>> or
>> egen sal=rowtotal(v9535 v9982 v1022) if v9535>0 & !missing(v9535)
>>
>> Neither of these solutions is as good as -egen sal=rowtotal(v9535 v9982 v1022), missing- because they assume that v9982 and v1022 will definitely be missing if v9535 is missing. In a perfect world your dataset would be clean enough that this would always be true, but in real life this is not always the case, so it's safer to assume that there may be income recorded in v9982 and/or v1022 even if v9535 is missing.
>>
>> Incidentally, why not rename the variables something more readable such as salary, income1, income2 and income3?
>>
>> Phil
>>
>> On 23/08/2011, at 11:40 AM, Nadine Brooks wrote:
>>
>>> Thanks Phil and Eric, but even with egen I cannot solve my problem.
>>>
>>> I am working with survey data on 410,241 individuals of all ages.
>>> Some of them work and others do not. Some of the variables that I want
>>> to sum are:
>>>
>>> v9532: income from main job
>>> v9982: income from secondary job
>>> v1022: income from the third or more jobs
>>>
>>> Only 170,014 individuals work, so when I use egen
>>> sal=rowtotal(v9535 v9982 v1022) I get people with income equal to
>>> zero...
>>>
>>> Take a look:
>>>
>>> sum v9532 v9982 v1022
>>>
>>>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>     -------------+--------------------------------------------------------
>>>            v9532 |    170014    831.5625    1451.442          3     120000
>>>            v9982 |      8326    686.3957    1179.807          1      48000
>>>            v1022 |       672      957.75    1422.576          8      11000
>>>
>>> egen sal=rowtotal(v9535 v9982 v1022)
>>>
>>> . sum v9532 v9982 v1022 sal
>>>
>>>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>     -------------+--------------------------------------------------------
>>>            v9532 |    170014    831.5625    1451.442          3     120000
>>>            v9982 |      8326    686.3957    1179.807          1      48000
>>>            v1022 |       672      957.75    1422.576          8      11000
>>>              sal |    410241    15.88779    225.3688          0      48000
>>>
>>> Now all the individuals in my survey data have some income, even if it
>>> is zero. But I don't want that.
>>>
>>> Following your advice I also tried: egen sal=rowtotal(v9535 v9982
>>> v1022) if v9535>0
>>> because anyone who has a 2nd and/or 3rd job must have the first (main)
>>> one. But it did not work either.
>>>
>>> Thanks, Nadine
>>
>>
>> On 23/08/2011, at 11:46 AM, Eric Booth wrote:
>>
>>> <>
>>> Nadine:
>>>
>>> You received several similar answers about your issue. In addition to all these and the help files for -sum()- and -egen-, take a look at Nick Cox's 2002 article "Speaking Stata: On getting functions to do the work." Stata Journal 2: 411–427. (Free due to SJ's moving pay wall at: http://www.stata-journal.com/sjpdf.html?articlenum=pr0007 )
>>>
>>> - Eric
>>> On Aug 22, 2011, at 8:24 PM, Nadine Brooks wrote:
>>>
>>>> But that is not what is happening. Take a look:
>>>>
>>>> sum v9532 v9982 v1022
>>>>
>>>>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>     -------------+--------------------------------------------------------
>>>>            v9532 |    170014    831.5625    1451.442          3     120000
>>>>            v9982 |      8326    686.3957    1179.807          1      48000
>>>>            v1022 |       672      957.75    1422.576          8      11000
>>>>
>>>> . gen sal= (v9532+v9982+v1022)
>>>> (409603 missing values generated)
>>>>
>>>> . sum v9532 v9982 v1022 sal
>>>>
>>>>         Variable |       Obs        Mean    Std. Dev.       Min        Max
>>>>     -------------+--------------------------------------------------------
>>>>            v9532 |    170014    831.5625    1451.442          3     120000
>>>>            v9982 |      8326    686.3957    1179.807          1      48000
>>>>            v1022 |       672      957.75    1422.576          8      11000
>>>>              sal |       638    3999.621    4536.377         68      40000
>>>>
>>>> My variables mean:
>>>>
>>>> v9532: income from main job
>>>> v9982: income from secondary job
>>>> v1022: income from third or more jobs
>>>>
>>>> But most people have only one job, so they have missing values for
>>>> v9982 and v1022...
>>>>
>>>> Thanks, Nadine
>>>>
>>>>
>>>> 2011/8/22 Daniel Marcelino <dmsilv@gmail.com>:
>>>>> Well, I don't know exactly what your variables mean.
>>>>> If you have numeric values, your result should be:
>>>>>
>>>>>     v1  v2  v3  sal
>>>>>      1   2   3    6
>>>>>      1   .   1    2
>>>>>
>>>>>
>>>>> Daniel
>>>>>
>>>>> On Mon, Aug 22, 2011 at 9:11 PM, Nadine Brooks <nb.statalist@gmail.com> wrote:
>>>>>> But there are missing values, particularly in v9102 and v1022, so I
>>>>>> think that I cannot use the operator +, can I?
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2011/8/22 Daniel Marcelino <dmsilv@gmail.com>:
>>>>>>> try
>>>>>>>
>>>>>>> gen sal= (v9535 + v9102 + v1022)
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Aug 22, 2011 at 8:54 PM, Nadine Brooks <nb.statalist@gmail.com> wrote:
>>>>>>>> Hi statalist
>>>>>>>>
>>>>>>>> I am a beginner Stata user and I am having trouble generating a new
>>>>>>>> variable. I am using:
>>>>>>>> gen sal=sum (v9535,v9102,v1022)
>>>>>>>> and I am getting: v9535,v9102,v1022 invalid name
>>>>>>>> r(198);
>>>>>>>>
>>>>>>>> But the names of all the variables are correct, so what am I doing wrong?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
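The missing-value behaviour discussed in this thread can be checked on a toy dataset. The sketch below is illustrative only: the variables v1, v2 and v3 and their values are made up (loosely following Daniel's example), not taken from Nadine's survey. It contrasts the + operator, which returns missing if any component is missing, with -egen, rowtotal()-, which treats missing as 0 and so returns 0 for an all-missing row unless the -missing- option is given. It also shows why a condition such as v1>0 on its own does not exclude missing values.

```stata
* Toy data: hypothetical variables v1-v3, three observations
clear
input v1 v2 v3
1 2 3
1 . 1
. . .
end

* The + operator propagates missing: any missing component gives missing
generate plus = v1 + v2 + v3

* rowtotal() treats missing as 0, so the all-missing row gets 0
egen tot0 = rowtotal(v1 v2 v3)

* With -missing-, a row whose components are all missing gets missing
egen tot1 = rowtotal(v1 v2 v3), missing

* Missing is greater than every number, so v1 > 0 is true when v1 is missing;
* adding !missing(v1) (equivalently v1 < .) excludes those observations
count if v1 > 0
count if v1 > 0 & !missing(v1)

list
```

With these toy data, plus is 6, . and .; tot0 is 6, 2 and 0; tot1 is 6, 2 and .; the first -count- reports 3 observations and the second reports 2.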

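Nick's point that the -float-/-double- storage type could not have mattered here can be checked with Stata's float() function, which rounds a number to float precision. A float holds every integer up to 16,777,216 (2^24) exactly. The sketch assumes whole-number incomes, which the summaries in the thread suggest but do not guarantee, and uses 179000, the sum of the maxima reported for v9532, v9982 and v1022, as the largest total any observation could reach.

```stata
* float() rounds its argument to float (4-byte) precision.
* 179000 = 120000 + 48000 + 11000, the sum of the reported maxima,
* is stored exactly as a float, so this comparison is true.
display float(179000) == 179000

* Above 2^24 = 16777216, not every integer can be stored as a float:
* 16777217 is rounded to 16777216, so this comparison is false.
display float(16777217) == 16777217
```

The first -display- shows 1 and the second shows 0, so switching from float to double only matters for totals far larger than any in this survey.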
**References**:

- **st: basic question**, *From:* Nadine Brooks <nb.statalist@gmail.com>
- **Re: st: basic question**, *From:* Daniel Marcelino <dmsilv@gmail.com>
- **Re: st: basic question**, *From:* Nadine Brooks <nb.statalist@gmail.com>
- **Re: st: basic question**, *From:* Daniel Marcelino <dmsilv@gmail.com>
- **Re: st: basic question**, *From:* Nadine Brooks <nb.statalist@gmail.com>
- **Re: st: basic question**, *From:* Eric Booth <ebooth@ppri.tamu.edu>
- **Re: st: basic question**, *From:* Phil Clayton <philclayton@internode.on.net>
- **Re: st: basic question**, *From:* Nadine Brooks <nb.statalist@gmail.com>
