Statalist



Re: st: AW: "skipping" missing data


From   Nick Cox <n.j.cox@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: AW: "skipping" missing data
Date   Fri, 07 Aug 2009 08:51:33 -0500

This looks good. You could shorten it by using -egen, total()- but in practice that puts Stata to more work, although in many datasets you would hardly notice.

One of many alternatives, given here solely to show more technique, is

regress bperwk_ wk
bysort ptnum : egen nbobs = total(e(sample))
egen newgroup = group(ptnum) if nbobs > 1

The preliminary -regress- on all the data is not the regression you want, but it is a way of tagging the observations you want: any observation with a missing value on any of the variables will not be part of the estimation sample, and so will be 0 on e(sample). That holds for any bundle of numeric variables.
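A minimal sketch of that claim, using the auto dataset shipped with Stata rather than the data in this thread (so price, rep78 and foreign are stand-ins for bperwk_, wk and ptnum, not variables from the original problem):

```stata
* Illustration only: auto.dta stands in for the poster's data
sysuse auto, clear

* Throwaway regression whose only purpose is to set e(sample)
regress price rep78

* Copy e(sample) into a variable: 1 if used in estimation, 0 otherwise
generate byte used = e(sample)

* The tag agrees with an explicit test for missing values
assert used == !missing(price, rep78)

* Count usable observations within each group
bysort foreign : egen nused = total(used)
tabulate foreign nused
```

The -assert- is there only to show that the two ways of tagging agree; it is not needed in practice.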

Nick

B. Timothy Walsh wrote:
Dear Nick,
Many thanks--I had to modify your suggested code a little, to reflect the fact that it isn't missing rows but missing datapoints within rows that is causing the problem. I came across your very helpful tutorial re by: in _Stata Journal_ 2(1):86-102 (2002) which provided the additional guidance.

Here's the code that does work. I'm happy to have suggestions for making it more elegant, if you have any. Again, many thanks.
Tim
------------------------------------------------
bysort ptnum : gen int nbobs = sum(bperwk_ < .)
bysort ptnum : replace nbobs = nbobs[_N]

* p1 must exist before the loop can -replace- it
generate p1=.
egen newgroup = group(ptnum) if nbobs > 1
summarize newgroup, meanonly

forval i = 1/`r(max)' {
     regress bperwk_ wk if newgroup == `i'
     predict p
     replace p1=p if newgroup == `i'
     drop p
}
--------------------------------------

--On Thursday, August 06, 2009 12:55 PM -0500 Nick Cox <n.j.cox@stata.com> wrote:

Singleton panels are tagged as such by

bysort ptnum : gen allonmyown = _N == 1

Alternatively, panels with two or more are tagged as such by

bysort ptnum : gen twoormore = _N > 1

after which you can go

egen group = group(ptnum) if !missing(bperwk_, wk) & !allonmyown

OR

egen group = group(ptnum) if !missing(bperwk_, wk) & twoormore

B. Timothy Walsh wrote:

Thank you: this worked very nicely.
EXCEPT I now realize I also have instances in which there is only a
single data point for an individual. Is there a simple way to modify
this line?
egen group = group(ptnum) if !missing(bperwk_, wk)

--On Thursday, August 06, 2009 11:28 AM -0500 Nick Cox wrote:


Here is one of several alternatives.

generate p1=.
egen group = group(ptnum) if !missing(bperwk_, wk)
summarize group, meanonly

forval i = 1/`r(max)' {
      regress bperwk_ wk if group == `i'
      predict p
      replace p1=p if group == `i'
      drop p
}

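That sets the missings to one side: observations missing on either variable get no group number, so the loop never visits an individual with no usable data.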

See also:

FAQ     . . . . . . . . . Making foreach go through all values of a variable
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
        8/05    Is there a way to tell Stata to try all values of a
                particular variable in a foreach statement without
                specifying them?
                http://www.stata.com/support/faqs/data/foreach.html

Despite the reference to -foreach- the FAQ is still pertinent.
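For comparison, here is a sketch of looping over the values of the identifier itself with -levelsof- and -foreach-, which avoids assuming that identifiers run 1, 2, 3, and so on. The toy data, the identifier values and the variable nok are made up for illustration and are not from the original problem.

```stata
* Toy data (hypothetical): patient 205 has no outcome data at all
clear
input ptnum wk bperwk_
101 1 3
101 2 4
101 3 6
205 1 .
205 2 .
317 1 5
317 2 7
317 3 8
end

* Number of complete observations for each patient
bysort ptnum : egen nok = total(!missing(bperwk_, wk))

generate p1 = .

* Collect the identifiers of patients with at least two usable observations
levelsof ptnum if nok > 1, local(ids)

foreach id of local ids {
    regress bperwk_ wk if ptnum == `id'
    predict p if ptnum == `id'
    replace p1 = p if ptnum == `id'
    drop p
}

list, sepby(ptnum)
```

Patient 205 never enters the loop, so p1 stays missing for that patient.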

Nick

Martin Weiss wrote:

*************
capture
*************

You could put it in front of individual commands, or the entire
-forvalues- loop.
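A sketch of how that might look, on made-up toy data in which patient 2 has no outcome data. One caution: if only the -regress- is captured, a failed regression leaves the previous patient's estimates in memory, so the -predict- should be guarded by a check on _rc.

```stata
* Toy data (hypothetical): patient 2 has no outcome data at all
clear
input ptnum wk bperwk_
1 1 3
1 2 4
1 3 6
2 1 .
2 2 .
3 1 5
3 2 7
end

generate p1 = .

forvalues i = 1/3 {
    * capture suppresses the error so the loop carries on
    capture regress bperwk_ wk if ptnum == `i'

    * _rc is 0 only if the regression ran; otherwise skip this patient
    if _rc == 0 {
        predict p if ptnum == `i'
        replace p1 = p if ptnum == `i'
        drop p
    }
}

list, sepby(ptnum)
```

Using -capture noisily- instead would still show the error message for the patients that were skipped.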

B. Timothy Walsh wrote:

I am attempting to generate predictions from regressions performed for
each of a longish list of individuals. The problem is that, for some
individuals, there are no dependent variable data (entries are
missing), so the regression attempt fails and the forvalues loop then
exits. I would like to somehow "skip" these individuals. The loop seems
to work fine if there are enough data to perform a regression. I'd be
grateful for any suggestions.

Here's the code:
generate p1=.
forvalues i = 1/50 {        //50 individuals
    regress bperwk_ wk if ptnum == `i'
    predict p
    replace p1=p if ptnum == `i'
    drop p
}

I'm pretty much a Stata novice, so I apologize if I am missing
something obvious. I am using version 10.1.
