Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Issues with missing values
Date: Mon, 10 Mar 2014 15:38:16 +0000

Thanks for this. You clarified that -cal_in- is your response variable
and that you have missing values on the predictors too.

Whatever you do here is wrong from some point of view, but I'd bet that

* using the data as they come, so leaving out missings

would get more votes than

+ using the data as they come and replacing missings by means

but regardless you do have scope for doing both and seeing how much
difference it makes.

Nick
njcoxstata@gmail.com

On 10 March 2014 15:27, Halua Koko <haluakoko@gmail.com> wrote:
> Hi Nick,
> Thanks for the response. Sorry I didn't mention it before: my y is
> calorie intake (cal_in). It's a continuous variable. I really didn't
> want to go into the messy multiple imputation techniques, so I tried
> the linear prediction technique, ie:
> reg y x1 x2..
> predict y'
> But I guess due to missing values in x1, x2, this isn't working. I've
> been trying to figure out other work-arounds, but unsuccessfully. At
> the moment, I have about 20% of the 5000 obs missing; would you
> suggest going ahead without them? Would you have any other ways of
> solving this particularly perturbing issue? Indeed I'll refer to it as
> a wide "structure" from now on!
> Thanks again
> Halua
>
> On Mon, Mar 10, 2014 at 3:59 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> The main issue here is what you are trying to do.
>>
>> 1. It might seem reasonable for your purposes to replace missings with
>> the mean. Even though you might be unable or unwilling to apply
>> imputation, some kind of interpolation (in time) is, however, a
>> possible alternative.
>>
>> 2. But the missings replaced with means don't carry new information
>> about the distribution. Classifying into quantile-based groups is
>> spurious unless you use only the non-missings to determine quantiles.
>> Unfortunately, applying that classification to the extra means is
>> likely to be spurious too. -xtile- does the best it can, but
>> necessarily often produces bizarre results because of its rule that
>> identical values must be placed in the same group.
>>
>> 3. I don't understand the fudge you are imagining, but it sounds quite
>> arbitrary and difficult to defend.
>>
>> 4. I didn't catch why you think you need to classify these values
>> anyway. I don't know what -cal_in- is, but using the panel means (or
>> medians) of what you have seems a more defensible way to make use of
>> what information there is. That, however, may miss the point if you
>> want to catch impacts during the time panels were observed.
>>
>> 5. Panel data are almost always better off in a long shape or
>> structure (my self-imposed Sisyphean task is to persuade people not to
>> say "format" given its existing use in Stata).
>>
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 10 March 2014 14:31, Halua Koko <haluakoko@gmail.com> wrote:
>>
>>> I've been working with a panel dataset and while putting it together
>>> have replaced a number of missing values in variable cal_in with the
>>> mean for each of the years. But when trying to create quintiles of the
>>> baseline values to assess heterogeneity of impact (using xtile
>>> Q=cal_in, nq(5)), I noticed that doing so had clumped together about
>>> 1000 obs around one value, ie, the mean. So in essence my xtile groups
>>> are distributed unevenly and the 4th quantile seems to be entirely
>>> missing. FYI my panel is in the wide format.
>>> Can anyone suggest a solution to this problem? I was thinking of
>>> redistributing the clumped values by small increments so as to have
>>> the same mean, but differing values, but not sure how to do this.
>>> Can anyone help me figure this out?
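
The suggestion at the top of this thread, to run the analysis both ways and see how much difference it makes, could be sketched in Stata roughly as follows. This is only an illustration under assumptions: x1 and x2 are the placeholder predictor names used in the thread, not real variables, and the model is the plain linear regression mentioned there. The names asis and meanfill are arbitrary labels chosen here.

```stata
* Sketch only: x1 and x2 stand in for the real predictors (assumed names).

* (a) Use the data as they come: -regress- omits any observation
*     with a missing value on the response or on a predictor.
regress cal_in x1 x2
estimates store asis

* (b) Replace missing predictor values by their means, working on a
*     temporary copy of the data so the originals are left untouched.
preserve
foreach v of varlist x1 x2 {
    summarize `v', meanonly
    replace `v' = r(mean) if missing(`v')
}
regress cal_in x1 x2
estimates store meanfill
restore

* Compare coefficients, standard errors and sample sizes side by side.
estimates table asis meanfill, b se stats(N)
```

Observations with missing -cal_in- are still dropped in both fits, because only the predictors are filled in. If the two columns of the table differ noticeably, that is a sign the results are sensitive to how the missing values are handled.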
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/