Re: st: How to fill in the missing data

Mon, 10 Jun 2013 08:55:39 +0100

This approach is documented at
http://www.stata.com/support/faqs/data-management/replacing-missing-values/
but I agree with Sergiy: your problem is an interpolation problem.
Possible commands include -ipolate- (official), -cipolate- (SSC),
-csipolate- (SSC), -pchipolate- (SSC). (I mention also -nnipolate-
(SSC) for completeness, but it would not be a good fit for your
particular problem.)

Nick
njcoxstata@gmail.com

On 10 June 2013 06:27, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Alexis, in your approach when you impute the weight you have a risk of
> carrying the weight of one patient to the next one, if the first
> measurement is missing for the second patient (your last line
> disregards ID). So unless it is known that the first measurement of
> weight is always present, (and we see from the provided example it is
> not the case) this method would create very incorrect results.
>
> Wong, are your datapoints such that each patientid-age combinations
> are unique? or do you sometimes see same patient twice within a year?
> (then be careful even with the -sort- statement).
>
> It sounds like interpolation is likely needed here since the intervals
> of missing observations are of different size and weight probably
> changes smoothly with age. But it shouldn't be difficult.
>
> Best, Sergiy
>
> On Mon, Jun 10, 2013 at 1:01 AM, Alexis Penot <alexis.penot@ens-lyon.fr> wrote:
>> You can try this
>> sort id age
>> gen weight2 = weight
>> replace weight2 = weight2[_n-1] if missing(weight2)
>>
>> Alexis
>>
>> On 10 June 2013 at 06:45, Ching Wong <ching.y.wong@student.adelaide.edu.au> wrote:
>>
>>> Hi,
>>>
>>> I have a dataset as following:
>>>
>>> id  age  weight
>>> 1   21   50.2
>>> 1   22
>>> 1   23   52.9
>>> 1   24   51.0
>>> 1   25
>>> 2   22
>>> 2   23
>>> 2   25   60.2
>>> 3   21
>>>
>>> And I would like to create a new variable "weight2" and fill in the
>>> missing data based on the previous value
>>>
>>> My expected output value should be as follows:
>>>
>>> id  age  weight  weight2
>>> 1   21   50.2    50.2
>>> 1   22   .       50.2
>>> 1   23   52.9    52.9
>>> 1   24   51.0    51.0
>>> 1   25   .       51.0
>>> 2   22   .       .
>>> 2   23   .       .
>>> 2   25   60.2    60.2
>>> 3   21   .       .
>>>
>>> I have tried the command below but that cannot produce what I expected.
>>>
>>> - bysort id (age): gen weight_hat = weight[_n-1]
>>>
>>> It is very obvious that command is missing something. So what will be
>>> the correct command in this case?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
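
The two options discussed in this thread, carrying the last value forward within each patient and interpolating on age, might be sketched as follows. This is only an illustration on the example data from the original post. It assumes each id-age pair is unique (the point Sergiy raises), and the variable name weight_ip is made up for the example.

```stata
* Recreate the example data from the original post (missing weights as .)
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Carry the last observed weight forward, but only within each patient.
* Under -by id-, _n restarts at 1 for every id, so weight2[_n-1] is
* missing on a patient's first observation and nothing leaks across ids.
bysort id (age): gen weight2 = weight
by id: replace weight2 = weight2[_n-1] if missing(weight2)

* Alternative: linear interpolation on age within patient (official -ipolate-).
* Without the epolate option, ages outside a patient's observed range
* stay missing. weight_ip is a hypothetical name chosen for this sketch.
by id: ipolate weight age, gen(weight_ip)

list, sepby(id)
```

The -by id:- prefix on the -replace- line is what the unprefixed version quoted above lacks. The two results differ in kind: for patient 1 at age 22, carry-forward repeats 50.2, while linear interpolation gives the midpoint of 50.2 and 52.9. Which one is appropriate depends on how weight is believed to change between measurements.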

