Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.


Re: st: How to fill in the missing data


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: How to fill in the missing data
Date   Mon, 10 Jun 2013 08:55:39 +0100

This approach is documented at

http://www.stata.com/support/faqs/data-management/replacing-missing-values/

but I agree with Sergiy: your problem is an interpolation problem.
Possible commands include -ipolate- (official), -cipolate- (SSC),
-csipolate- (SSC), -pchipolate- (SSC).

(I mention also -nnipolate- (SSC) for completeness, but it would not
be a good fit for your particular problem.)
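
As a minimal sketch of the interpolation route, assuming the data are as
posted (variables id, age, weight) and that linear interpolation within
each patient is acceptable. The name weight_ip is hypothetical, and the
-isid- line is only a check on Sergiy's question about duplicates.

```stata
* Stop with an error unless each id-age pair identifies one observation.
isid id age

* Linear interpolation of weight on age, separately for each patient.
* Without the -epolate- option nothing is extrapolated, so values before
* the first or after the last known weight of a patient stay missing.
bysort id: ipolate weight age, generate(weight_ip)

list id age weight weight_ip, sepby(id) noobs
```

With the posted data this would give 51.55 for id 1 at age 22 (halfway
between 50.2 and 52.9), while id 1 at age 25, id 2 at ages 22 and 23,
and id 3 would remain missing.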

Nick
njcoxstata@gmail.com


On 10 June 2013 06:27, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Alexis, with your approach to imputing the weight you risk carrying
> the weight of one patient over to the next one if the first
> measurement is missing for the second patient (your last line
> disregards ID). So unless it is known that the first measurement of
> weight is always present (and we see from the provided example that
> this is not the case), this method would create very incorrect results.
>
> Wong, is each patientid-age combination in your data unique, or do
> you sometimes see the same patient twice within a year? (If so, be
> careful even with the -sort- statement.)
>
> It sounds like interpolation is needed here, since the runs of
> missing observations are of different lengths and weight probably
> changes smoothly with age. But it shouldn't be difficult.
>
> Best, Sergiy
>
> On Mon, Jun 10, 2013 at 1:01 AM, Alexis Penot <alexis.penot@ens-lyon.fr> wrote:
>> You can try this:
>> sort id age
>> gen weight2 = weight
>> replace weight2 = weight2[_n-1] if missing(weight2)
>>
>> Alexis
>>
>> On 10 June 2013 at 06:45, Ching Wong <ching.y.wong@student.adelaide.edu.au> wrote:
>>
>>> Hi,
>>>
>>> I have a dataset as follows:
>>>
>>> id age weight
>>> 1   21   50.2
>>> 1   22
>>> 1   23   52.9
>>> 1   24   51.0
>>> 1   25
>>> 2   22
>>> 2   23
>>> 2   25   60.2
>>> 3   21
>>>
>>> I would like to create a new variable "weight2" that fills in each
>>> missing value with the previous value for the same id.
>>>
>>> My expected output is as follows:
>>>
>>> id age weight weight2
>>> 1   21   50.2    50.2
>>> 1   22      .    50.2
>>> 1   23   52.9    52.9
>>> 1   24   51.0    51.0
>>> 1   25      .    51.0
>>> 2   22      .       .
>>> 2   23      .       .
>>> 2   25   60.2    60.2
>>> 3   21      .       .
>>>
>>> I have tried the command below, but it does not produce what I expected.
>>>
>>> - bysort id (age): gen weight_hat = weight[_n-1]
>>>
>>> The command is obviously missing something. What would be the
>>> correct command in this case?
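
For reference, a minimal sketch of a carry-forward that stays within
each id, along the lines of the FAQ cited at the top of the thread. It
assumes the nine observations posted in the question, entered here with
-input- so the example is self-contained. The attempt quoted above
copies the previous weight into every row, and a plain -sort- followed
by -replace- ignores id; doing the -replace- under -bysort id (age):-
addresses both.

```stata
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Start from the observed weights.
generate weight2 = weight

* Within each id, in age order, fill a missing value from the previous
* observation. Under -by- the subscript _n restarts at 1 in each group,
* so the first observation of an id refers to weight2[0], which is
* missing, and nothing is carried over from the previous patient.
* -replace- works through the observations in order, so a run of
* consecutive missings is filled by cascade.
bysort id (age): replace weight2 = weight2[_n-1] if missing(weight2)

list, sepby(id) noobs
```

On these data weight2 should match the expected output in the question:
the leading missings for id 2 and the only observation for id 3 stay
missing. If id and age do not uniquely identify observations, the order
within ties is arbitrary, and so is the value carried forward.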
