Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: How to fill in the missing data


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: How to fill in the missing data
Date   Mon, 10 Jun 2013 08:55:39 +0100

This approach is documented at

http://www.stata.com/support/faqs/data-management/replacing-missing-values/

but I agree with Sergiy: your problem is an interpolation problem.
Possible commands include -ipolate- (official), -cipolate- (SSC),
-csipolate- (SSC), -pchipolate- (SSC).

(I mention also -nnipolate- (SSC) for completeness, but it would not
be a good fit for your particular problem.)

Nick
[email protected]
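
A minimal sketch of the interpolation route, using the example data from the original question below. It assumes linear interpolation on age within each patient is acceptable; the variable name weight_ip is only illustrative. -ipolate- does not extrapolate unless the epolate option is given, so gaps before the first or after the last known weight of a patient stay missing here.

```stata
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Linear interpolation of weight on age, done separately for each id
bysort id: ipolate weight age, generate(weight_ip)

list, sepby(id)
```

For id 1 at age 22 this should give the midpoint of 50.2 and 52.9, that is, about 51.55. The SSC commands mentioned above would first need installing, e.g. with -ssc install-.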


On 10 June 2013 06:27, Sergiy Radyakin <[email protected]> wrote:
> Alexis, in your approach when you impute the weight you have a risk of
> carrying the weight of one patient to the next one, if the first
> measurement is missing for the second patient (your last line
> disregards ID). So unless it is known that the first measurement of
> weight is always present, (and we see from the provided example it is
> not the case) this method would create very incorrect results.
>
> Wong, are your data points such that each patientid-age combination
> is unique? Or do you sometimes see the same patient twice within a
> year? (If so, be careful even with the -sort- statement.)
>
> It sounds like interpolation is likely needed here since the intervals
> of missing observations are of different size and weight probably
> changes smoothly with age. But it shouldn't be difficult.
>
> Best, Sergiy
>
> On Mon, Jun 10, 2013 at 1:01 AM, Alexis Penot <[email protected]> wrote:
>> You can try this
>> sort id age
>> gen weight2 = weight
>> replace weight2 = weight2[_n-1] if missing(weight2)
>>
>> Alexis
>>
>> On 10 June 2013 at 06:45, Ching Wong <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a dataset as follows:
>>>
>>> id age weight
>>> 1   21   50.2
>>> 1   22
>>> 1   23   52.9
>>> 1   24   51.0
>>> 1   25
>>> 2   22
>>> 2   23
>>> 2   25   60.2
>>> 3   21
>>>
>>> And I would like to create a new variable "weight2" and fill in the
>>> missing data based on the previous value
>>>
>>> My expected output value should be as follows:
>>>
>>> id  age  weight  weight2
>>> 1   21   50.2    50.2
>>> 1   22   .       50.2
>>> 1   23   52.9    52.9
>>> 1   24   51.0    51.0
>>> 1   25   .       51.0
>>> 2   22   .       .
>>> 2   23   .       .
>>> 2   25   60.2    60.2
>>> 3   21   .       .
>>>
>>> I have tried the command below but that cannot produce what I expected.
>>>
>>> - bysort id (age): gen weight_hat = weight[_n-1]
>>>
>>> It is very obvious that command is missing something. So what will be
>>> the correct command in this case?
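
For reference, a sketch of the carry-forward approach restricted to each patient, which is what the expected output in the question describes. The attempted command generates a lag of weight for every observation rather than filling only the gaps. The sketch assumes each id-age combination occurs once, and -isid- is used to check that first.

```stata
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Stop with an error if any id-age combination is duplicated
isid id age

* Work on a copy so that the original weight is left untouched
generate weight2 = weight

* Within each id, ordered by age, copy the previous value into each gap.
* Under by:, _n restarts at 1 for each id, so weight2[_n-1] is missing
* for a patient's first observation and nothing carries over between ids.
bysort id (age): replace weight2 = weight2[_n-1] if missing(weight2)

list, sepby(id)
```

-replace- works through the observations in order, so a run of consecutive missing values is filled from the last known value before it. Leading missing values for a patient (as for id 2 and id 3) remain missing.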
