Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: How to fill in the missing data


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: How to fill in the missing data
Date   Mon, 10 Jun 2013 01:27:03 -0400

Alexis, your approach risks carrying the weight of one patient over to
the next whenever the first measurement is missing for the second
patient, because your last line disregards id. Unless the first weight
measurement is known to be always present (and the provided example
shows it is not), this method would produce incorrect results.
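
A minimal sketch of a by-group variant that avoids this, assuming the
variable names id, age, and weight from the quoted example. Under
-bysort-, _n-1 is evaluated within each id, so the first observation of
a patient has no predecessor and stays missing:

```stata
* Example data as posted in the thread (missing weights entered as .)
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Carry the last observed weight forward, but only within a patient
gen weight2 = weight
bysort id (age): replace weight2 = weight2[_n-1] if missing(weight2)

list, sepby(id)
```

Because -replace- works through the observations in order, a run of
consecutive missing values within one patient is filled from the last
observed value before the run.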

Wong, is each patientid-age combination in your data unique, or do you
sometimes see the same patient twice within a year? (If so, be careful
even with the -sort- statement.)
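
One way to check this, sketched on the example data from the thread
(the variable names id and age are taken from that example):

```stata
* Example data as posted in the thread
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Tabulate how many id-age combinations occur more than once
duplicates report id age

* Stop with an error if id and age do not uniquely identify observations
isid id age
```

If duplicates do exist, the order of tied observations after sorting on
id and age is not determined by those two variables alone, so a
carry-forward could pick up either of the tied values.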

It sounds like interpolation is likely needed here, since the gaps of
missing observations vary in length and weight probably changes
smoothly with age. But it shouldn't be difficult.
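
A possible sketch using Stata's -ipolate-, which interpolates linearly
within each patient. Note that this is a different technique from the
carry-forward requested in the original post, and the new variable name
weight_ip is only illustrative:

```stata
* Example data as posted in the thread
clear
input id age weight
1 21 50.2
1 22 .
1 23 52.9
1 24 51.0
1 25 .
2 22 .
2 23 .
2 25 60.2
3 21 .
end

* Linear interpolation of weight on age, separately for each id
ipolate weight age, gen(weight_ip) by(id)

list, sepby(id)
```

Without the epolate option, -ipolate- does not extrapolate, so values
before a patient's first or after a patient's last observed weight stay
missing.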

Best, Sergiy

On Mon, Jun 10, 2013 at 1:01 AM, Alexis Penot <alexis.penot@ens-lyon.fr> wrote:
> You can try this
> sort id age
> gen weight2 = weight
> replace weight2 = weight2[_n-1] if missing(weight2)
>
> Alexis
>
> On 10 June 2013 at 06:45, Ching Wong <ching.y.wong@student.adelaide.edu.au> wrote:
>
>> Hi,
>>
>> I have a dataset as follows:
>>
>> id age weight
>> 1   21   50.2
>> 1   22
>> 1   23   52.9
>> 1   24   51.0
>> 1   25
>> 2   22
>> 2   23
>> 2   25   60.2
>> 3   21
>>
>> And I would like to create a new variable "weight2" and fill in the
>> missing data based on the previous value
>>
>> My expected output value should be as follows:
>>
>> id  age  weight  weight2
>>  1   21    50.2     50.2
>>  1   22       .     50.2
>>  1   23    52.9     52.9
>>  1   24    51.0     51.0
>>  1   25       .     51.0
>>  2   22       .        .
>>  2   23       .        .
>>  2   25    60.2     60.2
>>  3   21       .        .
>>
>> I have tried the command below but that cannot produce what I expected.
>>
>> - bysort id (age): gen weight_hat = weight[_n-1]
>>
>> It is very obvious that command is missing something. So what will be
>> the correct command in this case?
>>
>> Cheers,
>>
>> Wong
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP