Re: st: impute with draws from random distribution


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: impute with draws from random distribution
Date   Wed, 22 Jun 2011 11:10:02 -0400

Joerg Luedicke <joerg.luedicke@gmail.com>, D-Ta <altruist81@gmx.de>,
Maarten, et al.--

I disagree that the proposal makes no sense at all. Suppose, for the
sake of argument, that the observables matter as controls only through
their effect on the duration of non-participation (time to
participation), or on the hazard of participation in each program. In
that case, estimating a competing-risks hazard model, predicting
durations of non-participation (the median or other percentiles,
probably), and then matching on predicted duration makes sense. What
does not make sense to me is matching observed durations to predicted
durations, though it is possible that such an approach is justifiable.
I have not read the cited papers, and I would appreciate complete
references as specified in the Statalist FAQ.

On Wed, Jun 22, 2011 at 10:30 AM, Joerg Luedicke
<joerg.luedicke@gmail.com> wrote:
> On Wed, Jun 22, 2011 at 4:03 AM, D-Ta <altruist81@gmx.de> wrote:
>> Dear All,
>>
>> I am looking for a convenient solution to the following problem. This is
>> the type of sample I am working with:
>>
>> id   x1   x2   participant   programm   time to participation
>> 1    5    23   1             1          3.5
>> 2    6    42   1             2          5.7
>> 3    73   7    0             .          .
>> 4    35   2    0             .          .
>> 5    5    6    1             1          12
>> 6    34   34   1             1          3.5
>> 7    34   34   1             2          8.1
>>
>>
>>
>> The sample consists of individuals (with covariates x1 and x2) who can
>> either be participating in program 1, program 2, or be non-participants.
>> The non-participants are my control group. One of the control variables
>> that I would like to condition on in a subsequent matching step is time to
>> participation. By definition, time to participation is not observed for the
>> non-participants. Hence, I would like to create hypothetical values of that
>> variable for the group of non-participants. It is standard in the literature
>> to randomly draw from the distribution of the participants.
>>
>> Since there are two groups of participants, there are also two different
>> distributions of start dates. I would like to assign two values of time to
>> participation to each non-participant (a hypothetical time to
>> participation in program 1 and a hypothetical time to participation in
>> program 2).
>>
>> Any suggestions on how to do this?
>>
>
> I agree with Maarten that this makes no sense at all. Let's say your
> aim is to create a balanced sample via the use of propensity scores.
> Now let's further assume that "time to participation" would be the
> only variable of concern. If you now impute some values and predict
> your propensity scores, a claim of any subsequent analysis would be
> that the treatment assignment is ignorable based on the observed time
> that elapsed between timepoint x and start of the program. But for
> the non-participants, there was no start of a program in the first
> place, hence no elapsed time and, thus, nothing to balance.
>
> J.
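
For readers who only want to see the mechanics of the random draw D-Ta
asks about, here is a minimal sketch. It is written in Python rather than
Stata purely for illustration, and it takes no position on whether the
imputed values are meaningful for matching, which is the point in dispute
above. The rows are the sample from the original post (x1 and x2 are
omitted because the draw does not use them); the function name
draw_hypothetical_times and the seed are hypothetical choices, not part of
the thread.

```python
import random

# Sample rows from the original post:
# (id, participant, programm, time to participation).
# None stands in for Stata's missing value ".".
rows = [
    (1, 1, 1, 3.5),
    (2, 1, 2, 5.7),
    (3, 0, None, None),
    (4, 0, None, None),
    (5, 1, 1, 12.0),
    (6, 1, 1, 3.5),
    (7, 1, 2, 8.1),
]


def draw_hypothetical_times(rows, seed=12345):
    """Give each non-participant one draw per program from the
    empirical distribution of observed times in that program."""
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility

    # Collect the observed times separately for each program.
    observed = {}
    for _id, participant, program, time in rows:
        if participant == 1:
            observed.setdefault(program, []).append(time)

    # Draw with replacement: one hypothetical time per program
    # for every non-participant.
    draws = {}
    for person_id, participant, _program, _time in rows:
        if participant == 0:
            draws[person_id] = {
                program: rng.choice(times)
                for program, times in sorted(observed.items())
            }
    return draws


draws = draw_hypothetical_times(rows)
for person_id, times in draws.items():
    print(person_id, times)
```

Each non-participant ends up with two values, one drawn from the program 1
times and one from the program 2 times. Sampling from a fitted parametric
distribution instead of the empirical one would be a different design
choice and is not shown here.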
