Statalist The Stata Listserver


Re: st: A question on using survival model


From   "Austin Nichols" <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: A question on using survival model
Date   Fri, 25 May 2007 16:21:21 -0400

Chunling Lu --
This strikes me as a very bad idea.  What can you possibly hope to
gain by imputing number of visits from the information {visits>0} in
this context?  Even if there were a clear reason to do this, it cannot
be done.

Here is some actual data:

visits last |
      year |      Freq.     Percent        Cum.
------------+-----------------------------------
         0 |      5,266       87.16       87.16
         1 |        137        2.27       89.42
         2 |         85        1.41       90.83
         3 |         65        1.08       91.91
         4 |         66        1.09       93.00
         5 |         32        0.53       93.53
         6 |         63        1.04       94.57
         7 |         16        0.26       94.84
         8 |         23        0.38       95.22
         9 |          9        0.15       95.37
        10 |         34        0.56       95.93
        11 |          1        0.02       95.95
        12 |         70        1.16       97.10
        13 |          5        0.08       97.19
        14 |          4        0.07       97.25
        15 |         16        0.26       97.52
        16 |          4        0.07       97.58
        17 |          1        0.02       97.60
...
which you would see as

 anyvisits |      Freq.     Percent        Cum.
------------+-----------------------------------
         0 |      5,266       87.16       87.16
         1 |        776       12.84      100.00
------------+-----------------------------------
     Total |      6,042      100.00

Using your method, you would estimate lambda = -log(5266/6042) = 0.137
but this implies the expected tab of number of visits looks like:

    visits |      Freq.     Percent        Cum.
------------+-----------------------------------
         0 |      5,266       87.16       87.16
         1 |        724       11.98       99.14
         2 |         50        0.82       99.96
         3 |          2        0.04      100.00
         4 |          0        0.00      100.00
...
which ain't even close to right.
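
The arithmetic behind that comparison can be checked with a minimal sketch. It assumes only the counts shown in the tables above (5,266 zeros out of 6,042, and the first five observed frequencies); the variable and function names are made up for illustration.

```python
import math

# Counts taken from the anyvisits table: 5,266 of 6,042 report zero visits.
n_total = 6042
n_zero = 5266

# Poisson: P(y = 0) = exp(-lambda), so lambda = -log(share with zero visits).
lam = -math.log(n_zero / n_total)

def poisson_pmf(k, lam):
    """P(y = k) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected frequencies under the fitted Poisson, for 0..4 visits.
expected = [round(n_total * poisson_pmf(k, lam)) for k in range(5)]

# Observed frequencies for 0..4 visits, copied from the first table.
observed = [5266, 137, 85, 65, 66]

print(f"lambda = {lam:.3f}")
for k, (e, o) in enumerate(zip(expected, observed)):
    print(f"{k} visits: expected {e:>5}, observed {o:>5}")
```

The fit matches at zero by construction, then piles almost everyone else onto a single visit, while the observed counts have a long right tail.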

If you really need number of visits in your data, I think your only way
forward is "cold deck imputation" or "statistical matching".
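
As a rough sketch of the cold deck idea, assuming you have a separate donor data set in which visit counts were actually recorded: each respondent known to have visits > 0 is given a count drawn from a donor who also has visits > 0. The function name, argument names, and toy inputs below are hypothetical, and the draw here ignores covariates; real statistical matching would pair respondents with donors who are similar on observed characteristics.

```python
import random

def cold_deck_impute(any_visits, donor_counts, seed=12345):
    """Fill in a visit count for each respondent from an external donor sample.

    any_visits   -- list of 0/1 flags from the survey that lacks counts
    donor_counts -- visit counts observed in a separate (donor) data set
    Both argument names are hypothetical, chosen for this sketch.
    """
    rng = random.Random(seed)
    # Only donors with at least one visit can stand in for a respondent
    # known to have visits > 0.
    positive_donors = [c for c in donor_counts if c > 0]
    if not positive_donors:
        raise ValueError("donor sample has no positive visit counts")
    # Respondents with no visits keep 0; the rest get a random donor's count.
    return [rng.choice(positive_donors) if flag else 0 for flag in any_visits]

# Toy inputs, invented purely for illustration.
donors = [0, 0, 0, 1, 1, 2, 3, 6, 12]
flags = [0, 1, 0, 1, 1]
imputed = cold_deck_impute(flags, donors)
print(imputed)
```

The imputed counts inherit whatever distribution the donor data have, so the result is only as good as the assumption that donors and respondents come from comparable populations.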


On 5/25/07, Chunling Lu <chunling_lu@harvard.edu> wrote:
David, thanks for the information. But I think we may be able to work something
out here. We know that individuals either did not see a doctor or saw a doctor
at least once in the last 30 days. So we may calculate probability(y>=1) (y is
the number of visits) = 1-probability(y=0) in the last 30 days. Using a
Poisson distribution for counts, we know that p(y=0)=exp(-lambda), so we may
then derive the value of lambda, which is the mean number of visits. What do
you think about this? Thanks very much. Chunling

-----Original Message-----
From: David Greenberg
You can't, unless you are confident that those who visited a doctor within
the last 30 days did so only once. David Greenberg, Sociology Department,
New York University

----- Original Message -----
From: Chunling Lu <chunling_lu@harvard.edu>
> I have a question "When was the last time you visited a doctor" with the
> following categories: (1) in the last 30 days, (2) between 1 month and
> less than 1 year ago. I would now like to derive the average number
> of visits for the last 30 days. How should I model it and how can I do it
> in Stata?