Dear Austin,
Thanks very much for the feedback. I guess I didn't express my question
clearly. What I want is the mean number of visits, not the number of
visits for each individual. So the lambda value in your data would be
the mean number of visits if we assume the distribution is Poisson. Of
course, that's a convenient assumption. As David pointed out, the real
distribution will be skewed to the right, and we may need to use a negative
binomial instead. Looking forward to hearing your comments. Thanks again.
Chunling
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Austin Nichols
Sent: Friday, May 25, 2007 4:21 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: A question on using survival model
Chunling Lu --
This strikes me as a very bad idea. What can you possibly hope to gain by
imputing number of visits from the information {visits>0} in this context?
Even if there were a clear reason to do this, it cannot be done.
Here is some actual data:
     visits |
  last year |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,266       87.16       87.16
          1 |        137        2.27       89.42
          2 |         85        1.41       90.83
          3 |         65        1.08       91.91
          4 |         66        1.09       93.00
          5 |         32        0.53       93.53
          6 |         63        1.04       94.57
          7 |         16        0.26       94.84
          8 |         23        0.38       95.22
          9 |          9        0.15       95.37
         10 |         34        0.56       95.93
         11 |          1        0.02       95.95
         12 |         70        1.16       97.10
         13 |          5        0.08       97.19
         14 |          4        0.07       97.25
         15 |         16        0.26       97.52
         16 |          4        0.07       97.58
         17 |          1        0.02       97.60
..
which you would see as
  anyvisits |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,266       87.16       87.16
          1 |        776       12.84      100.00
------------+-----------------------------------
      Total |      6,042      100.00
Using your method, you would estimate lambda = -log(5266/6042) = 0.137, but
this implies that the expected tabulation of number of visits looks like:
     visits |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |      5,266       87.16       87.16
          1 |        724       11.98       99.14
          2 |         50        0.82       99.96
          3 |          2        0.04      100.00
          4 |          0        0.00      100.00
..
which ain't even close to right.
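For anyone who wants to check the arithmetic, here is a minimal sketch in Python (not Stata, and not part of the original analysis) that reproduces the lambda estimate and the expected frequencies in the table above. The only inputs are the two counts from the anyvisits table; the variable and function names are made up for illustration.

```python
import math

# Counts taken from the anyvisits table above
n_total = 6042   # all respondents
n_zero = 5266    # respondents with zero visits

# Under a Poisson model, P(y = 0) = exp(-lambda),
# so lambda = -log(share of respondents with zero visits)
lam = -math.log(n_zero / n_total)

def poisson_pmf(k, lam):
    """Poisson probability of exactly k events given mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Expected number of respondents with 0, 1, 2, 3, 4 visits
expected = [round(n_total * poisson_pmf(k, lam)) for k in range(5)]

print(round(lam, 3))
print(expected)
```

The expected counts fall to essentially zero by 4 visits, while the actual data still have dozens of people at 6, 10, and 12 visits.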
If you really need the number of visits in your data, your only way forward is
"cold deck imputation" or "statistical matching", I think.
On 5/25/07, Chunling Lu <chunling_lu@harvard.edu> wrote:
> David, thanks for the information. But I think we may be able to work
> something out here. We know that individuals either did not see a doctor
> or saw a doctor at least once in the last 30 days. So we can calculate
> probability(y>=1) (y is the number of visits) = 1-probability(y=0) in
> the last 30 days. Using a Poisson distribution for counts, we know that
> p(y=0)=exp(-lambda), so we can then derive the lambda value, which is the
> mean number of visits. What do you think about this? Thanks very much.
> Chunling
>
> -----Original Message-----
> From: David Greenberg
> You can't, unless you are confident that those who visited a doctor
> within the last 30 days did so only once. David Greenberg, Sociology
> Department, New York University
>
> ----- Original Message -----
> From: Chunling Lu <chunling_lu@harvard.edu>
> > I have a question "When was the last time you visited a doctor?" with
> > the following categories: (1) in the last 30 days, (2) between 1
> > month and less than 1 year ago. I would now like to derive the
> > average number of visits for the last 30 days. How should I model it,
> > and how can I do it in Stata?
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/