Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: count data truncated at one

 From Nick Cox <[email protected]> To [email protected] Subject Re: st: count data truncated at one Date Tue, 12 Jun 2012 22:44:03 +0100

```
I don't think your first statement, which I have labelled #1 below,
has anything like axiomatic flavour.

On the contrary, on most of the occasions on which I have applied
-ologit- I have thought of it as a way of summarizing the structure of
a response, conventionally defined as so many ordered levels, without
wanting to participate in an act of faith about whatever really
underlies what was reported. For other researchers the emphasis may be
the other way round, but the algebra is insensitive to your belief or
lack of belief in a latent variable. Emphases here are a matter of
methodological taste.

Your second statement, #2 below, in contrast, I regard as an attempt
to have it both ways. In my view, "ordinal" means what it says and any
idea that 4 is really twice 2, and so forth, has been discarded.
That's the price of admission and it's not negotiable.

I underline David's general comments too.

On Tue, Jun 12, 2012 at 4:41 PM, Laurie Molina <[email protected]> wrote:
> Ok, thank you all, as always you have provided very useful insights.
> I think I will go with the ologit.

#1

> Just one more thing. ologit is
> motivated by the existence of a latent variable and thresholds that
> define the value of the observed discrete variable.
> In my case, I do observe the underlying variable (payment/reference
> number), when this value is in a neighborhood around 2, I say that it
> pays 2 times the reference number, and so on.

#2

> How can I add this information to the estimation? To my understanding
> ologit does not take that information into account.
> Sorry if I cannot provide very much additional information.
> Thank you again,
> LM
>
>
> On Tue, Jun 12, 2012 at 6:33 AM, David Hoaglin <[email protected]> wrote:
>> So far, we have little information on the variable in question beyond
>> the statements
>> "People included in the regression are members of a group defined as
>> people paying 2 to ten times a reference number."
>> and
>> "Most of the observations have y=2, then the frequencies are
>> decreasing for higher values of y, but then there is also a high
>> frequency of observations with y=10."
>> If values of y > 10 have been combined with y = 10 (perhaps because 10
>> was the highest multiple possible in the particular setting), then, as
>> Tirthankar suggested, the analysis should take into account the
>> censoring at 10.
>>
>> In my brief experience with Statalist, I have seen a number of
>> questions that seek input on statistical analysis but give only
>> generic information about the data.  The fact that, for example, the
>> values of the dependent variable range from 2 to 10 is only a
>> beginning.  Every actual application has a context, which usually has
>> a substantial impact on successful analysis of the data.  As a
>> consultant, I expect to have a dialog with a client, learning about
>> the research question and the details of the data, before I recommend
>> a particular analysis.  It may not be possible to share some details
>> with the list (e.g., because they need to remain confidential), but
>> lack of information limits our ability to give effective advice.  We
>> often make a serious effort to be helpful, only to learn, when more
>> information emerges, that we were not addressing the right question.
>>
>> David Hoaglin
>>
>> On Tue, Jun 12, 2012 at 4:08 AM, Nick Cox <[email protected]> wrote:
>>> Tirthankar is clearly correct in underlining the possibility of a
>>> customised model rather than forcing this into some pre-existing model
>>> that is not quite right. Note that you would need, for credibility, to
>>> ensure not only that the likelihood was defined appropriately but also
>>> that predicted values fall within [2,10].
>>>
>>> That said, the substantive or scientific choice should hinge largely
>>> on whether the response is considered as # items bought or the
>>> probability of # items being bought. I think here my view is close to
>>> that of David.
>>>
>>> Anyway, who said that you are restricted to a single model?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
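The thread mentions a customised model that respects both limits of the response (values start at 2, with values above 10 possibly combined into 10), and Nick adds that predicted values should stay within [2,10]. As one hedged illustration only, not a model anyone in the thread proposed, the Python sketch below assumes the response is a Poisson count that is truncated below 2 (0 and 1 are never observed) and censored at 10 (anything of 10 or more is recorded as 10). The function names `poisson_pmf`, `trunc_cens_probs`, `loglik` and `expected_value` are hypothetical; in a real analysis the mean `mu` would depend on covariates and be estimated by maximizing the log-likelihood.

```python
import math


def poisson_pmf(k, mu):
    """Poisson P(Y = k) for mean mu > 0, computed on the log scale."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def trunc_cens_probs(mu, lo=2, hi=10):
    """P(recorded y = k) for k = lo..hi under an assumed Poisson(mu) count.

    Counts below lo are never observed (truncation), so probabilities are
    renormalized by P(Y >= lo). Counts of hi or more are recorded as hi
    (censoring), so the top category collects the whole upper tail.
    """
    p_below = math.fsum(poisson_pmf(k, mu) for k in range(lo))  # P(Y < lo)
    denom = 1.0 - p_below                                       # P(Y >= lo)
    probs = {k: poisson_pmf(k, mu) / denom for k in range(lo, hi)}
    # Upper tail by complement: adequate for moderate mu, but extreme
    # values of mu would need a more careful tail computation.
    probs[hi] = max(0.0, 1.0 - math.fsum(probs.values()))
    return probs


def loglik(y, mu, lo=2, hi=10):
    """Log-likelihood of observed y (each in lo..hi) given per-observation means."""
    return sum(math.log(trunc_cens_probs(m, lo, hi)[v]) for v, m in zip(y, mu))


def expected_value(mu, lo=2, hi=10):
    """Mean of the recorded response; lies within [lo, hi] up to rounding."""
    return sum(k * p for k, p in trunc_cens_probs(mu, lo, hi).items())


if __name__ == "__main__":
    for mu in (0.5, 3.0, 12.0):
        print(mu, round(expected_value(mu), 3))
    print(loglik([2, 5, 10], [3.0, 4.0, 9.0]))
```

Because the probabilities are non-negative and sum to one over {2, ..., 10}, the implied mean cannot leave [2, 10], which is the property Nick asks for. Whether a count model of this kind suits the data better than -ologit- is the substantive question the thread leaves open.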