Dear Maarten,
Thanks for your thorough response.
The vector of censored binary variables, X, contains variables that indicate
whether a person purchased a certain food product. Censoring occurs at
zero because for some consumers we may not observe a purchase, since they
may have stockpiled the food at home. We do not know the proportion of
censored cases; we only suspect that some people did not purchase a food
product because they had already stockpiled it at home.
I need to create a variable that counts the number of different food
products purchased in the category, i.e. a variety index, which will be the
sum of the X's. I'm not sure whether the variety index should be treated as
a count or as a continuous variable. I then want to model variety as a
function of several demographic and attitudinal variables.
Censoring is a major concern in my discipline.
I hope this is more specific. So how should I go about it?
P.S. The local specialist is not sure how to go about it either.
Regards,
Andreas Drichoutis
--- Andreas Drichoutis <[email protected]> wrote:
> Assume you have a vector of binary censored variables at zero, X, and
> that you need to create a new variable that will be the sum of the X's
> (e.g. V=X1+X2+...). What will be the problem if one uses V as a
> dependent variable in an OLS or count data model? How do you go about
> it in Stata?
That depends on the nature of the problem:
o What do you mean by a binary censored variable?
- a binary variable that is sometimes censored, or
- a continuous variable that is censored (and "binary" refers to whether
observations are censored or not)?
o What is the process that leads to censoring? (Is it censoring at all?)
- Is it a variable cut off at 0? E.g. the propensity to give to a
charity, measured in euros, could be negative (if one really dislikes
that charity), but donations are restricted by law to be zero or positive.
- Is the censoring a two-step process? E.g. one first decides whether
or not to give, and if one decides to give, then one decides the
amount.
- Are the variables counts (with or without an excessive number of
zeros)?
o How severe is the censoring?
- What is the proportion of censored cases in each variable?
- The sum of censored variables is itself censored, but now the
process is a bit more complicated (and thus more difficult to
model). This censoring is more severe if the variables are
strongly correlated, so that if one is censored, the others are likely
to be censored too. If you are interested in the sum of the
variables, then they are probably strongly correlated; otherwise
it would not make sense to combine them in a single variable.
o What is the substantive interpretation of the sum of the censored
variables?
o What is the aim of your analysis?
o How puritan are you/the reviewers in your discipline/your advisor with
respect to these kinds of issues?
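The point about correlated censoring in the sum can be illustrated with a small simulation. This is a hypothetical Python sketch (not Stata, and not a proposed estimator) under made-up assumptions: 5 products, each bought with probability 0.6, and each purchase hidden by stockpiling with probability 0.3. In the "independent" case every product is censored separately; in the "correlated" case one stockpiling draw per consumer censors all of that consumer's products. The function name and all parameter values are illustrative only.

```python
import random


def simulate_variety(n_consumers, n_products, p_buy, p_censor, correlated, seed=0):
    """Hypothetical simulation: return (mean observed V, share of consumers with V == 0)."""
    rng = random.Random(seed)
    total = 0
    zeros = 0
    for _ in range(n_consumers):
        # One household-level stockpiling draw; only used in the correlated case.
        household_stockpiles = rng.random() < p_censor
        v = 0
        for _ in range(n_products):
            bought = rng.random() < p_buy
            if correlated:
                censored = household_stockpiles  # all products censored together
            else:
                censored = rng.random() < p_censor  # each product censored separately
            # The observed X_j is 1 only if the purchase happened and was not censored.
            if bought and not censored:
                v += 1
        total += v
        zeros += v == 0
    return total / n_consumers, zeros / n_consumers


for label, corr in [("independent", False), ("correlated", True)]:
    mean_v, share_zero = simulate_variety(20000, 5, 0.6, 0.3, corr, seed=1)
    print(f"{label:12s} mean V = {mean_v:.2f}, share V == 0 = {share_zero:.3f}")
```

Under these assumptions both cases have the same expected observed variety (5 x 0.6 x 0.7 = 2.1), but the expected share of consumers observed with V = 0 is about 0.58^5, roughly 0.066, with independent censoring versus 0.3 + 0.7 x 0.4^5, roughly 0.307, with correlated censoring. The simulated values should come out close to these. The sum piles up at zero far more when censoring is correlated.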
On a more general note: these kinds of open questions are much better
suited to a face-to-face discussion with a local specialist than to a
discussion over e-mail. In my experience the most fruitful way of
tackling such open questions is to ask a lot of questions in return
(like the ones I have just asked); in the process of answering
those questions, you and the consultant can pin down the real problem.
All this presumes you have access to a local specialist...
-- Maarten
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/