"Andreas Drichoutis" <[email protected]>

<[email protected]>

st: Re: Censored variables

Thu, 17 Jan 2008 15:32:34 +0200

Dear Maarten,

Thanks for your thorough response.

The vector of censored binary variables, X, contains variables that indicate whether a person purchased a certain food product. Censoring occurs at zero: for some consumers we may not observe a purchase because they have stockpiled food at home. We do not know the proportion of censored cases; we only suspect that some people did not purchase a food product because they had stockpiled it at home.

I need to create a variable that indicates the number of different food products purchased in the category, i.e. a variety index that is the sum of the X's. I'm not sure whether the variety index should be interpreted as a count or as a continuous variable. I then want to model variety as a function of several demographic and attitudinal variables.

Censoring is a major concern in my discipline.

I hope this is more specific. So how do I go about it?

P.S. The local specialist is not sure how to go about it either.

Regards,
Andreas Drichoutis

--- Andreas Drichoutis <[email protected]> wrote:
> Assume you have a vector of binary censored variables at zero, X, and
> that you need to create a new variable that will be the sum of the X's
> (e.g. V=X1+X2+...). What will be the problem if one uses V as a
> dependent variable in an OLS or count data model? How do you go about
> it in Stata?

Depends on the nature of the problem:

o What do you mean by a binary censored variable?
  - a binary variable that is sometimes censored, or
  - a continuous variable that is censored (where binary refers to
    whether you are censored or not)

o What is the process that leads to censoring? (Is it censoring at all?)
  - Is it a variable cut off at 0? E.g. a propensity to give to a
    charity, measured in euros, could be negative (if one really
    dislikes that charity) but is by law restricted to be positive
    or zero.
  - Is the censoring a two-step process? E.g. one first decides whether
    or not to give, and if one decides to give, then one decides the
    amount.
  - Are the variables counts (with or without an excessive number of
    zeros)?

o How severe is the censoring?
  - What is the proportion of censored cases in each variable?
  - The sum of censored variables is itself censored, but now the
    process is a bit more complicated (and thus more difficult to
    model). This censoring is more severe if the variables are strongly
    correlated, so that if one is censored, the others are likely to be
    censored too. If you are interested in the sum of the variables,
    then they are probably strongly correlated; otherwise it would not
    make sense to combine them in a single variable.

o What is the substantive interpretation of the sum of the censored
  variables?

o What is the aim of your analysis?

o How puritan are you, the reviewers in your discipline, and your
  advisor with respect to these kinds of issues?

On a more general note: these kinds of open questions are much better suited to a face-to-face discussion with a local specialist than to a discussion over e-mail. In my experience the most fruitful way of tackling such open questions is to ask a lot of questions in return (like the ones I have just asked); in the process of answering those questions, you and the consultant can pin down the real problem. All this presumes you have access to a local specialist...

-- Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
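
The point raised in the thread, that a sum of variables censored at zero is itself censored, can be illustrated with a small simulation. What follows is only a sketch under stated assumptions, not a recommended estimator and not Stata code: the purchase probability, the stockpiling probability, the number of products, and all names (simulate, p_buy, p_stockpile) are hypothetical, and products are assumed independent of one another, which the thread suggests is unlikely in practice.

```python
import random


def simulate(n_people=2000, n_products=5, p_buy=0.5, p_stockpile=0.3, seed=1):
    """Simulate latent and observed variety indices under censoring at zero.

    All parameter values are illustrative assumptions, not estimates
    from any data set. Products are treated as independent.
    """
    rng = random.Random(seed)
    true_v, obs_v = [], []
    for _ in range(n_people):
        latent_total = 0
        observed_total = 0
        for _ in range(n_products):
            wants = rng.random() < p_buy          # latent purchase indicator
            stocked = rng.random() < p_stockpile  # product stockpiled at home
            latent_total += wants
            # A purchase is recorded only if the product is wanted
            # and not already stockpiled; otherwise we observe a zero.
            observed_total += wants and not stocked
        true_v.append(latent_total)
        obs_v.append(observed_total)
    return true_v, obs_v


true_v, obs_v = simulate()
mean_true = sum(true_v) / len(true_v)
mean_obs = sum(obs_v) / len(obs_v)
print(f"mean latent variety:   {mean_true:.2f}")
print(f"mean observed variety: {mean_obs:.2f}")
```

Person by person, the observed index can never exceed the latent one, so its mean is shifted downward. Under these assumed values the expected observed mean is 5 x 0.5 x 0.7 = 1.75 against a latent mean of 2.5. A model fitted to the observed index therefore describes purchases net of stockpiling rather than variety itself, unless the censoring process is modelled as well.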

