# st: 3 issues multiple firm obs per year, bounded dependent variable and odd independent variable

From: Cristian Dezso
To: statalist@hsphsun2.harvard.edu
Subject: st: 3 issues multiple firm obs per year, bounded dependent variable and odd independent variable
Date: Mon, 12 Oct 2009 22:07:43 -0400

Hello,

This is my first post on the listserv and I will try to make it short and informative.

I have a data set on all video games released since the beginning of the industry. I want to analyze how the user rating of a game depends on genre-, firm- and year-specific controls, as well as on a variable that measures how familiar the team members who worked on the game are with each other.

I have several issues with the analysis that I would like to run, but the following three seem to be the most important:

1. The data set is not a panel: for some firms I have several observations per year, since a firm can release more than one game per year.
- To address this, I declared xtset with only the panel variable (the firm identifier) and, in a first run of analyses, used xtreg and xttobit with firm-specific random effects and release-year dummies.

Question: is this appropriate for the type of analysis I want to do, and should I drop all observations for firms that released only a single game during the entire sample period?

2. The dependent variable - user rating - is continuous and can only take values between 0 and 5.
- I have used xtreg (probably not appropriate) and also xttobit (without declaring lower or upper bounds), given that the variable is bounded.

Question: is tobit appropriate for this kind of analysis given that the variable, while bounded, is NOT censored? If not, can I transform the dependent variable so that it is distributed over (0, infinity)?
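
To make the transformation in question 2 concrete, here is a minimal Python sketch rather than Stata code. It is only an illustration of the kind of mapping being asked about, not a recommendation or anything that was actually run. The function names `odds` and `log_odds` are hypothetical, and the sketch assumes every rating lies strictly between 0 and 5, since both mappings are undefined at the bounds themselves.

```python
import math

LOWER, UPPER = 0.0, 5.0  # rating bounds stated in the post


def odds(y, lower=LOWER, upper=UPPER):
    """Map a rating strictly inside (lower, upper) onto (0, infinity)."""
    if not lower < y < upper:
        # Ratings exactly at a bound have no finite image under this mapping.
        raise ValueError("rating must lie strictly between the bounds")
    return (y - lower) / (upper - y)


def log_odds(y, lower=LOWER, upper=UPPER):
    """Map a rating strictly inside (lower, upper) onto the whole real line."""
    return math.log(odds(y, lower, upper))


print(odds(1.0))      # 0.25
print(log_odds(2.5))  # 0.0
```

The first function gives the (0, infinity) range mentioned in the question; taking its log gives an unbounded variable. Any ratings equal to exactly 0 or 5 would need separate handling, which this sketch does not attempt.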

3. The variable that measures team familiarity takes the value 0 almost 50% of the time (team members never worked together before) and is very low in almost all other cases.
- I nevertheless used it as a continuous independent variable in the sales regression, and the oddest thing happened: the coefficient is negative and significant, but if I instead use a dummy that takes the value 0 if team familiarity is 0 and 1 otherwise, the coefficient on the dummy is positive.

Question: what is the appropriate treatment of such a highly skewed independent variable: continuous or dummy? And what could explain the puzzle of a negative coefficient on the continuous measure but a positive one on the dummy?
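
The sign pattern in question 3 can be reproduced with a toy example. The Python sketch below uses made-up numbers (not the video game data), no controls, and a hypothetical helper `ols_slope` that computes a bivariate OLS slope. It shows one possible mechanism only: half the observations sit at zero, most positive values are small and have a higher outcome, and a single large familiarity value with a low outcome pulls the continuous slope negative.

```python
def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Made-up data: half zeros, three small positive values, one large value.
familiarity = [0.0, 0.0, 0.0, 0.0, 0.1, 0.1, 0.1, 1.0]
outcome = [3.0, 3.0, 3.0, 3.0, 4.0, 4.0, 4.0, 2.5]

# Dummy coding as described in the post: 0 if familiarity is 0, 1 otherwise.
any_familiarity = [1.0 if f > 0 else 0.0 for f in familiarity]

slope_continuous = ols_slope(familiarity, outcome)
slope_dummy = ols_slope(any_familiarity, outcome)

print(slope_continuous < 0, slope_dummy > 0)  # True True
```

With a single 0/1 regressor the slope is just the difference in group means, so the dummy compares "any familiarity" against "none", while the continuous slope is dominated by the few observations far from zero. Whether something like this is happening in the real data is an open question.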

I apologize for the long post and thanks in advance for your help,

Cristian