I have 600 observations like the following:
UNI Y X Z INV
where X is the professor's opinion (measured on a 7-point Likert scale), Y is the patent count for the professor's university (the dependent variable), UNI is an identifier for the university, Z is a set of university-level controls (e.g. size), and INV is a dummy (= 1 if the professor is an inventor, = 0 if the professor engaged in patenting activity but gave up before any patent could be filed). The opinions concern obstacles to patenting activity. Inventors report significantly lower ratings; it is not clear whether this is because they faced lower obstacles or because they were better individuals.
My solution: since the number of respondents is not uniform across universities (nor, within the same university, does the number of respondents with INV=1 equal the number with INV=0), I calculated a university-level score as the average of the inventors' average and the non-inventors' average. The reviewer's response: rejected, please use individual-level data or hierarchical linear modeling. The latter choice would let me use some university-level data that would take a long time to gather at the individual level.
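To make the score concrete, here is a minimal sketch in Python with made-up numbers. The toy rows and the function name university_scores are hypothetical and only illustrate the calculation described above; they are not my actual data or code. The sketch assumes that a university with respondents in only one INV group simply gets that group's mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows: (UNI, X, INV). All values are invented for illustration.
rows = [
    ("A", 2, 1), ("A", 4, 1), ("A", 6, 0),
    ("B", 5, 1), ("B", 7, 0), ("B", 7, 0),
]

def university_scores(rows):
    """Per university: average of the inventors' mean X and the non-inventors' mean X."""
    cells = defaultdict(list)      # (UNI, INV) -> list of X ratings
    for uni, x, inv in rows:
        cells[(uni, inv)].append(x)

    group_means = defaultdict(list)  # UNI -> list of group means (one per INV value present)
    for (uni, _inv), xs in cells.items():
        group_means[uni].append(mean(xs))

    # Unweighted average of the group means, so the larger group does not dominate.
    return {uni: mean(ms) for uni, ms in group_means.items()}

scores = university_scores(rows)
print(scores)
```

Because the two group means are weighted equally, the score generally differs from the pooled mean of X within a university whenever the two groups have different numbers of respondents.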
PLEASE note that Y is not a survey-based measure, i.e. it is not the university-level sum of patents filed by the responding inventors. Respondents filed around 60% of the patents counted in Y.
If you believe the latter to be an error, please tell me. Otherwise, can you suggest how to estimate the model using hierarchical linear modeling? In particular, what role, if any, should the dummy INV play? I had thought about a selection/endogeneity model, as the dummy suggests, but following the reviewer's suggestion I then focused on -gllamm-. However, it is unclear to me how to handle this mix of university-level and individual-level variables and, most importantly, whether such a mixed estimation is feasible or simply wrong.
Thanks,
Nicola