
# st: Time-series data with many '0' observations

From: "Cobb, Adam"; To: "statalist@hsphsun2.harvard.edu"; Subject: st: Time-series data with many '0' observations; Date: Fri, 24 Sep 2010 17:57:23 -0400

```I have a research question where I seek to answer what firm-level factors influence the number of participants in a firm's welfare (i.e., fringe benefit) plan. Thus, I am attempting to model participation in said plan. I have a set of cross-sectional time-series panel data on approximately 1,300 firms (the panel is unbalanced). For approximately 25% of firm-year observations there are no participants in the plan because the firm does not offer one (not because of missing data). As such, the data are heavily skewed because of the large number of 0 observations. One issue (of many) that I have confronted is that the number of participants in the welfare plan is bounded by the number of employees in the firm. To better take this into account, I have experimented with operationalizing my dependent variable as a percentage (# of participants / # of employees). This is intuitively appealing because it makes my dependent variable easier to communicate: knowing that a firm has 4,000 participants in the plan doesn't convey much without knowing the size of the firm. However, I'm open to suggestions on operationalization.

My primary question relates to what model (or models) is appropriate for such a situation. A Tobit model seems inappropriate given the heavily skewed distribution. A Heckman model makes little sense given the research question. I have looked into two-part (or hurdle) models but am not sure how to set these up for time-series data. I have also begun exploring zero-inflated negative binomial regression but am a bit confused by a few things; for example, whether I need to use an “offset” to control for the number (or log) of firm employees.

In sum, I could use some advice on 1) determining the most appropriate model, and 2) setting up the model for analysis (if it is not part of official Stata).

Thank you in advance.