

From: "Cobb, Adam" <adamcobb@umich.edu>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: Time-series data with many '0' observations
Date: Fri, 24 Sep 2010 17:57:23 -0400

I have a research question where I seek to answer what firm-level factors influence the number of participants in a firm's welfare (i.e. fringe benefit) plan. Thus, I am attempting to model participation in said plan. I have a set of cross-sectional time-series panel data on approximately 1,300 firms (the panel is unbalanced). For approximately 25% of firm-year observations there are no participants in the plan because the firm does not offer one (not due to missing data). As such, the data are heavily skewed because of the large number of 0 observations.

One issue (of many) that I have confronted is the fact that the number of participants in the welfare plan is bounded by the number of employees in the firm. To better take this into consideration, I have played around with operationalizing my dependent variable as a percent (# of participants / # of employees). This is intuitively nice because it makes my dependent variable easier to communicate: knowing a firm has 4,000 participants in the plan doesn't convey much without knowing the size of the firm. However, I'm open to suggestions on operationalization.

My primary question relates to what is (are) the appropriate model(s) for such a situation. A Tobit model seems inappropriate given the heavily skewed distribution. A Heckman model makes little sense given the research question. I have looked into two-part (or hurdle) models but am not sure how to set these up for time-series data. I have also begun exploring the use of zero-inflated negative binomial regression but am a bit confused by a few things; for example, whether I need to use an "offset" to control for the number (or log) of firm employees.

In sum, I could use some advice on 1) determining the most appropriate model, and 2) (if not part of the regular Stata package) setting up the model for analysis.

Thank you in advance.

J. Adam Cobb
Doctoral Candidate, Management & Organizations
Ross School of Business, University of Michigan

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
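
For reference, the two operationalizations mentioned above (the participation share and a count model with an "offset") can be written side by side. This is only a sketch of the standard textbook relationship, not a recommendation of either model. The symbols are assumptions introduced here for illustration: y_it is the number of plan participants in firm i in year t, n_it is the number of employees, x_it is the vector of firm-level covariates, and beta is the coefficient vector.

```latex
% Assumed notation (not from the original post):
%   y_{it} = participants, n_{it} = employees, x_{it} = covariates
\begin{align}
  s_{it} &= \frac{y_{it}}{n_{it}}, \qquad 0 \le s_{it} \le 1
    && \text{(participation share)} \\
  \mathrm{E}[\,y_{it} \mid x_{it}, n_{it}\,]
    &= n_{it}\,\exp(x_{it}'\beta)
     = \exp\bigl(x_{it}'\beta + \log n_{it}\bigr)
    && \text{(count model with offset)}
\end{align}
% The offset is \log n_{it} entered with its coefficient fixed at 1,
% which is equivalent to modelling the rate:
%   E[ y_{it} / n_{it} | x_{it} ] = \exp(x_{it}'\beta).
```

Written this way, an offset is the log of employees with its coefficient constrained to 1, whereas including log employees as an ordinary regressor lets that coefficient be estimated freely. Note that the log-linear form does not by itself enforce y_it <= n_it.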
