st: Using Dummy interacted variables in an fe context

From   "Fitzgerald, James" <>
To   "" <>
Subject   st: Using Dummy interacted variables in an fe context
Date   Tue, 26 Jun 2012 10:26:12 +0000

Hi Statalist users

I am attempting to compare the effects of a number of explanatory variables on firm debt-to-value ratios across different sub-samples of observations.

Observations are categorised into sub-samples based on observed values of particular variables. For example, observations whose "tangibility of assets" value is below the sample median are categorised as "low tangibility" observations, and those above it as "high tangibility" observations.

Say I am interested in comparing the effect of profitability on debt-to-value ratios in low tangibility firms to the effect of the same variable in high tangibility firms. One way to perform such a comparison is to generate an interacted dummy variable, which allows the researcher to assess the relative effect of profitability on debt-to-value ratios in low tangibility and high tangibility firms.
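As a sketch of what such an interacted specification might look like (the symbols are illustrative assumptions of mine, not taken from Datta et al.): let DV_it be the debt-to-value ratio, PROF_it profitability, LOW_it a dummy equal to 1 for low tangibility observations, and a_i a firm effect.

```latex
\[
DV_{it} = a_i + \beta_1\,PROF_{it} + \beta_2\,LOW_{it}
        + \beta_3\,(PROF_{it} \times LOW_{it}) + \varepsilon_{it}
\]
% Effect of profitability, high tangibility observations (LOW = 0): \beta_1
% Effect of profitability, low tangibility observations  (LOW = 1): \beta_1 + \beta_3
% The two sub-samples share the same profitability effect when \beta_3 = 0.
```

Under these assumptions, the comparison of interest reduces to the sign and significance of the coefficient on the interaction term.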

However, my Hausman tests indicate the presence of fixed effects, and my understanding, based on Datta et al (2005), is that interacted dummy variables cannot be appropriately employed in a fixed effects context. My understanding of their argument is as follows:

Datta et al (2005) identify a potential problem when using interactive dummy variables in a fixed effects model. In order to benefit from the time series element of panel data when comparing coefficients across sub-samples, studies generally allow firm-year observations to move into and out of the relevant sub-samples over time. Thus the dummy variable value (DV) for a given firm may be 0 in t=3, 1 in t=2, and 0 in t=1. Now, if the firms are categorised into sub-samples based on variable levels, i.e. not first differences, then once the first differencing transformation takes place the dummy variable no longer results in a binary interaction variable. Consider DV(t=3) and DV(t=2). The first differenced DV value for the observation is DV(t=3) - DV(t=2) = 0 - 1 = -1. Thus, there are now three values for DV that an observation can have: 0, 1, or -1. Hence, one can no longer use the dummy variable to distinguish between the effects of a variable on two sub-samples of observations, as there are now three classifications of observations under the dummy variable.

This problem does not arise if one performs the first differencing transformation prior to the categorisation procedure, as observations can be categorised based on their first differenced values and a binary dummy variable can be produced. However, observations will be classed differently when classed on first differences rather than levels.

Consider the scenario where observations are being classified based on a tangibility variable, say, the ratio of fixed to total assets (FA). Firm A has the following tangibility values: FA(t=3) = 0.6, FA(t=2) = 0.65, FA(t=1) = 0.55. Assume that the categorisation rule is that observations with FA above 0.5, the median FA observation, are classified as high tangibility observations and those below 0.5 as low tangibility observations. In each time period, firm A observations are classified as high tangibility. This classification procedure has been based on levels.

Now, consider the scenario where observations are classified based on first differences. The first differences of observations t=3 and t=2 are -0.05 and 0.10 respectively. Because level classifications do not equate to first difference classifications, the same observation may be classed differently under the two procedures even though the underlying values of the firm characteristic are unchanged. For example, if the median first differenced FA is 0, observation t=3 (first difference -0.05) is now classified as low tangibility, although it is high tangibility in levels. Thus, the two classification procedures are not interchangeable. One must decide to classify by levels and then first difference, or first difference and then classify by first differences, and this decision will determine whether or not an interacted variable can be appropriately interpreted. Which procedure is chosen will depend on the theoretical definition of a high or low tangible observation.
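To make the arithmetic above concrete, here is a minimal Python sketch (plain Python rather than Stata, and my own illustration, not code from Datta et al.). It uses only the firm A numbers and the two cutoffs given in the example; the variable names are hypothetical.

```python
# Firm A's tangibility (fixed assets / total assets) at t = 1, 2, 3, from the example.
fa = {1: 0.55, 2: 0.65, 3: 0.60}
LEVEL_CUTOFF = 0.5  # median FA in levels, as assumed in the example
DIFF_CUTOFF = 0.0   # median first-differenced FA, as assumed in the example

# Classify on levels: 1 = high tangibility, 0 = low tangibility.
high_by_level = {t: int(v > LEVEL_CUTOFF) for t, v in fa.items()}

# First differences exist only for t = 2 and t = 3 (rounded to avoid float noise).
d_fa = {t: round(fa[t] - fa[t - 1], 2) for t in (2, 3)}

# Classify on first differences instead.
high_by_diff = {t: int(v > DIFF_CUTOFF) for t, v in d_fa.items()}

# A levels-based dummy that switches 0 -> 1 -> 0 over t = 1, 2, 3.
dv = {1: 0, 2: 1, 3: 0}
# Its first difference is no longer binary: here 1 and -1
# (and 0 for any firm whose dummy does not change).
d_dv = {t: dv[t] - dv[t - 1] for t in (2, 3)}

print("high by level:", high_by_level)
print("first differences of FA:", d_fa)
print("high by first difference:", high_by_diff)
print("first-differenced dummy:", d_dv)
```

With a cutoff of 0 on first differences, it is the t=3 observation (change of -0.05) that falls below the cutoff, while every observation for firm A sits above the 0.5 cutoff in levels.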

My suggestion to circumvent such a problem is to simply run separate regressions on the two sub-samples of observations and compare the coefficients.

Can someone please tell me if my reasoning is sound or completely off the wall??

Best regards
