Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Unbalancing a panel data set with country pairs

From   Gordon Hughes <>
Subject   Re: st: Unbalancing a panel data set with country pairs
Date   Sat, 22 Jan 2011 10:18:31 +0000

The nature of your data isn't entirely clear, so we need to be clear about this.

Your description seems to correspond to the following situation: we have a series of N x N matrices of migration flows from country i to country j so that m[i,j,t] is the total flow from i to j in year t. You have converted the matrices from wide format to long format so each observation contains i, j, t and m[i,j,t] plus a set of independent variables - x[i,t], y[j,t], z[i,j,t] - describing (for example) economic conditions in both origin and destination countries or the distance between them. Then you want to delete panels for i,j pairs for which either (a) m[i,j,t] is missing for some t or (b) some of the x[], y[], z[] variables are missing.

If this is correct, then there are lots of ways of doing this and some will be more efficient than others. However, clumsy but obvious may be the best choice so you are clear what you are doing. If you haven't already done so, create a numeric country pair identifier cpair=(i-1)*N+j and then use:

sort cpair year
egen n_miss_var=rowmiss(<varlist>)
by cpair: egen max_miss_var=max(n_miss_var)
by cpair: egen n_obs=count(year)
drop if max_miss_var > 0 | n_obs < tmax

For each observation - i.e. country pair and time period - step 1 counts the number of missing values in <varlist> which would include m[] and whichever of the independent variables you are interested in. Step 2 identifies whether any of the observations in a particular country pair panel contains missing data. Note that egen adds the value of max_miss_var to all observations in the panel. Step 3 counts the number of observations for each country pair. Finally step 4 deletes all observations for any panel that contains any observations with missing data plus any country pair with a panel of less than tmax (=17) observations. This seems to be what you want to do. As Nick Cox explained, the key is to use a combination of -by ..- with -egen-.

However, I would question whether this is the right way to proceed. It will generate a strongly balanced panel but for many datasets it will throw away a large amount of data. You might be better off looking for methods of analysis that do not require strongly balanced panels.

Gordon Hughes

From: Matei Frunzetti <>
Subject: st: Unbalancing a panel data set with country pairs

Dear Statalisters,

I am fairly new to Stata and therefore might have to bother you with
somewhat trivial questions in the near future. Please excuse.

Cutting to the chase:
I 'm working on a panel data set over 17 years. It's fairly unbalanced
und i need to drop all observations for country pairs that either lack
full length (as in years) or have missings in one of the independant
variables. The problem is that i have to delete all observations of
these country pairs for all years if only one or more variables have a
missing or if it is one or more years short. I ran into a dead end
trying to figure out how to imply the rest of the observations of a
"faulty" country pair into the drop command.

I hope that was enough information. Help would be much appreciated



*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index