Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Unbalancing a panel data set with country pairs

From	Gordon Hughes <[email protected]>
To	[email protected]
Subject	Re: st: Unbalancing a panel data set with country pairs
Date	Sat, 22 Jan 2011 10:18:31 +0000

The nature of your data isn't entirely clear, so we need to be clear about this.

Your description seems to correspond to the following situation: we have a series of N x N matrices of migration flows from country i to country j, so that m[i,j,t] is the total flow from i to j in year t. You have converted the matrices from wide format to long format, so each observation contains i, j, t and m[i,j,t] plus a set of independent variables (x[i,t], y[j,t], z[i,j,t]) describing, for example, economic conditions in the origin and destination countries or the distance between them. You then want to delete the panels for i,j pairs for which either (a) m[i,j,t] is missing for some t or (b) some of the x[], y[], z[] variables are missing.

If this is correct, then there are many ways of doing this, and some will be more efficient than others. However, clumsy but obvious may be the best choice, so that you are clear about what you are doing. If you haven't already done so, create a numeric country pair identifier cpair = (i-1)*N + j (this gives a unique value for each pair provided the countries are coded 1 to N) and then use:


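local tmax 17                                   // full panel length in years
sort cpair year
egen n_miss_var = rowmiss(<varlist>)            // step 1: missing values per observation
by cpair: egen max_miss_var = max(n_miss_var)   // step 2: any missing data in the pair's panel?
by cpair: egen n_obs = count(year)              // step 3: years observed for the pair
drop if max_miss_var > 0 | n_obs < `tmax'       // step 4: drop incomplete panels entirely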

For each observation (i.e. each country pair and time period), step 1 counts the number of missing values in <varlist>, which would include m[] and whichever of the independent variables you are interested in. Step 2 identifies whether any of the observations in a particular country pair panel contains missing data. Note that, when combined with -by-, egen stores the panel maximum max_miss_var in every observation of that panel, not just in one of them. Step 3 counts the number of observations for each country pair. Finally, step 4 deletes all observations for any panel that contains at least one observation with missing data, plus any country pair whose panel has fewer than tmax (=17) observations. This seems to be what you want to do. As Nick Cox explained, the key is to use a combination of -by ...- with -egen-.
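As a check that the steps behave as intended, here is a minimal sketch that runs them on a small made-up data set. The data are hypothetical: three countries coded 1 to 3, a complete panel of 3 years instead of 17, and just two variables, m and x, standing in for <varlist>. The sketch also builds cpair from the numeric codes i and j with the formula above. If your country codes are strings instead, -egen cpair = group(origin dest)- would be one alternative, where origin and dest are hypothetical variable names.

```stata
* Hypothetical toy data: 3 country pairs, up to 3 years each
clear
input i j year m x
1 2 2001 10 1
1 2 2002 12 2
1 2 2003 11 3
1 3 2001  5 1
1 3 2002  . 2
1 3 2003  7 3
2 1 2001  8 1
2 1 2002  9 2
end
local N 3                          // number of countries in the toy data
local tmax 3                       // complete panel length here (17 in the real data)
gen cpair = (i-1)*`N' + j          // unique id for each ordered pair (i,j)
sort cpair year
egen n_miss_var = rowmiss(m x)
by cpair: egen max_miss_var = max(n_miss_var)
by cpair: egen n_obs = count(year)
drop if max_miss_var > 0 | n_obs < `tmax'
* Pair (1,3) has a missing m and pair (2,1) lacks 2003, so only pair (1,2) remains
assert cpair == 2
assert _N == 3
```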

However, I would question whether this is the right way to proceed. It will generate a strongly balanced panel, but for many datasets it will throw away a large amount of data. You might be better off looking for methods of analysis that do not require strongly balanced panels.
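One way to see how much data the balancing rule would cost, before deleting anything, is to flag complete panels rather than drop them. The sketch below does this on hypothetical toy data (two pairs, with pair 2 missing 2003, and a 3-year complete panel in place of 17). The variable name balanced is illustrative only. Estimation commands such as -xtreg- accept unbalanced panels, so a flag like this can be used in an -if- condition for a robustness check instead of being applied as a permanent deletion.

```stata
* Hypothetical toy panel: pair 2 is missing year 2003
clear
input cpair year m
1 2001 10
1 2002 12
1 2003 11
2 2001  5
2 2002  6
end
local tmax 3                                  // complete panel length in this toy example
xtset cpair year                              // declare the panel structure
xtdescribe                                    // show the participation patterns across pairs
bysort cpair: egen n_obs = count(m)           // years with a non-missing flow, per pair
generate byte balanced = (n_obs == `tmax')    // flag complete panels instead of dropping them
count if !balanced                            // observations a strict balancing rule would discard
display "observations lost by balancing: " r(N)
```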


Gordon Hughes
[email protected]

From: Matei Frunzetti <[email protected]>
Subject: st: Unbalancing a panel data set with country pairs

Dear Statalisters,

I am fairly new to Stata and therefore might have to bother you with
somewhat trivial questions in the near future. Please excuse me.

Cutting to the chase:
I'm working on a panel data set covering 17 years. It's fairly unbalanced,
and I need to drop all observations for country pairs that either lack
the full length (in years) or have missing values in one of the independent
variables. The problem is that I have to delete all observations of
such a country pair, for all years, if even one variable has a
missing value or if the pair is one or more years short. I ran into a dead end
trying to figure out how to include the rest of the observations of a
"faulty" country pair in the drop command.

I hope that was enough information. Help would be much appreciated.

Regards

Matei


