st: Data Quality check of unbalanced panel data
From: "SIYAM, Amani" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: st: Data Quality check of unbalanced panel data
Date: Mon, 20 Jan 2014 16:23:32 +0000

Dear Stata-Listers,
I have a panel covering 24 years (1990-2013) of a continuous variable Q measured in X countries. For each country, the values of Q come from one or more data sources, and many countries have unbalanced panels (some years are missing); an example is shown below.
I wish to screen for outliers or odd values (which could arise from variability between data sources or from data entry errors) before proceeding with my analysis.
To measure the average change over time, I calculated at each observed year the average exponential growth rate, AEGR_n = ln(Q_n / Q_{n-1}) / (t_n - t_{n-1}), for all t > 1990, where n indexes a country's consecutive observations.
I also calculated for each country AEGR_ALL over the total span of years contributed (e.g. in the example below, ln(Q_2010 / Q_1990) / 20 years).
     +--------------------------------------+
     | year       Q        AEGR    AEGR_ALL |
     |--------------------------------------|
  1. | 1990    .539           .    .0409264 |
  2. | 1991    .538    -.001857    .0409264 |
  3. | 1992    .598    .1057322    .0409264 |
  4. | 1993    .606    .0132893    .0409264 |
  5. | 1994    .606           0    .0409264 |
     |--------------------------------------|
  6. | 1995    .666    .0944097    .0409264 |
  7. | 1996    .681    .0222726    .0409264 |
  8. | 1997    .703    .0317946    .0409264 |
  9. | 1998    .733    .0417888    .0409264 |
 10. | 1999     .76    .0361727    .0409264 |
     |--------------------------------------|
 11. | 2000    .782    .0285363    .0409264 |
 12. | 2001    .807    .0314689    .0409264 |
 13. | 2002    .819    .0147604    .0409264 |
 14. | 2003    .833    .0169496    .0409264 |
 15. | 2004   1.341    .4761372    .0409264 |
     |--------------------------------------|
 16. | 2005    .933   -.3627656    .0409264 |
 17. | 2007   1.023    .0460448    .0409264 |
 18. | 2008    1.16    .1256805    .0409264 |
 19. | 2009    1.19    .0255334    .0409264 |
 20. | 2010   1.222    .0265355    .0409264 |
     +--------------------------------------+
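The two statistics in the listing could be computed along the following lines. This is only a minimal sketch: it assumes the data are in long format with one row per country-year, that the country identifier is called country (a hypothetical name; only year, Q, AEGR and AEGR_ALL appear in the listing), that Q is positive and non-missing in every retained row, and that AEGR and AEGR_ALL do not already exist in the dataset.

```stata
* Assumption: one row per country-year; "country" is a hypothetical variable name
* Growth rate between consecutive observed years, annualised over the gap in years
bysort country (year): gen double AEGR = ln(Q / Q[_n-1]) / (year - year[_n-1])

* Growth rate over the full span of years contributed by each country
bysort country (year): gen double AEGR_ALL = ln(Q[_N] / Q[1]) / (year[_N] - year[1])

list year Q AEGR AEGR_ALL, sepby(country)
```

Dividing by the gap in years rather than by 1 is what keeps the 2005 to 2007 step comparable with the single-year steps. The first observation of each country gets a missing AEGR because Q[_n-1] is missing there.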
I am now stuck on how to find and "best-classify" the oddities. For example, I suspect an outlier Q value in 2004, where AEGR is more than 10 times AEGR_ALL.
Is there a way to test this using the statistics I calculated (AEGR and AEGR_ALL), or are there better approaches to quality-checking unbalanced panel data?
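For concreteness, the informal comparison behind the 2004 suspicion can be written as a ratio flag. This is a sketch under stated assumptions, not a formal test: it assumes AEGR and AEGR_ALL already exist as in the listing, that the country identifier is called country (a hypothetical name), and the variable names ratio and flag and the cutoff of 10 are illustrative only, the cutoff being taken from the 2004 example.

```stata
* Assumption: AEGR and AEGR_ALL already exist; "country" is a hypothetical name
* Ratio of each step's growth rate to the country's overall growth rate
gen double ratio = AEGR / AEGR_ALL

* Illustrative cutoff of 10 in absolute value; not an established threshold
* !missing() is needed because Stata treats missing as larger than any number
gen byte flag = abs(ratio) > 10 & !missing(ratio)

list country year Q AEGR AEGR_ALL ratio if flag
```

With this cutoff the 2004 row in the example is flagged (ratio of about 11.6), while the 2005 rebound (about -8.9) is not, which shows how sensitive such a rule is to the threshold chosen.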
With all my thanks in advance.
Amani