
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Data Quality check of unbalanced panel data


From   "SIYAM, Amani" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Data Quality check of unbalanced panel data
Date   Mon, 20 Jan 2014 16:23:32 +0000

Dear Stata-Listers,
 
I have a 24-year panel (1990-2013) of a continuous variable Q measured in X countries. For each country, the measurements of Q come from one or more data sources, combined to fill the panel years, and many countries have unbalanced panels; an example is shown below.
 
I wish to check for outliers or odd values (which could be due to variability between data sources or to data entry errors) before proceeding with my analysis.
 
To measure the average change over time, I calculated for each year the average exponential growth rate, AEGR = ln(Q_n / Q_n-1) / (t_n - t_n-1), for all t > 1990.
 
I also calculated, for each country, AEGR_ALL over the full span of years contributed (e.g. in the example below, ln(Q_2010 / Q_1990) / 20 years).
 
     +-------------------------------------+
     | year       Q        AEGR   AEGR_ALL |
     |-------------------------------------|
  1. | 1990    .539           .   .0409264 |
  2. | 1991    .538    -.001857   .0409264 |
  3. | 1992    .598    .1057322   .0409264 |
  4. | 1993    .606    .0132893   .0409264 |
  5. | 1994    .606           0   .0409264 |
     |-------------------------------------|
  6. | 1995    .666    .0944097   .0409264 |
  7. | 1996    .681    .0222726   .0409264 |
  8. | 1997    .703    .0317946   .0409264 |
  9. | 1998    .733    .0417888   .0409264 |
 10. | 1999     .76    .0361727   .0409264 |
     |-------------------------------------|
 11. | 2000    .782    .0285363   .0409264 |
 12. | 2001    .807    .0314689   .0409264 |
 13. | 2002    .819    .0147604   .0409264 |
 14. | 2003    .833    .0169496   .0409264 |
 15. | 2004   1.341    .4761372   .0409264 |
     |-------------------------------------|
 16. | 2005    .933   -.3627656   .0409264 |
 17. | 2007   1.023    .0460448   .0409264 |
 18. | 2008    1.16    .1256805   .0409264 |
 19. | 2009    1.19    .0255334   .0409264 |
 20. | 2010   1.222    .0265355   .0409264 |
     +-------------------------------------+
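
For reference, the AEGR and AEGR_ALL columns in the listing are consistent with the log of the ratio of consecutive Q values divided by the gap in years. The following is only a minimal sketch in Python (not Stata) that reproduces them from the year and Q columns; the names `series`, `growth_rate`, `aegr` and `aegr_all` are illustrative and do not come from the original data set.

```python
import math

# (year, Q) pairs copied from the listing; 2006 is missing in the source.
series = [
    (1990, 0.539), (1991, 0.538), (1992, 0.598), (1993, 0.606),
    (1994, 0.606), (1995, 0.666), (1996, 0.681), (1997, 0.703),
    (1998, 0.733), (1999, 0.760), (2000, 0.782), (2001, 0.807),
    (2002, 0.819), (2003, 0.833), (2004, 1.341), (2005, 0.933),
    (2007, 1.023), (2008, 1.160), (2009, 1.190), (2010, 1.222),
]


def growth_rate(t_prev, q_prev, t_curr, q_curr):
    """Average exponential growth rate between two observations."""
    return math.log(q_curr / q_prev) / (t_curr - t_prev)


# AEGR: rate between each observation and the previous available one,
# so the 2005 -> 2007 step is divided by 2 years.
aegr = {}
for (t0, q0), (t1, q1) in zip(series, series[1:]):
    aegr[t1] = growth_rate(t0, q0, t1, q1)

# AEGR_ALL: rate between the first and last observations of the country.
aegr_all = growth_rate(*series[0], *series[-1])

for year, rate in aegr.items():
    print(f"{year}  {rate: .7f}  {aegr_all:.7f}")
```

Dividing by the gap in years rather than by 1 is what keeps the 2007 value comparable with the others despite the missing 2006 observation.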
 
I am now stuck on how to find and best classify the oddities. For example, I suspect an outlier Q value in 2004 (AEGR is more than 10 times AEGR_ALL).
 
Is there a way to test that using the statistics I calculated (AEGR and AEGR_ALL), or are there better approaches to quality-checking unbalanced panel data?
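
To make the rule of thumb mentioned above concrete (a yearly AEGR roughly 10 times AEGR_ALL), here is a minimal sketch in Python rather than Stata. It is not an established test: the cutoff of 10 is an arbitrary assumption taken from the 2004 example, and `flag_years` and `THRESHOLD` are hypothetical names.

```python
import math

# (year, Q) pairs copied from the example listing; 2006 is missing.
series = [
    (1990, 0.539), (1991, 0.538), (1992, 0.598), (1993, 0.606),
    (1994, 0.606), (1995, 0.666), (1996, 0.681), (1997, 0.703),
    (1998, 0.733), (1999, 0.760), (2000, 0.782), (2001, 0.807),
    (2002, 0.819), (2003, 0.833), (2004, 1.341), (2005, 0.933),
    (2007, 1.023), (2008, 1.160), (2009, 1.190), (2010, 1.222),
]

THRESHOLD = 10  # assumed cutoff, not a recommended value


def flag_years(series, threshold):
    """Return years whose |AEGR| exceeds threshold * |AEGR_ALL|."""
    (first_year, first_q), (last_year, last_q) = series[0], series[-1]
    overall = math.log(last_q / first_q) / (last_year - first_year)
    flagged = []
    for (t0, q0), (t1, q1) in zip(series, series[1:]):
        rate = math.log(q1 / q0) / (t1 - t0)
        if abs(rate) > threshold * abs(overall):
            flagged.append(t1)
    return flagged


flagged = flag_years(series, THRESHOLD)
print(flagged)  # [2004]
```

With this cutoff only 2004 is flagged; the reversal in 2005 is about 8.9 times AEGR_ALL in absolute value and appears only if the cutoff is lowered (for example to 5), which shows how sensitive such a rule is to the chosen multiple.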
 
With all my thanks in advance.
 
Amani
 


