
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Data Quality check of unbalanced panel data


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Data Quality check of unbalanced panel data
Date   Mon, 20 Jan 2014 16:45:23 +0000

If you seek a p-value, you need an explicit model of the generating
process. No free lunch here.

If your aim is just checking the quality, nothing beats a graphical
exploration.

Nick
[email protected]
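
As a rough illustration of the graphical check suggested above, here is a minimal text-plot sketch in Python. It is not from the thread: the `data` list is copied from the example listing quoted below, and `text_plot` is a hypothetical helper name. A proper line plot per country in your statistics package would serve the same purpose.

```python
# (year, Q) pairs copied from the example listing quoted below; 2006 is absent there.
data = [
    (1990, 0.539), (1991, 0.538), (1992, 0.598), (1993, 0.606), (1994, 0.606),
    (1995, 0.666), (1996, 0.681), (1997, 0.703), (1998, 0.733), (1999, 0.760),
    (2000, 0.782), (2001, 0.807), (2002, 0.819), (2003, 0.833), (2004, 1.341),
    (2005, 0.933), (2007, 1.023), (2008, 1.160), (2009, 1.190), (2010, 1.222),
]


def text_plot(rows, width=50):
    """Return one line per observation: year, value, and a bar scaled to the maximum Q."""
    top = max(q for _, q in rows)
    lines = []
    for year, q in rows:
        bar = "#" * round(width * q / top)
        lines.append(f"{year} {q:6.3f} {bar}")
    return lines


plot = text_plot(data)
print("\n".join(plot))
```

Even in this crude form, a value that breaks the otherwise smooth trend stands out by eye, which is the point of looking before testing.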



On 20 January 2014 16:23, SIYAM, Amani <[email protected]> wrote:
> Dear Stata-Listers,
>
> I have a panel of 24 years (1990-2013) of a continuous variable Q measured in X countries. For each country, the measurement of Q comes from 1 or more data sources to fill the panel of years and many countries have unbalanced panels - example shown below.
>
> I wish to assess the chance of an outlier or odd value (which could be due to data source variability or a data entry error) before proceeding with my analysis.
>
> To measure the average change over time, I calculated at each year the average exponential growth rate (AEGR) = ln(Q_n / Q_{n-1}) / (t_n - t_{n-1}) for all t > 1990.
>
> I also calculated for each country AEGR_ALL for the total years contributed (e.g. in the example below, ln(Q_2010 / Q_1990) / 20 years).
>
>      +-------------------------------------+
>      | year       Q        AEGR   AEGR_ALL |
>      |-------------------------------------|
>   1. | 1990    .539           .   .0409264 |
>   2. | 1991    .538    -.001857   .0409264 |
>   3. | 1992    .598    .1057322   .0409264 |
>   4. | 1993    .606    .0132893   .0409264 |
>   5. | 1994    .606           0   .0409264 |
>      |-------------------------------------|
>   6. | 1995    .666    .0944097   .0409264 |
>   7. | 1996    .681    .0222726   .0409264 |
>   8. | 1997    .703    .0317946   .0409264 |
>   9. | 1998    .733    .0417888   .0409264 |
>  10. | 1999     .76    .0361727   .0409264 |
>      |-------------------------------------|
>  11. | 2000    .782    .0285363   .0409264 |
>  12. | 2001    .807    .0314689   .0409264 |
>  13. | 2002    .819    .0147604   .0409264 |
>  14. | 2003    .833    .0169496   .0409264 |
>  15. | 2004   1.341    .4761372   .0409264 |
>      |-------------------------------------|
>  16. | 2005    .933   -.3627656   .0409264 |
>  17. | 2007   1.023    .0460448   .0409264 |
>  18. | 2008    1.16    .1256805   .0409264 |
>  19. | 2009    1.19    .0255334   .0409264 |
>  20. | 2010   1.222    .0265355   .0409264 |
>      +-------------------------------------+
>
> I am now stuck on how to find and "best-classify" the oddities. For example, I suspect an outlier Q value in 2004 (AEGR is more than 10 times AEGR_ALL).
>
> Is there a way I can test that using the statistics calculated (AEGR and AEGR_ALL), or are there better approaches to quality-checking unbalanced panel data?
>
> With all my thanks in advance.
>
> Amani
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
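
The AEGR and AEGR_ALL columns in the quoted listing are consistent with the log of the ratio of consecutive values divided by the gap in years, including the two-year gap before 2007. The following Python sketch checks that arithmetic, assuming the 20 rows shown above. The names `growth_rates`, `aegr`, `aegr_all` and `ranked` are illustrative only, and the ranking at the end is an exploratory screen, not a test that yields a p-value.

```python
import math

# (year, Q) pairs copied from the quoted listing; 2006 is absent in the source.
data = [
    (1990, 0.539), (1991, 0.538), (1992, 0.598), (1993, 0.606), (1994, 0.606),
    (1995, 0.666), (1996, 0.681), (1997, 0.703), (1998, 0.733), (1999, 0.760),
    (2000, 0.782), (2001, 0.807), (2002, 0.819), (2003, 0.833), (2004, 1.341),
    (2005, 0.933), (2007, 1.023), (2008, 1.160), (2009, 1.190), (2010, 1.222),
]


def growth_rates(rows):
    """Return {year: ln(Q_n / Q_{n-1}) / (t_n - t_{n-1})} for every row after the first."""
    out = {}
    for (t0, q0), (t1, q1) in zip(rows, rows[1:]):
        out[t1] = math.log(q1 / q0) / (t1 - t0)
    return out


aegr = growth_rates(data)

# Overall rate across the whole span contributed by this country.
(t_first, q_first), (t_last, q_last) = data[0], data[-1]
aegr_all = math.log(q_last / q_first) / (t_last - t_first)

# Rank years by distance of the yearly rate from the overall rate (exploratory only).
ranked = sorted(aegr, key=lambda y: abs(aegr[y] - aegr_all), reverse=True)
for year in ranked[:3]:
    print(year, round(aegr[year], 4), round(aegr[year] / aegr_all, 1))
```

Note that a single odd value produces two large rates of opposite sign in adjacent rows (the jump up and the drop back), so a ranking like this flags both the suspect year and the one after it.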



© Copyright 1996–2018 StataCorp LLC