Re: st: How to remove cross-sections with high number of missing values in panel data analysis


From   Eric Booth <ebooth@ppri.tamu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: How to remove cross-sections with high number of missing values in panel data analysis
Date   Sat, 20 Feb 2010 00:11:48 -0600

>

> Is there any quick way to convert the value of variable in
> consideration to missing in all those cross-sections with insufficient
> data.


Instead of converting them to missing or dropping them, you could just exclude
them with an "if" qualifier, after creating a variable that counts the nonmissing
values in each cross-section so that you can flag those with too much missingness:

*------------------------BEGIN EXAMPLE
clear
inp case year v1
1	2000	80
1	2001	350
1	2002	2285
1	2003	2402
1	2004	480
1	2005	2135
1	2006	1862
1	2007	230
1	2008	1302
2	2000	118
2	2001	2427
2	2002	825
2	2003	326
2	2004	1111
3	2000	333
3	2001	853
3	2002	1294
3	2003	1137
3	2004	1011
3	2005	31
3	2006	750
3	2007	408
3	2008	1369
3	2009	198
3	2010	1476
3	2011	1609
3	2012	783
end

*rectangularize: add a row (v1 missing) for every absent case-year pair
fillin case year
drop _fillin
*****
bys case: sum year


*"ignore" = number of nonmissing v1 values per case;
*cases with 6 or fewer are skipped below
bys case: egen ignore = count(v1)


tab v1 case if ignore>6 & !mi(ignore)
mean v1 if ignore>6 & !mi(ignore)
**"ignore" shouldn't be missing, but just in case

**now, your analysis here**

*------------------------END EXAMPLE

If you really want to convert "v1" to missing when a cross-section has fewer than
a certain number of nonmissing values, you could type:

replace v1 = . if ignore<=6
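
If dropping the sparse cross-sections outright is preferable (the original
question mentions discarding them), a minimal sketch along the same lines might
look like the following. The names "nobs" and "first" and the local "minobs" are
made up for illustration, the cutoff of 6 is simply carried over from the example
above, and it assumes the example data are still in memory:

*------------------------BEGIN EXAMPLE
*hypothetical cutoff: keep cases with more than 6 nonmissing v1
local minobs 6

*nonmissing v1 per case ("nobs" is an illustrative name)
bys case: egen nobs = count(v1)

*show one row per case that is about to be dropped
egen first = tag(case)
list case nobs if first & nobs<=`minobs'

*discard whole cross-sections at or below the cutoff
drop if nobs<=`minobs'
drop nobs first
*------------------------END EXAMPLE

Unlike the "if" approach, this changes the data in memory, so it may be worth
wrapping it in -preserve- / -restore- or running it on a copy first.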



~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu



On Feb 19, 2010, at 11:46 PM, Prabhat wrote:

> Dear members,
> 
> I am new to STATA.
> 
> While analyzing a panel with 23 cross-sections and 30 years, I am
> getting abnormal results in some cases owing to the very small number of
> observations (fewer than 5) in some cross-sections.
> 
> Is there any quick way to convert the value of variable in
> consideration to missing in all those cross-sections with insufficient
> data.
> 
> In summary,
> 
> I have
> 3 observations of y for ID=25
> 2 observations of y for ID=58
> 
> and so on
> 
> where y can have up to 30 observations for each cross-section i.e. each ID.
> 
> I need to set up some rule, which automatically discards one
> cross-section if number of missing values is very high.
> 
> Any comment will be appreciated.
> 
> Thank you.
> 
> Regards,
> Prabhat
> International University of Japan
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
