Statalist The Stata Listserver



st: RE: loop to fill in missing observations


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: loop to fill in missing observations
Date   Fri, 15 Jun 2007 17:47:26 +0100

"Missing" in Stata circles should not be used to mean "omitted" or 
"not present". "Missing" means missing values on (existing) variables. 

Anyway, each country and year should be 
represented at least once. That can be done in place or by a -merge- 
with a suitable file. Merge mavens can work out the latter. 
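
For the record, here is one way the -merge- route might go. This is a 
sketch only, not something tested on Fabrizio's data: it assumes the 
-merge m:1- syntax of Stata 11 or later, and it uses the rows from the 
question as toy data. The names -skeleton- and -_merge- are the only 
additions.

```stata
* Toy data: the rows shown in the question (without the -fillin- marker)
clear
input countryn year elecn
1 1990 1
1 1994 2
1 1994 3
1 1997 4
2 1989 1
2 1992 2
2 1995 3
2 1999 4
2 2000 5
end

* Build a skeleton file holding every country-year pair once
* (28 countries x 28 years, 1975 to 2002)
preserve
clear
set obs 28
gen countryn = _n
expand 28
bysort countryn: gen year = 1974 + _n
tempfile skeleton
save `skeleton'
restore

* Many elections can share one country-year, hence m:1.
* Skeleton-only pairs arrive as new observations with _merge == 2
* and missing values on everything except countryn and year.
merge m:1 countryn year using `skeleton'
sort countryn year
```

Observations with _merge == 2 are the filled-in years; drop -_merge- 
once you have checked them.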

I see that Fabrizio has set up a variable -fillin- with marks 
at places he wants to fill in. I am going to ignore this. I am
going to assume a clean dataset with just the data. 

Here is an in-place solution.  

qui forval x = 1/28 { 
	forval y = 1975/2002 { 
		count if countryn == `x' & year == `y' 
		if r(N) == 0 { 
			set obs `= _N + 1' 
			replace countryn = `x' in l 
			replace year = `y' in l 
		}
	}
}

In words, 

for each country { 
	for each year { 
		count how many obs for that country and year 
		if there aren't any { 
			add an extra observation 
			set that observation to the right values 
		} 
	}
}
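
To see the logic at work, here is a small run on the rows from the 
question, followed by a check that every pair is now present. It is 
a sketch under assumptions: the ranges are cut down to countries 1/2 
and years 1989/2000 so that the result can be counted by hand, and 
the variable -first- is introduced only for the check.

```stata
* Toy data: the rows shown in the question (without the -fillin- marker)
clear
input countryn year elecn
1 1990 1
1 1994 2
1 1994 3
1 1997 4
2 1989 1
2 1992 2
2 1995 3
2 1999 4
2 2000 5
end

* Same loop, reduced ranges: 2 countries x 12 years = 24 pairs
qui forval x = 1/2 {
	forval y = 1989/2000 {
		count if countryn == `x' & year == `y'
		if r(N) == 0 {
			set obs `= _N + 1'
			replace countryn = `x' in l
			replace year = `y' in l
		}
	}
}

* Check: tag one observation per country-year pair and count the tags
bysort countryn year: gen byte first = _n == 1
count if first
assert r(N) == 24
```

The 9 original rows cover 8 distinct pairs, so the loop adds 16 
observations. The two elections in 1994 for country 1 are both kept.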

The difference between using -set obs- to add an extra observation
and -expand 2 in l- is that in the former all new values
are born missing. 
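
A two-line toy dataset, made up for illustration, shows the difference. 
With -expand- the copied values of other variables (here -elecn-) 
would have to be set to missing by hand.

```stata
clear
input countryn year elecn
1 1990 1
1 1994 2
end

* -set obs-: the new last observation is missing on every variable
set obs `= _N + 1'
list in l

* Undo that, then try -expand-: the new observation is a copy of
* the last one, so elecn comes along too
drop in l
expand 2 in l
list in l
```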

Nick 
[email protected] 

Fabrizio Gilardi
 
> I have a dataset of national elections in 28 countries. Observations
> are elections. This means that there can be several elections in the
> same year, and on the other hand only years when an election was held
> are included in the dataset.
> 
> I want to fill in missing years for each country. My idea was to loop
> over countries and years to check for every country if a given year
> is present, and if not fill it in. To do so, I have created an
> appropriate number of missing observations to be filled in, and a
> counter variable to identify them.
> 
> Concretely, the dataset looks like this:
> 
> countryn		year	 elecn	fillin
> 1			1990	1		.
> 1			1994	2		.
> 1			1994	3		.
> 1			1997	4		.
> 2			1989	1		.
> 2			1992	2		.
> 2			1995	3		.
> 2			1999	4		.
> 2			2000	5		.
> .			.		.		1
> .			.		.		2
> .			.		.		3
> 
> 
> And my code is the following:
> 
> g n=.
> local z=1
> qui forval x=1/28 {
> 	forval y=1975/2002 {
> 		sum year if year==`y' & countryn==`x'
> 		replace n=r(N)
> 		replace year=`y' if n==0 & fillin==`z'
> 		replace countryn=`x' if n==0 & fillin==`z'
> 		count if countryn==`x' & year==`y' & fillin==`z'
> 		local w=r(N)
> 		local z=`z'+`w'
> 	}
> }
> 
> Now, for some countries it works fine, but for most some missing
> years are not filled in. It does not seem to depend on whether in the
> country there was more than one election in some year, and I cannot
> find any pattern that could help me identify the problem.
> 
> What am I doing wrong?



