Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: new variables creation with unbalanced panel

From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: new variables creation with unbalanced panel
Date   Wed, 13 Oct 2010 17:36:09 +0100

I note that 100+1-1 = 100 and 10+1-1 = 10, so that's a minute simplification. 

More interestingly, as the aim is to round up, then 

ceil(100 * runiform()) 

is a smidgen more elegant than 

1 + int(100 * runiform()) 

These details are incidental to the original problem, but this kind of code segment is common enough to attract comment. 

[email protected] 

Eric Booth

The biggest challenge in Laura's questions are that, in creating the avg_y and diff_x, there is overlap in use of the observations of x.
So,  for observation 5 (for country A, year 5), she wants the difference between A5 and A2 after calculating the difference between A2 and A1 in obs 2.  I found it difficult to allow for this overlap when working down the columns with egen and subscripting.  (If Laura did not need this overlap, you could just create an panel indicator for A1 and A2 and gen the difference & average, and then create an indicator for A3 to A5 and gen the diff and avg, and so on.)  I have no idea about the wisdom of this approach, but it is what Laura has requested in her post.

I think Ulrich Kohler's solution doesn't take into account this overlap, nor will it work when there are a several missing x in a row (as is the case in Laura's first example (and it's still indicated her latest email that "variable x has many missing values while y is available for all years.")

I finally got this to give the output that Laura describes, but the code is not straightforward at all (particularly the answer to question (2) in Laura's email).  To complicate things, I tend to be overly redundant when writing subscripting conditions, so some of the double checking if something is missing or not could probably be removed.   Hopefully other's will have suggestions on how to do this with less effort, but this does produce what Laura describes:

**NOTE: watch for wrapping issues**

set obs 20 

//create fake dataset//
g str5 country  = "A" in 1/5
replace country = "B" in 6/18
replace country = "C" in 19/20
variable x has many missing values while 
y is available for all  years.
bys country: g year = _n
g x = 1+int((10-1+1)*runiform())
	**create some missings in x**
	replace x = . in 2/4 
	replace x = . in 7/8
	replace x = . in 12/13
g y = 1+int((100-1+1)*runiform())

g initial_x =.
g diff_x = .

**I created these vars in the 
**opposite order than you described them

qui su year
forval n = 1/`r(max)' {

3) a variable that for each country, 
for each available x takes the value of the 
previous observed x.

bys country: replace initial_x = x[_n-`n']  ///
	if !mi(x[_n-`n']) & mi(initial_x[_n]) &  ///

1) a variable that for each country 
will equal the difference between two 
consecutive available x values.
bys country: replace diff_x =  ///
	x[_n] - x[_n-`n'] if !mi(x[_n]) ///
	& mi(diff_x[_n]) 

2) a variable that for each country 
will take the value the average of all y's 
available between  two consecutive 
available x values.
g avgy = .

egen pid = group(country year x) if !mi(x)
	**fillin missing**
qui su year
forval n = 1/`r(max)' {
bys country: replace pid = pid[_n-`n'] ///
	if !mi(pid[_n-`n']) & mi(pid[_n])

levelsof pid, local(panels)
token `"`panels'"'
while "`1'" != "" & "`2'"!= "" {
	di "var avgy_p`1' for `1':`2'"
cap drop i`1'
cap drop o
g o = 1 if pid[_n] == pid[_n+1] ///
	& mi(x[_n+1])
egen avgy_p`1' = mean(y) ///
 	if pid==`1' | pid==`2' ///
 	& o !=1
replace avgy_p`1' = . if pid!=`2'
g i`1' = 1 if country[_n]!=country[_n-1] ///
	& !mi(avgy_p`1')
qui su i`1'
if "`r(max)'"=="1"  drop avgy_p`1' 
cap drop i`1'
cap replace avgy = avgy_p`1'  ///
	if mi(avgy) & !mi(avgy_p`1') ///
	& !mi(x)
cap drop avgy_p`1'
cap drop o
 		mac shift
drop pid

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index