


Re: st: new variables creation with unbalanced panel

From   Laura <[email protected]>
To   [email protected]
Subject   Re: st: new variables creation with unbalanced panel
Date   Wed, 13 Oct 2010 08:56:43 -0700 (PDT)

Mr. Kohler and Mr. Booth,

Thank you so much for the replies. I have tried the solution suggested by Ulrich 
Kohler; however, I could not make it work yet, which could, of course, be due to 
my lack of experience. I appreciate the suggestions for further reading a lot.

I did not try the second suggestion from Eric Booth yet. It will take me a 
little time to understand the code. Thank you so much for the time put into 

In my initial email I tried to simplify the problem a little. Here is a more 
detailed explanation. My panel contains poverty indicators (var x) which 
are not available every year; the gaps have various lengths, both across 
countries and over time. I also have other macro variables which are more or 
less available yearly (var y).

I need to create new variables based on var x,
for example, yearly growth, between two consecutive available x's, 
within a country: 

growthx(t)=(log x(t)-log x(t-i))/i , where t-i and t are two consecutive years 
for which var x is available. 

Also, based on x, for year t, I need  the variable initialx(t)=x(t-i), within a 

Based on y, for year t I need the initial value of y which is the value of y 
in the previous year when x was observed


and the average value of y for all the years between t-i and t. 

avgy(t) = mean(y(t-i), y(t-i+1), ..., y(t))

My simple minded approach is to create a variable that stores the positions 
where x is observed, for each country.

tsset id year
sort id year
by id (year), sort: gen pos=_n if mean<.

Then, for each nonmissing value of pos, determine the position of the previous 
nonmissing pos. However, I am not familiar enough with Stata syntax and I 
can't make this work. It would be something like this (only it would need to 

by id (year), sort: gen previous=max(range pos 1 pos-1) if pos<. 

(this statement would have to find the maximum value of pos in the range between 
the first observation of pos and the (current-1) observation, for each 

If this can be accomplished, then I could refer to values of x and y at 
position previous or in the range (previous position)
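
For illustration only, the "position of the previous observed x" idea might be sketched in Stata roughly as follows. This is an untested sketch under several assumptions: the variables are named id, year, x and y (in the command above the poverty indicator appears to be called mean), x is missing in the years without data, and the toy numbers are invented purely to make the block self-contained. It relies on explicit subscripts being relative to the by-group when used under by:, and on sum() treating missing values as zero.

```stata
* Toy data, invented for illustration only: x is missing in years without data
clear
input id year x y
1 2000 10  1
1 2001  .  2
1 2002  .  3
1 2003 20  4
1 2004  .  5
1 2005 40  6
2 2000  .  7
2 2001  5  8
2 2002  .  9
2 2003 10 10
end

* Within-country position of the most recent earlier year in which x was observed
bysort id (year): gen long prev = _n - 1 if !missing(x[_n-1])
* Carry that position forward across the gap years
by id: replace prev = prev[_n-1] if missing(prev)

* Annualised log growth between two consecutive observed x's
by id: gen growthx = (ln(x) - ln(x[prev])) / (year - year[prev])

* Values of x and y at the previous observed x, kept only where x is observed
by id: gen initialx = x[prev] if !missing(x)
by id: gen initialy = y[prev] if !missing(x)

* Running sum and running count of y; their differences give the window mean
by id: gen double cumy = sum(y)
by id: gen long cumn = sum(!missing(y))

* Mean of y over years t-i through t, both endpoints included
by id: gen avgy = (cumy - cumy[prev] + cond(missing(y[prev]), 0, y[prev])) ///
    / (cumn - cumn[prev] + !missing(y[prev])) if !missing(x)

list id year x y prev growthx initialx initialy avgy, sepby(id)
```

For the first observed x in each country prev is missing, so all four new variables come out missing there, which seems to be the natural result since no earlier x exists.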

I don't know if this is actually feasible in Stata or not.

Again, many, many thanks for the help.


