Re: st: Creating long, filledin dataset from two, year variables
From: Nick Cox <[email protected]>
To: [email protected]
Subject: Re: st: Creating long, filledin dataset from two, year variables
Date: Sun, 6 Mar 2011 09:03:47 +0000
I think you guessed wrong. This is just a wide structure.
I don't know how you are going to fill in populations after the first,
but this is a start.
You could -reshape long-.
clear
input id pop startyear endyear
1 11000 1818 1822
2 1500 1820 1824
3 15000 1820 1823
4 2200 1821 1836
5 2000 1821 1840
6 125000 1821 1828
end
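* number of years each id spans, both endpoints included
gen nyears = endyear - startyear + 1
rename startyear year1
rename endyear year2
* one row per endpoint: _j == 1 is the start year, _j == 2 the end year
reshape long year, i(id)
replace pop =. if _j == 2
* the start row already exists, so only nyears - 1 further rows are needed
expand nyears - 1 if _j == 2
bysort id (_j) : replace year = year[_n-1] + 1 if _j > 1
drop nyears _j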
Here is some reading:
help for -reshape-, -expand-.
FAQ . . . . . . . . . . . . . . . . . . . . . . . . Problems with reshape
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
12/03 I am having problems with the reshape command. Can
you give further guidance?
http://www.stata.com/support/faqs/data/reshape3.html
FAQ . . . . . . . . . . . . . . . . . . . . . . . Replacing missing values
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
2/03 How can I replace missing values with previous or
following nonmissing values?
http://www.stata.com/support/faqs/data/missing.html
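The second FAQ is about copying nonmissing values forward. A minimal sketch of that idea, assuming (and this is only an assumption, not something stated in the question) that the start-year population should simply be carried forward within each id. The small dataset below is made up for illustration and is already in long form:

```stata
clear
input id year pop
1 1818 11000
1 1819 .
1 1820 .
2 1820 1500
2 1821 .
end
* assumption: the last known population applies until a new value appears
* -replace- works down the observations in order, so each filled value
* is available to the observation after it
bysort id (year) : replace pop = pop[_n-1] if missing(pop)
list, sepby(id)
```

If the populations should instead change between known values, carrying forward is the wrong tool and some kind of interpolation would be needed.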
I gave the -reshape- solution because it is always worth knowing about
-reshape-. But there is a more direct solution too:
clear
input id pop startyear endyear
1 11000 1818 1822
2 1500 1820 1824
3 15000 1820 1823
4 2200 1821 1836
5 2000 1821 1840
6 125000 1821 1828
end
gen nyears = endyear - startyear + 1
expand nyears
gen year = startyear
bysort id : replace year = year[_n-1] + 1 if _n > 1
replace pop =. if year != startyear
drop nyears startyear endyear
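A third route, not part of either solution above, is to let -tsfill- create the in-between years. This is a sketch under the assumption that no id has startyear equal to endyear; otherwise the same id-year pair would appear twice after -reshape- and -xtset- would complain about repeated time values. Only the first three ids are used, to keep the example short:

```stata
clear
input id pop startyear endyear
1 11000 1818 1822
2 1500 1820 1824
3 15000 1820 1823
end
rename startyear year1
rename endyear year2
* two rows per id: the start year and the end year
reshape long year, i(id)
replace pop = . if _j == 2
drop _j
* declare id as the panel variable and year as the time variable
xtset id year
* add one observation for every year missing between each id's
* first and last year; pop is missing in the new observations
tsfill
list, sepby(id)
```

The appeal of this version is that no arithmetic on the number of years is needed; the cost is that the data must be -xtset- first.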
On Sun, Mar 6, 2011 at 4:33 AM, Kevin O'Connell <[email protected]> wrote:
> I am trying to convert a dataset that was built with a start year, start
> year population and end year into a long format. I guessed this
> was a double-wide dataset, but I couldn't get my variables to match up,
> or to fillin.
>
> id pop startyear endyear
> 1 11000 1818 1822
> 2 1500 1820 1824
> 3 15000 1820 1823
> 4 2200 1821 1836
> 5 2000 1821 1840
> 6 125000 1821 1828
>
> I am trying to get the dataset into this format so I can fill in
> missing values for population over the time span between start and
> end:
>
> id year pop
> 1 1818 11000
> 1 1819
> 1 1820
> 1 1821
> 1 1822
> 2 1820 1500
> 2 1821
> 2 1822
> 2 1823
> 2 1824
>
> and so on.
> There are about 5000 years within 500 ids, so I am hoping to find a
> better solution than data entry, but I don't know the right
> term/command for what I am trying to do.