
st: RE: reshaping and merging datasets


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: reshaping and merging datasets
Date   Tue, 22 Jul 2003 10:06:35 +0100

Tae Ho Eom has several tasks on which assistance
is desired. I'll pick off one: 
 
> I am working on two separate data manipulation tasks and 
> am facing some difficulties.
> If you could advise me on how I can process the 
> datasets, it would be highly appreciated.
> 
> The first data manipulation work is:
> 
> I have 20 datasets (the same dataset over 20 years) with 
> the same variables and structure as follows.
> (the first line is variable names and some observations are below)
> 
> STATE   COUNTY   OBJECTS   AMOUNTS   
> 
>     01         178            GG         100,000
>     01         166            SW         200,000
>     01         778            DL            50,000
>     03         336            GG           86,000
>     03         227            SW           33,000
> 
> This is a dataset that shows how federal 
> government money is distributed to each State and County by 
> Object (such as Salary, Insurance).
> OBJECTS has about 9 categories and I have codes for OBJECTS 
> (e.g. GG means federal grants).
> I have 20 years of data, so I will generate a year variable 
> for each dataset.
> 
> The final dataset I want is:
> 
>  YEAR         STATE     GG                 SW           DL       (and other OBJECT ITEMs)
> 
>  1997         01        $1000,000          $2000,000    $340,000
>  1997         02        $2000,000          $3000,000    $345,000
>  1998         01        $3000,000          $2400,000    $345,000
>  1998         02        $5000,000          $3400,000    $367,000

> In short, I want to have the dataset that summarizes how 
> much money is distributed based on OBJECTS categories by 
> each STATE; in other words, I have to make each OBJECTS 
> category a variable and want to collapse the dataset by STATE.
> 
> I don't have a problem with appending the 20 datasets, but I 
> think I need some foreach and local macro commands that 
> perform the data manipulation work before appending the 
> whole datasets.

If you do the manipulation before you -append- you have to do 
it several times; if afterwards just once. It's possible, of 
course, that memory is an issue, but let's be optimistic. 
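
Should memory turn out to be the constraint, the pessimistic route 
is to -collapse- each year's file before combining, so that only small 
summaries are ever held together. A rough sketch, assuming files 
named data1776-data1795 and variables named as in the rest of this 
reply; the scratch file name allyears is purely hypothetical: 

. forval y = 1776/1795 { 
. 	use data`y', clear 
. 	gen year = `y' 
. 	collapse (sum) amounts, by(year objects state) 
. 	if `y' > 1776 append using allyears 
. 	save allyears, replace 
. } 

After the loop the combined, already collapsed data are in memory, 
so only the -reshape- and clean-up steps would remain. 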

I assume data sets data1776-data1795 for years 1776-1795 
with observations like 

 state county objects amounts 
     01         178            GG         100,000
     01         166            SW         200,000
     01         778            DL            50,000
     03         336            GG           86,000
     03         227            SW           33,000
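
One caveat, which is a guess and not something stated in the 
question: if amounts was read in as a string variable because of 
the embedded commas, it needs converting to numeric before anything 
can be summed. A minimal sketch, assuming that is the situation: 

. describe amounts 
. destring amounts, replace ignore(",") 

If amounts is already numeric, -destring- says so and leaves it alone. 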

To -append-, we read in the first and -append- 
the rest one at a time. Also, -replace- the year values in a loop: 
that beats reading in each data set, -generate-ing a variable 
and then writing the set out again. 

. use data1776 
. gen year = 1776 
. forval y = 1777/1795 { 
. 	append using data`y' 
. 	replace year = `y' if mi(year) 
. } 
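
A quick check at this point, offered as a suggestion only, is that 
every observation has picked up a year and that each of the 20 
years contributes some observations: 

. assert !mi(year) 
. tab year, missing 

-assert- stops with an error if any year is still missing; the 
table should show one row for each year from 1776 to 1795. 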

Now we are feeling ready to -collapse-: 

. collapse (sum) amounts, by(year objects state) 

The -reshape- is fairly standard: 

. reshape wide amounts, i(year state) j(objects) string

The clean-up is (1) to map missings to 0 and 
(2) to fix the variable names: 

. mvencode amounts*, mv(0) 
. renpfix amounts 
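
To see the whole sequence on something small, here is a sketch 
using the five example observations typed in by hand. The year 
1776 is assumed for all of them, state and objects are taken to be 
string, and the amounts are the illustrative values from the 
question, not real data: 

. clear 
. input str2 state county str2 objects amounts 
. "01" 178 "GG" 100000 
. "01" 166 "SW" 200000 
. "01" 778 "DL" 50000 
. "03" 336 "GG" 86000 
. "03" 227 "SW" 33000 
. end 
. gen year = 1776 
. collapse (sum) amounts, by(year objects state) 
. reshape wide amounts, i(year state) j(objects) string 
. mvencode amounts*, mv(0) 
. renpfix amounts 
. list 

This leaves one observation per state with variables DL, GG and SW. 
State 03 has no DL observation, so its DL goes from missing to 0. 
One thing to watch with real data: as far as I recall, -mvencode- 
refuses to act if 0 already occurs among the values unless the 
-override- option is given. 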

Nick 
n.j.cox@durham.ac.uk 

