Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: rehaping and merging datasets

From   "Tae Ho Eom" <>
To   <>
Subject   st: rehaping and merging datasets
Date   Mon, 21 Jul 2003 23:58:13 -0400

Dear "statalist" members,


I am working on two separate data manipulation works, facing some difficulties.


If you could advise me of the way I can process the dataset, it would be highly appreciated.


The first data manipulation work is:


I have 20 datasets (the same dataset over 20 years) with the same variables and structure as follows.

(the first line is variable names and some observations are below)




    01         178            GG         100,000

    01         166            SW         200,000

    01         778            DL            50,000

    03         336            GG           86,000

    03         227            SW           33,000


; This is a dataset that explains how the federal government money is distributed to each State and County by Object (such as Salary, Insurance)

OBJECTS has about 9 categories and I have code for OBJECTS (e.g. GG mean federal grants)

I have 20 years dataset, so I will generate “year” variable for each dataset.



The final dataset I want is:


YEAR    STATE          GG                     SW               DL             (and other OBJECT ITEMs below) 

 1997         01        $1000,000          $2000,000    $340,000

 1997         02        $2000,000          $3000,000    $345,000


 1998         01        $3000,000          $2400,000    $345,000

 1998         02        $5000,000          $3400,000    $367,000





In short, I want to have the dataset that summarizes how much money is distributed based on OBJECTS categories by each STATE; in other words, I have to make each OBJECTS category a “variable” and want to collapse the dataset by STATE.


I don’t have problem with appending the 20 datasets, but I think I need some “foreach” and “local macro” commands that perform the data manipulation work before appending the whole datasets.


Please advice me of the best way I can do.



The Second Work is:



I have two kinds of datasets: one is assessment value of houses and the second is property tax exemptions


Dataset type 1 (House Price Assessment Data)


MUNICODE            ID              ASMT              SCH_CODE

10001000               1              $1000,000          1001

10001000               2              $2000,000          1002


10002020               1              $1000,000          1003




MUNICODE  : code for municipality (town of Manlius)

ID                 : house ID (start from 1 and repeats in each municipality from 1 again)

ASMT           : assessment value of house for each ID

SCH_CODE : school district codes (generally school districts are not matching with municipalities; 

                       one municipality can include multiple school districts)



Dataset type 2 (Property Tax Exemption Data)


MUNICODE            ID       EX_CODE  EX_NBR    EX_AMT       

10001000               1          1200            1            $200,000          

10001000               1          2100            2            $300,000          

10001000               1          3100            3            $200,000          

10001000               2          3100            1            $200,000          




10002020               1          1200            1            $100,000          




MUNICODE and ID are the same as in the dataset type 1

EX_CODE          : exemption code (e.g. 1200 means exemptions for senior owners)

EX_NBR            : sequence number of exemptions applied to each ID (multiple exemptions can be applied to one property)

EX_AMT            : exemption amount per each exemption code


(In sum, this exemption datasets don’t include school district code but municipality code and IDs so that I can merge the two files by MUNICODE and ID.

 But since house owner can have multiple exemptions I have to rearrange this dataset before merging) 


The Final Dataset I want



MUNICODE            ID              ASMT             EX_CODE (1200)   EX_CODE2(2100)  EX_CODE3(3100)                     SCH_CODE

10001000               1              $1000,000            $200,000               $300,000                   $200,000                                1001

10001000               2              $2000,000                       -                            -                    $200,000                                1002


10002020               1              $1000,000            $100,000                           -                               -                                 1003




In the final dataset, I want to merge the two types of dataset and make the EX_AMT (amount of exemption) for each EX_CODE (exemption code) variables.

I have been studying many command such as “foreach” “reshape” but cannot figure out how to do this (I also have at least basic concepts for macros)


This is really important step for my dissertation. Please advice me of the best way.












*   For searches and help try:

© Copyright 1996–2022 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index