Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: from long to wide


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: from long to wide
Date   Wed, 31 Oct 2007 09:40:22 -0500

Ning <N.Li@ms.unimelb.edu.au> asked, 

> How to reshape a data set from long form to wide form without using the
> command reshape?

and went on to explain why, 

> Recently I am working on a data set from a remote access data library.  The
> RADL receives STATA codes submitted by its users, it runs jobs and returns
> outputs via its web interface.  For confidentiality reasons, not all the
> STATA commands are allowed in the RADL. One of these commands is reshape, we
> know reshape a dataset won't reveal extra information though.  I tried to
> program but failed.)

Nick Cox <n.j.cox@durham.ac.uk> responded rather negatively to Ning's
question because, as he said, "-reshape- is no threat to anyone."  I agree.
Nonetheless, if Ning is getting data from a remote library and they have 
turned off -reshape-, what is Ning to do?

>         HID       PID      INJDAM
>           A         1           a
>           A         2           b
>           A         2           c
>           B         1           b
>           B         2           a
>
> where HID and PID represent household and personal ids respectively, and
> INJDAMC the damage type (eg, a=burns, b=bruising, c=..)in a person's most
> recent injury.

Here's how to convert this wide data to long form:

                                         // setup 
        . use original, clear 
        . sort HID PID
        . by HID PID: assert _N==1       // checking assumptions
        . save original, replace         // save sorted by HID

                                         // do PID==1
        . use original, clear 
        . keep if PID==1
        . drop PID
        . rename INJDAM INJDAM1
        . save long, replace 

                                         // do PID==2
        . use original, clear 
        . keep if PID==2
        . drop PID
        . rename INJDAM INJDAM2
        . merge HID using long
        . drop _merge 
        . sort HID
        . save long, replace 

If there was a PID==3, one would continue

                                         // do PID==3
        . use original, clear 
        . keep if PID==3
        . drop PID
        . rename INJDAM INJDAM3
        . merge HID using long
        . drop _merge 
        . sort HID
        . save long, replace 

An outline of the process is, 

    1.  Check assumptions.  HID PID should uniquely identify the 
        observations.  In addition, resave original.dta sorted by 
        at least HID, because we are going to need it that way.

    2.  Determine the values of PID.  Interactively, the easy way to 
        that is -tabulate PID-.  Write down all the values that PID 
        takes on.

    3.  Do the following for each value of PID:

            a.  Use original.dta and keep just the observations for the 
                specific PID being processed right now.  
                Then drop variable PID.

            b.  rename all the other variables except HID to have a 
                suffix indicating the PID value.  In the example, we 
                had one such variable:  INJDAM.

            c.  If this is the first PID value, save the beginnings 
                of the dataset in long form:  -save long-.  For subsequent
                PID values, you must merge by HID the data in memory with 
                long.dta, drop _merge, re-sort by HID, and resave long.dta.

I admit it's a lot of work.  

For everyone's information, Stata was the first package to have a 
-reshape- command, and the terms reshape, long, and wide were 
introduced by us.  In my past as an analyst, I had reshaped data 
by hand many times and it was always a tedious experience.  We were 
very proud the day we realized that Stata's programming language was 
up to mechanizing the task because many programming languages are not.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index