[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: from long to wide

From   Ning Li <>
Subject   Re: st: from long to wide
Date   Thu, 01 Nov 2007 12:29:36 +1100

Garry, Marten, Nick, Austin and Bill,

Thanks very much for your responses to my question.

1) I explained to and requested RADL several times

2) STATA commands are very useful, any restriction on it causes huge inconvenience

3) I will try Austin's codes and take Bill's advice



At 01:40 AM 11/1/2007, you wrote:

Ning <> asked,

> How to reshape a data set from long form to wide form without using the
> command reshape?

and went on to explain why,

> Recently I am working on a data set from a remote access data library. The
> RADL receives STATA codes submitted by its users, it runs jobs and returns
> outputs via its web interface. For confidentiality reasons, not all the
> STATA commands are allowed in the RADL. One of these commands is reshape, we
> know reshape a dataset won't reveal extra information though. I tried to
> program but failed.)

Nick Cox <> responded rather negatively to Ning's
question because, as he said, "-reshape- is no threat to anyone." I agree.
Nonetheless, if Ning is getting data from a remote library and they have
turned off -reshape-, what is Ning to do?

> A 1 a
> A 2 b
> A 2 c
> B 1 b
> B 2 a
> where HID and PID represent household and personal ids respectively, and
> INJDAMC the damage type (eg, a=burns, b=bruising, c=..)in a person's most
> recent injury.

Here's how to convert this wide data to long form:

// setup
. use original, clear
. sort HID PID
. by HID PID: assert _N==1 // checking assumptions
. save original, replace // save sorted by HID

// do PID==1
. use original, clear
. keep if PID==1
. drop PID
. save long, replace

// do PID==2
. use original, clear
. keep if PID==2
. drop PID
. merge HID using long
. drop _merge
. sort HID
. save long, replace

If there was a PID==3, one would continue

// do PID==3
. use original, clear
. keep if PID==3
. drop PID
. merge HID using long
. drop _merge
. sort HID
. save long, replace

An outline of the process is,

1. Check assumptions. HID PID should uniquely identify the
observations. In addition, resave original.dta sorted by
at least HID, because we are going to need it that way.

2. Determine the values of PID. Interactively, the easy way to
that is -tabulate PID-. Write down all the values that PID
takes on.

3. Do the following for each value of PID:

a. Use original.dta and keep just the observations for the
specific PID being processed right now.
Then drop variable PID.

b. rename all the other variables except HID to have a
suffix indicating the PID value. In the example, we
had one such variable: INJDAM.

c. If this is the first PID value, save the beginnings
of the dataset in long form: -save long-. For subsequent
PID values, you must merge by HID the data in memory with
long.dta, drop _merge, re-sort by HID, and resave long.dta.

I admit it's a lot of work.

For everyone's information, Stata was the first package to have a
-reshape- command, and the terms reshape, long, and wide were
introduced by us. In my past as an analyst, I had reshaped data
by hand many times and it was always a tedious experience. We were
very proud the day we realized that Stata's programming language was
up to mechanizing the task because many programming languages are not.

-- Bill
* For searches and help try:
*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index