
st: RE: Reshape: from wide to long


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Reshape: from wide to long
Date   Tue, 16 Sep 2003 10:28:39 +0100

Christer.Thrane@hil.no
>
> I have a panel data set (in wide format) where the same individuals
> were interviewed eight times during a ten-year period. During the
> ten-year period, about 30% of the original sample was lost due to
> attrition.
>
> Some variables were measured 2 times, some were measured 3 ... and
> some were measured 5 times. That is, my data looks like:
>
> y1 y2 y3 y4 x1 x2 z1 z2 z3 v1 v2 v3 v4...
>
> First, if I understand the manual correctly (which I probably
> don't), the reshape command treats x3 above as a missing
> observation. The manual says (p. 393): "Missing variables are
> treated like variables with missing values". As I understand it,
> however, x3 is not missing (as in attrition) - it was just not
> included in the survey that particular year.
>
> Should this concern me, or is it just a technicality that I can
> overlook? More important, will reshape "solve" this case in point?

First, you are superimposing your own thinking about the
data and what they mean here. That's most sensible for
you, but, from the list above, -reshape- is aware only that
there are no variables -x3-, -x4- or -z4-. Whether they are logically
impossible, or the people dropped out, or they were there but you
never interviewed them, or someone destroyed the data by accident,
or in a fit of pique: all that is your issue, not Stata's.

Stata has a predisposition to rectangular data structures,
so on something like

. reshape long y x z v, i(id)

missing values will indeed spring into existence for
"times" 3 and 4 for -x-, and so forth. What you do with them
is up to you. By and large, they do no harm and Stata
commands will almost always do what you want with them.
(The main exception is if you forget that -if x > 5-
includes missing x, because Stata treats numeric missing as
larger than any number.) On the whole, I'd leave them in.
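
A minimal sketch of that pitfall, using a made-up data set (the
values and the cut-off of 5 are purely illustrative, and only two
of the stubs are kept to make the listing short):

```stata
* Made-up data: y is observed at times 1-4, x at times 1-2 only
clear
input id y1 y2 y3 y4 x1 x2
1 3 7 2 9 6 8
2 5 1 4 6 2 9
end

reshape long y x, i(id)

* x is now missing for _j == 3 and _j == 4
list id _j y x, sepby(id)

* Missing counts as larger than any number, so this includes the missings
count if x > 5

* Excluding them explicitly
count if x > 5 & !missing(x)
```

With these values the first -count- reports 7 (three real values
plus four missings) and the second reports 3.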

I can't say whether this solves your problem, as I'm not
clear what the problem is.

>
> Second, must the variables be placed adjacent as above for reshape
> to work, or can they be organized as:
>
> y1 x1 z1 v1 y2 x2.... v4
>

-reshape- appears indifferent to variable order. If your
data are like this it might help you to tidy them up,
but again that's your concern.
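
If tidying up is wanted, -order- is one way to do it. The sketch
below uses a made-up interleaved data set (hypothetical values,
only the stubs -y- and -x-) and groups the variables by stub before
reshaping. The -order- step is cosmetic and is not something
-reshape- requires:

```stata
* Made-up data with the variables interleaved by time
clear
input id y1 x1 y2 x2 y3
1 3 6 7 8 2
2 5 2 1 9 4
end

* Group the variables by stub; this only changes the display order
order id y* x*
describe, simple

* The reshape is the same with or without the -order- step
reshape long y x, i(id)
list, sepby(id)
```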

By the way, I find -reshape- a daunting command at times
and it's not always easy for me to think through what
-reshape- will do in problematic cases. Making up
a little data set and playing with it to find out is an obvious
tip, but perhaps one worth mentioning.
I use small random integers -- it is easy to trace
mappings with such data -- so to answer the question above
I did something like this:

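clear
set obs 10
* fill each variable with random integers between 1 and 10
foreach v in y1 y2 y3 y4 x1 x2 z1 z2 z3 v1 v2 v3 v4 {
	egen `v' = rndint(), max(10)
}
* look at the wide data before reshaping
describe
list
* -reshape- needs an identifier for each individual
gen id = _n
reshape long y x z v, i(id)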

Here -rndint()- comes from -egenmore- on SSC.
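
For anyone who prefers not to install -egenmore-, much the same
experiment can be sketched with the built-in function
-runiformint()-, assuming Stata 14 or later. The seed value is
arbitrary and is there only to make the run reproducible:

```stata
clear
set obs 10
set seed 2003

* random integers between 1 and 10, without -egenmore-
foreach v in y1 y2 y3 y4 x1 x2 z1 z2 z3 v1 v2 v3 v4 {
    generate `v' = runiformint(1, 10)
}

generate id = _n
reshape long y x z v, i(id)

* each id now has four rows (_j = 1, ..., 4);
* x is missing for _j = 3 and 4, z for _j = 4
list if id == 1
```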

Nick
n.j.cox@durham.ac.uk
