Re: st: Arranging variables across rows


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Arranging variables across rows
Date   Tue, 26 Jun 2012 23:30:59 +0100

-rowsort- is from the Stata Journal (SJ). I think you are correct; it does
not help here at all, nor does it purport to.

My main advice is to restructure to a long structure as fast as
possible. With the present wide structure, this will be only the first of
several awkward problems, and not even the most difficult of them.

I created some similar data (saved as rowprob.dta) and did this:

. list

     +-------------------------------------------------+
     |  A1    A2    A3    A4    B1    B2   B3   family |
     |-------------------------------------------------|
  1. | 101   102   103   104   102   104    .    alpha |
  2. | 201   202   203   204   203     .    .     beta |
     +-------------------------------------------------+

. keep family A*

. reshape long A, i(family)
(note: j = 1 2 3 4)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       8
Number of variables                   5   ->       3
j variable (4 values)                     ->   _j
xij variables:
                           A1 A2 ... A4   ->   A
-----------------------------------------------------------------------------

. drop if _j == .
(0 observations deleted)

. drop _j

. gen treated = 0

. rename A person

. save Afile, replace
file Afile.dta saved

. use rowprob

. keep family B*

. reshape long B, i(family)
(note: j = 1 2 3)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                        2   ->       6
Number of variables                   4   ->       3
j variable (3 values)                     ->   _j
xij variables:
                               B1 B2 B3   ->   B
-----------------------------------------------------------------------------

. rename B person

. drop if person == .
(3 observations deleted)

. drop _j

. gen treated = 1

. list

     +---------------------------+
     | family   person   treated |
     |---------------------------|
  1. |  alpha      102         1 |
  2. |  alpha      104         1 |
  3. |   beta      203         1 |
     +---------------------------+

. append using Afile

. collapse (max) treated, by(family person)

. list

     +---------------------------+
     | family   person   treated |
     |---------------------------|
  1. |  alpha      101         0 |
  2. |  alpha      102         1 |
  3. |  alpha      103         0 |
  4. |  alpha      104         1 |
  5. |   beta      201         0 |
     |---------------------------|
  6. |   beta      202         0 |
  7. |   beta      203         1 |
  8. |   beta      204         0 |
     +---------------------------+

Here's the code in case it's useful.

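* Assumes the example data are in memory and saved as rowprob.dta
list

* A variables: one observation per family member, all marked untreated
keep family A*
reshape long A, i(family)
* drop empty slots (the log shows -drop if _j == .-, a no-op as _j is never missing)
drop if A == .
drop _j
gen treated = 0
rename A person
save Afile, replace

* B variables: one observation per treated person
use rowprob
keep family B*
reshape long B, i(family)
rename B person
drop if person == .
drop _j
gen treated = 1
list

* combine; anyone who also appears in the B list ends up with treated == 1
append using Afile
collapse (max) treated, by(family person)
list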

I think this is messier than a single -reshape- because you have
different numbers of A and B variables and they don't map on to one
another. There would be a -merge- solution as well, for sure.
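
For what it is worth, here is a rough sketch of how the -merge- route might look. It is not tested beyond the toy data above, and it assumes Stata 11 or later for the -merge 1:1- syntax. The names position, slot, wide and treatedids are made up for illustration, and the tempfiles only work if the whole thing is run as one do-file. Unlike the code above, it keeps each person's position within the family, which should make it easier to get back to the original order later.

```stata
* Sketch of a -merge- route; data values copied from the example above
clear
input A1 A2 A3 A4 B1 B2 B3 str5 family
101 102 103 104 102 104 . "alpha"
201 202 203 204 203 .   . "beta"
end
tempfile wide treatedids
save `wide'

* Long file of treated ids: one observation per family and treated person
keep family B*
reshape long B, i(family) j(slot)
drop if missing(B)
rename B person
drop slot
save `treatedids'

* Long file of all family members, keeping each member's position
use `wide', clear
keep family A*
reshape long A, i(family) j(position)
drop if missing(A)
rename A person

* Match members to the treated list; unmatched members are untreated
merge 1:1 family person using `treatedids', keep(master match)
gen byte treated = _merge == 3
drop _merge
sort family position
list, sepby(family)
```

This relies on identification numbers being unique within a family in both the A and the B variables; otherwise -merge 1:1- will stop with an error, which is probably what you would want to know about anyway.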

On Tue, Jun 26, 2012 at 10:37 PM, samuel gyetvay <sam.gyetvay@gmail.com> wrote:
> I have two sets of variables, let's call them A1, A2, ... A19 and B1,
> B2, ... B8.
>
> A1, A2, ... A19 give identification numbers for up to 19 individuals
> per family. Each family occupies a row in the data set.
>
> B1, B2, ... B8 list identification numbers of up to 8 individuals who
> have received treatment.
>
> I need to preserve the order and placement of variables in A1, ... A19
> and would like to create a dummy variable equal to 1 whenever an
> individual has received treatment. Basically, I need to go from
> something that looks like this:
>
> A1  A2  A3 ...  A19
> 101 102 103 ... 119
>
> B1 B2 B3 ... B8
> 103  .    .    ...  .
>
> To something like this
>
> A1  A2  A3 ...   A19
> 101 102 103 ... 119
>
> D1 D2 D3 ... D19
> 0   0    1   ... 0
>
> I am aware of the command rowsort, but it does not solve this
> particular problem. rowsort would turn
>
> B1 B2 B3 ... B8
> .      .  102 ...  .
>
>  into
>
> B1 B2 B3 ... B8
> 102 .   .    ...  .
>
> when what I need is
>
> B1 B2 B3 ... B8
> .    102  .  ...  .
>
> I could create a dummy variable equal to 1 when A is equal to B
>
>
> Hopefully this question is clearly phrased, and there exists a simple
> solution. Please let me know if you have any suggestions or if
> anything is unclear.
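
If the wide layout really must be kept, the comparison of A with B hinted at in the question can be sketched as a double loop over the variables. This is only a sketch on the toy data from the reply, with 4 A variables and 3 B variables standing in for the real 19 and 8; the D variables follow the naming in the question, and leaving D missing where no person is listed is an assumption about what is wanted. The advice to go long still stands for whatever comes next.

```stata
* Sketch only: compare every A variable with every B variable in the wide layout
clear
input A1 A2 A3 A4 B1 B2 B3 str5 family
101 102 103 104 102 104 . "alpha"
201 202 203 204 203 .   . "beta"
end

forvalues i = 1/4 {                       // 1/19 in the real data
    * D stays missing where no person is listed in that slot
    gen byte D`i' = 0 if !missing(A`i')
    forvalues j = 1/3 {                   // 1/8 in the real data
        * flag the person in slot i if the same id appears in any B variable
        replace D`i' = 1 if A`i' == B`j' & !missing(A`i')
    }
}
list family A1-A4 D1-D4
```

The A variables are never touched, so their order and placement are preserved, and D1 to D4 line up with A1 to A4 position by position.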
