Stata The Stata listserver

Re: st: data management question


From   Ulrich Kohler <kohler@wz-berlin.de>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: data management question
Date   Thu, 15 May 2003 08:52:37 +0200

I think the following should do the trick:

In the second data set:

. ren marker m
. reshape long group_, i(m) j(index)
. ren group_  marker
. reshape wide marker, j(m) i(index)
. ren index group
. sort group
. save 11, replace
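
To see what the two reshape calls above do, here is a minimal sketch that types in the example marker data and transposes it. It is meant to be run as a do-file. The `str2` storage types and the `clear` line are assumptions added so that the snippet runs on its own; they are not part of the original data description.

```stata
* Sketch: rebuild the example marker file and transpose it
clear
input marker str2 group_1 str2 group_2 str2 group_3
1 aa aa bb
2 aa aa bb
3 aa bb bb
end

rename marker m
* long form: one row per marker-by-group pair
reshape long group_, i(m) j(index)
rename group_ marker
* wide form: one row per group, one column per marker
reshape wide marker, i(index) j(m)
rename index group
list, noobs
```

The resulting variables are named marker1, marker2, marker3. Renaming group_ to marker_ (and reshaping wide on marker_) would give marker_1, marker_2, marker_3 instead, which matches the names in the desired layout.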

And then, in the first data set:

. sort group
. merge group using 11
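
Stata 11 introduced a merge syntax that states the match type explicitly and does not need the data to be sorted first. A sketch of the same merge in that syntax follows, again as a do-file. The tempfile `markers` is a hypothetical stand-in for the saved file 11, and the marker rows are typed in directly in the already-transposed form so that the snippet is self-contained. The `assert(match)` option is an added check that assumes every group appears in both files.

```stata
* Sketch: merge animals onto one-row-per-group marker data
clear
input group str2 marker1 str2 marker2 str2 marker3
1 aa aa aa
2 aa aa bb
3 bb bb bb
end
* hypothetical stand-in for the saved file 11
tempfile markers
save `markers'

clear
input group var
1 2
1 3
1 2
2 5
2 4
3 3
3 4
3 5
3 5
end

* m:1 = many animals per group, one marker row per group
* assert(match) stops with an error if any group is unmatched
merge m:1 group using `markers', assert(match) nogenerate
list, noobs sepby(group)
```

Because the match is on group alone, the number of animals per group and the number of marker columns can change without altering the merge command.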

This should work for your example data, but there might be problems in the
full data set that I have overlooked.

regards
uli 

David Airey wrote:
> Dear List,
>
> I have two data sets that I want to put together to analyze with a
> nested model.
>
> The first data set looks like:
>
> group        var
> 1            2
> 1            3
> 1            2
> 2            5
> 2            4
> 3            3
> 3            4
> 3            5
> 3            5
>
> Thus there will be roughly balanced data by group. Group in this case
> is really different inbred strains of mice. Each row above is an
> animal. Thus there are 3 animals for group 1, 2 for group 2, and 4 for
> group 3. Var will be a continuously distributed dependent variable.
>
> The other data set looks like:
>
> marker  group_1  group_2  group_3
> 1       aa       aa       bb
> 2       aa       aa       bb
> 3       aa       bb       bb
>
> In this data set, each row is a genetic marker. The second to fourth
> columns are genetic information for each marker for each group.
>
> I want to get the data together such that it looks like:
>
> group    var   marker_1    marker_2    marker_3
> 1        2     aa          aa          aa
> 1        3     aa          aa          aa
> 1        2     aa          aa          aa
> 2        5     aa          aa          bb
> 2        4     aa          aa          bb
> 3        3     bb          bb          bb
> 3        4     bb          bb          bb
> 3        5     bb          bb          bb
> 3        5     bb          bb          bb
>
> Notice that each animal in a group has the same genetic information for
> any particular marker, but var may differ between animals. The basic
> model to analyze these data by marker will nest animal in marker to
> predict var.
>
> My problem is how to write a program that is smart enough to properly
> repeat the genetic information by group in bringing the two files
> together. The number of animals per group may change from file to file
> and the number of markers may change also, but each group will have
> genetic information at each marker (no missing genetic information).
>
> Thanks much for any help or example code on similar problems.
>
> Sincerely,
>
> Dave
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
kohler@wz-berlin.de

