
Re: st: data management question

From	Ulrich Kohler <[email protected]>
To	[email protected]
Subject	Re: st: data management question
Date	Thu, 15 May 2003 08:52:37 +0200

I think the following should do the trick:

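In the second data set:

. * free the name "marker"; it becomes the stub for the new variables
. ren marker m
. * long form: one row per marker-group pair, group number in index
. reshape long group_, i(m) j(index)
. ren group_  marker
. * wide form: one row per group, with variables marker1, marker2, ...
. reshape wide marker, j(m) i(index)
. ren index group
. sort group
. save 11, replace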

And then, in the first data set:

. sort group
. merge group using 11

This should work for your example data, but there might be problems in the
full data set that I have overlooked.
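
For readers on Stata 11 or later, where -merge- takes an explicit match type
such as m:1 and no longer needs the data sorted beforehand, the same steps
might be sketched as a self-contained do-file. This is only an illustration
under some assumptions: it types in the example data from the question with
-input-, and the tempfile name markers is a placeholder introduced here (the
commands above save to a file named 11 instead). The new variables come out
as marker1, marker2, marker3 rather than marker_1, marker_2, marker_3.

```stata
* Second data set from the question: one row per marker, one column per group
clear
input marker str2 group_1 str2 group_2 str2 group_3
1 aa aa bb
2 aa aa bb
3 aa bb bb
end

* Free the name "marker" so it can serve as the stub after reshaping
rename marker m

* Long form: one row per marker-group pair; index holds the group number
reshape long group_, i(m) j(index)
rename group_ marker

* Wide form: one row per group, variables marker1 marker2 marker3
reshape wide marker, i(index) j(m)
rename index group

* markers is a hypothetical tempfile name, not part of the original post
tempfile markers
save `markers'

* First data set from the question: one row per animal
clear
input group var
1 2
1 3
1 2
2 5
2 4
3 3
3 4
3 5
3 5
end

* Many animals per group, one genotype row per group: an m:1 merge
merge m:1 group using `markers'
list, sepby(group)
```

After the merge, _merge should equal 3 for every animal if each group has
genetic information; a quick -tabulate _merge- is a reasonable check before
dropping it.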

regards
uli 

David Airey wrote:
> Dear List,
>
> I have two data sets that I want to put together to analyze with a
> nested model.
>
> The first data set looks like:
>
> group        var
> 1            2
> 1            3
> 1            2
> 2            5
> 2            4
> 3            3
> 3            4
> 3            5
> 3            5
>
> Thus there will be roughly balanced data by group. Group in this case
> is really different inbred strains of mice. Each row above is an
> animal. Thus there are 3 animals for group 1, 2 for group 2, and 4 for
> group 3. Var will be a continuously distributed dependent variable.
>
> The other data set looks like:
>
> marker  group_1  group_2  group_3
> 1       aa       aa       bb
> 2       aa       aa       bb
> 3       aa       bb       bb
>
> In this data set, each row is a genetic marker. The second to fourth
> columns are genetic information for each marker for each group.
>
> I want to get the data together such that it looks like:
>
> group    var   marker_1    marker_2    marker_3
> 1        2     aa          aa          aa
> 1        3     aa          aa          aa
> 1        2     aa          aa          aa
> 2        5     aa          aa          bb
> 2        4     aa          aa          bb
> 3        3     bb          bb          bb
> 3        4     bb          bb          bb
> 3        5     bb          bb          bb
> 3        5     bb          bb          bb
>
> Notice that each animal in a group has the same genetic information for
> any particular marker, but var may differ between animals. The basic
> model to analyze these data by marker will nest animal in marker to
> predict var.
>
> My problem is how to write a program that is smart enough to properly
> repeat the genetic information by group in bringing the two files
> together. The number of animals per group may change from file to file
> and the number of markers may change also, but each group will have
> genetic information at each marker (no missing genetic information).
>
> Thanks much for any help or example code on similar problems.
>
> Sincerely,
>
> Dave
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-- 
[email protected]


