Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Collapsing and Reshaping
From: Eric Booth <[email protected]>
To: "<[email protected]>" <[email protected]>
Subject: Re: st: Collapsing and Reshaping
Date: Tue, 30 Nov 2010 04:34:04 +0000
It's not clear why you are missing year information: you've got scores from those years, so you can fill in year wherever the corresponding score is not missing. You do need a year value to collapse and/or reshape, and no, filling it with a "numeric dummy" just to complete the command is generally not a good idea (how would you tell those records apart after the collapse/reshape?).
In the code below, I assume that there are non-missing values (test scores) for math* and eng* only in the years that correspond to the "year" variable. If this is not how your data are set up, it's a good idea to clarify with a snippet of your data (or a fake equivalent).

You only need to collapse if you've got multiple tests for the same year (e.g., in my example data below, id_code 113 takes the tests in 2008 twice). (As a side note, you don't have to collapse this information; you could instead create math_1_08, math_2_08, etc., up to the maximum number of tests any student takes in a year. The "num_tests_taken" variable below counts these for you.)

Then you can create an id for each observation for use in the reshape command, which will get rid of your "id_code does not uniquely identify the observations" error message.
***************!
**Watch for wrapping issues in the code below.
clear
input id_code year math06 eng06 math07 eng07 math08 eng08
112  2006  7  3  .  .  .  .
112  2007  .  .  8  6  .  .
112     .  .  .  .  .  4  2
112     .  .  .  .  .  .  .
113  2006  3  2  .  .  .  .
113  2007  .  .  9  3  .  .
113     .  .  .  .  .  2  2
113  2008  .  .  .  .  2  8
114     .  1  4  .  .  .  .
114  2007  .  .  7  0  .  .
114  2008  .  .  .  .  6  6
end
end
//1.  collapse//
**note: id_code 113 took 2 tests in 2008
**first, fill in year based on which math/eng scores are present
forval n = 6/8 {
	foreach v in math0 eng0 {
		replace year = 200`n' if !mi(`v'`n') & mi(year)
	}
}
**how many tests did that student take each year?
bys year id_code: g num_tests_taken = _N
li
**next, collapse
ds id_code year, not
collapse (mean) `r(varlist)', by(id_code year)
li
drop year   //year isn't needed: reshape rebuilds it from the 06/07/08 suffixes
//2.  reshape//
g id = _n
reshape long math0 eng0, i(id) j(yr) 
**cleanup**
replace yr = 2000+yr
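**drop a row only if both scores are missing; mi(math0, eng0) would also drop rows with just one score
drop if mi(math0) & mi(eng0)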
rename math0 math
rename eng0 eng
drop id
li
***************!
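A minimal sketch of the no-collapse route from the side note, in Stata like the rest of the thread. It is a variation rather than the exact naming suggested there: it keeps year long and numbers the tests within each student-year (math1, math2, ...) instead of building names like math_1_08. The small dataset (rows for id_codes 113 and 114, with year already filled in as in step 1) and the variable names row, test_num, math and eng are illustrative assumptions, not part of the original code.

```stata
clear
* toy data: year assumed already filled in, as in step 1 above
input id_code year math06 eng06 math07 eng07 math08 eng08
113  2006  3  2  .  .  .  .
113  2007  .  .  9  3  .  .
113  2008  .  .  .  .  2  2
113  2008  .  .  .  .  2  8
114  2006  1  4  .  .  .  .
end
* remember the original row order so test numbering is reproducible
gen long row = _n
* number the tests each student took within each year (1, 2, ...)
bysort id_code year (row): gen test_num = _n
* pull the score that belongs to each row's own year into one column
gen math = .
gen eng = .
forvalues n = 6/8 {
    replace math = math0`n' if year == 200`n'
    replace eng  = eng0`n'  if year == 200`n'
}
* one row per id_code-year, one column per test (math1, math2, eng1, eng2)
keep id_code year test_num math eng
reshape wide math eng, i(id_code year) j(test_num)
list, sepby(id_code)
```

The trade-off: nothing is averaged away, but every student-year carries as many math#/eng# columns as the busiest student-year needs, and those extra columns are missing for everyone else.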
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu
On Nov 29, 2010, at 9:53 PM, Katie and Matt O'Varanese wrote:
> I am trying to reshape some data long... right now the data looks like
> this for example:
> 
> id_code     year       math06   math07  math08    eng06    eng07    eng08
> 
> 112           2006
> 112           2007
> 112              .
> 112              .
> 113           2006
> 113           2007
> 113              .
> 113           2008
> 114              .
> 114           2007
> 114           2008
> 
> When I try:  reshape long [vars@x], i(id_code) j(year)
> I get an error message "i=id_code does not uniquely identify the observations;
> there are multiple observations with the same value of id_code."
> Do I need to collapse first?  If so, when I try to collapse by
> (id_code year), I get an error message saying that the year variable
> has missing values.
> 
> Do I need to change the missing values to a numeric dummy just to
> complete the command?  or is there a better way to do this?  Ultimate
> goal: I want to have unique id_code and then a year variable with each
> of the three years represented (06, 07, 08).
> 
> 
> Help please!  Thanks as always!!
> 
> Kate
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/