From: Eric Booth <ebooth@ppri.tamu.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Collapsing and Reshaping
Date: Tue, 30 Nov 2010 04:34:04 +0000

It's not clear why you are missing year information -- you've got scores from those years, so you could replace year if the score value is not missing. You do need a year value to collapse and/or reshape and, no, filling it with a "numeric dummy" to complete the command is generally not a good idea (how would you differentiate the records across id_codes after the collapse/reshape?).

In the code below, I am making the assumption that there are non-missing values (test scores) for math* and eng* only in the years that correspond to the "year" variable. If this is not how your data are set up, it's probably a good idea to clarify with a snippet of your data (or a fake equivalent).

You only need to collapse if you've got multiple tests for the same year (e.g., in my example data below, id_code 113 takes the tests in 2008 twice). (As a side note, you don't have to collapse this information; you could create a math_1_08 and math_2_08, etc. for the max number of tests any student takes in a year -- the "num_tests_taken" variable below counts these for you.) Then you can create an id for each observation for use in the reshape command (which will get rid of your "id_code does not uniquely identify the observations" error message).

***************!
**Watch for wrapping issues in the code below.
clear
inp id_code year math06 eng06 math07 eng07 math08 eng08
112 2006 7 3 . . . . .
112 2007 . . 8 6 . .
112 . . . . . 4 2
112 . . . . . .
113 2006 3 2 . . . .
113 2007 . . 9 3 . .
113 . . . . . 2 2
113 2008 . . . . 2 8 .
114 . 1 4 . . . .
114 2007 . . 7 0 . .
114 2008 . . . . 6 6
end

//1. collapse//
**note: obs 113 took 2 tests in 2008
**first, fillin year based on presence of math/eng scores
forval n = 6/8 {
    foreach v in math0 eng0 {
        replace year = 200`n' if !mi(`v'`n') & mi(year)
    }
}
**how many tests did that student take each year?
bys year id_code: g num_tests_taken = _N
li
**next, collapse
ds id_code year, not
collapse (mean) `r(varlist)', by(id_code year)
li
drop year //year isn't necessary

//2. reshape//
g id = _n
reshape long math0 eng0, i(id) j(yr)
**cleanup**
replace yr = 2000+yr
drop if mi(math0, eng0)
rename math0 math
rename eng0 eng
drop id
li
***************!

- Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754
Fax: +979.845.0249
http://ppri.tamu.edu

On Nov 29, 2010, at 9:53 PM, Katie and Matt O'Varanese wrote:

> I am trying to reshape some data long... right now the data looks like
> this for example:
>
> id_code year math06 math07 math08 eng06 eng07 eng08
>
> 112 2006
> 112 2007
> 112 .
> 112 .
> 113 2006
> 113 2007
> 113 .
> 113 2008
> 114 .
> 114 2007
> 114 2008
>
> When I try: reshape long [vars@x], i(id_code) j(year)
> i get an error message "i=id_code does not uniquely identify the observations;
> there are multiple observations with the same value of id_code."
> Do I need to collapse first? If so, when I try to collapse by
> (id_code year), I get an error message saying that the year variable
> has missing values.
>
> Do I need to change the missing values to a numeric dummy just to
> complete the command? or is there a better way to do this? Ultimate
> goal: I want to have unique id_code and then a year variable with each
> of the three years represented (06, 07, 08).
>
>
> Help please! Thanks as always!!
>
> Kate
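
The side note in the reply above (keeping every test a student took in a year rather than averaging them) might be sketched as follows. This is only a sketch under assumptions: it starts from the example data right after the year fill-in loop, it is run in place of the collapse step, and test_num is a hypothetical variable name that does not appear in the original post.

```stata
* Sketch only, not from the original post.
* Assumes the example data are in memory and year has already been
* filled in from the scores (the forval/foreach loop in the reply).
drop if mi(year)                    // rows with neither a year nor any score
bys id_code year: g test_num = _n   // test_num is a made-up name: 1, 2, ... within student-year
g id = _n                           // unique row id for reshape
reshape long math0 eng0, i(id) j(yr)
drop if mi(math0, eng0)             // keep only the year each row has scores for
rename math0 math
rename eng0 eng
drop id yr                          // yr is redundant with year at this point
sort id_code year test_num
li
```

With the example data, id_code 113 should end up with two 2008 rows (test_num 1 and 2) instead of one averaged row. From there, something like `reshape wide math eng, i(id_code year) j(test_num)` would be one way to get a single row per student-year with numbered score variables, provided no other variable differs within an id_code-year pair.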

