(a) Suppose you focus on English. You can go

    egen nEnglish = sum(subject == "English" & !missing(score)), by(ID)
    keep if subject == "English" & nEnglish == 2
    drop nEnglish

Or suppose you focus on Mathematics. You can go

    egen nMath = sum(subject == "Mathematics" & !missing(score)), by(ID)
    keep if subject == "Mathematics" & nMath == 2
    drop nMath

The count is of records in that subject with a non-missing score, so a count of 2 means the student was scored in both grades. This assumes at most one record per student, grade and subject, so do (b) first if there are duplicates.

(b) To keep only the highest score for each student, grade and subject, go

    drop if missing(score)
    bysort ID grade subject (score) : keep if _n == _N

Missing values sort above every number in Stata, so the missing scores must be dropped first; otherwise a missing record would be the one kept.

Nick

Anirudh V. S. Ruhil, Ph.D. wrote

> I have two data manipulation questions:
>
> (a) Let's say I have a dataset of students' scores for 2 subjects when
> tested in the 4th grade, and their scores on similar tests taken in the
> 6th grade. Some students are lost between the 4th and 6th grade but some
> show up in both grades. The data structure is as follows ...
>
>     ID   grade  subject       score
>     A1   04     English       271
>     A1   04     Mathematics   190
>     A1   06     English       260
>     A1   06     Mathematics   214
>     A2   04     English       .
>     A2   04     Mathematics   165
>     A2   06     English       187
>     A2   06     Mathematics   193
>     A3   04     English       .
>     A3   04     Mathematics   .
>     A3   06     English       216
>     A3   06     Mathematics   265
>
> How would I create a subset of these data such that the subset only
> contains records for students with non-missing scores in a given subject
> on BOTH the 4th and the 6th grade tests?
>
> (b) In the same dataset, let us assume there are multiple records for
> some students on a single grade and subject. For example,
>
>     A4   04     English       191
>     A4   04     English       219
>
> Of these multiple records, how can I select entries with the HIGHER
> SCORE (i.e., the lower scores have to be discarded)?
>
> I'm sure there is a quickfire way to solve both, so I'd be grateful if
> someone points me in the right direction in STATAspeak (9.1).

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
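A minimal end-to-end sketch of both steps on the example data from the question (including the A4 duplicates from (b)). It assumes grade is stored as a string such as "04" and that 04 and 06 are the only grades. Here (a) is applied to every subject at once with `_N` inside `bysort ID subject`, rather than one subject at a time with a helper variable.

```stata
* Sketch only: example data typed in from the question.
* Assumption: grade is a string ("04", "06") and these are the only grades.
clear
input str2 ID str2 grade str11 subject score
A1 04 English     271
A1 04 Mathematics 190
A1 06 English     260
A1 06 Mathematics 214
A2 04 English     .
A2 04 Mathematics 165
A2 06 English     187
A2 06 Mathematics 193
A3 04 English     .
A3 04 Mathematics .
A3 06 English     216
A3 06 Mathematics 265
A4 04 English     191
A4 04 English     219
end

* Missing sorts above every number in Stata, so drop missing scores
* first; otherwise a missing score would be kept as the "highest"
drop if missing(score)

* (b) within each student-grade-subject cell, keep the highest score
bysort ID grade subject (score) : keep if _n == _N

* (a) for each subject, keep only students scored in both grades
bysort ID subject : keep if _N == 2

list, sepby(ID)
```

The `_N == 2` test relies on (b) having removed duplicates: without that step, two records in the same grade would also give a count of 2. With the question's data, A1 keeps all four records, A2 keeps only Mathematics, and A3 and A4 drop out.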

