(a)
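Suppose you focus on English.
Count each student's non-missing English scores and keep the students who have one for each grade:
egen nEnglish = total(strpos(subject, "English") > 0 & !missing(score)), by(ID)
keep if nEnglish == 2
drop nEnglish
Or suppose you focus on Mathematics.
The same idea applies:
egen nMath = total(strpos(subject, "Math") > 0 & !missing(score)), by(ID)
keep if nMath == 2
drop nMath
Either way, this keeps all records for the selected students and assumes one record per student, grade and subject.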
(b)
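To keep only the highest score for each student, grade and subject, compare each score with its group maximum (egen's max() ignores missing values):
egen maxscore = max(score), by(ID grade subject)
keep if score == maxscore
Do this before (a), because duplicate records would otherwise inflate the counts there.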
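Putting the two steps together, here is a minimal sketch rather than a definitive solution. It assumes the example data from the question are typed in by hand, with grade stored as a number and the duplicate A4 records appended. The variable types and the helper names maxscore and nEnglish are illustrative choices, and restricting the result to English records only is one reading of the question.

```stata
* Example data from the question, plus the duplicate A4 records
clear
input str2 ID byte grade str11 subject int score
"A1" 4 "English"     271
"A1" 4 "Mathematics" 190
"A1" 6 "English"     260
"A1" 6 "Mathematics" 214
"A2" 4 "English"     .
"A2" 4 "Mathematics" 165
"A2" 6 "English"     187
"A2" 6 "Mathematics" 193
"A3" 4 "English"     .
"A3" 4 "Mathematics" .
"A3" 6 "English"     216
"A3" 6 "Mathematics" 265
"A4" 4 "English"     191
"A4" 4 "English"     219
end

* (b) keep the highest score per student, grade and subject;
* egen's max() ignores missing values
egen maxscore = max(score), by(ID grade subject)
keep if score == maxscore
drop maxscore
* tied top scores would leave identical records; keep just one of them
duplicates drop ID grade subject, force

* (a) count non-missing English scores per student and keep
* only the English records of students who have both grades
egen nEnglish = total(subject == "English" & !missing(score)), by(ID)
keep if nEnglish == 2 & subject == "English"
drop nEnglish

list, sepby(ID)
```

With these data only A1's two English records should remain, because A2 and A3 have missing 4th grade English scores and A4 has no 6th grade record.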
Nick
Anirudh V. S. Ruhil, Ph.D. wrote
> I have two data manipulation questions:
>
> (a) Let's say I have a dataset of students' scores for 2 subjects when
> tested in the 4th grade, and their scores on similar tests taken in the
> 6th grade. Some students are lost between the 4th and 6th grade but some
> show up in both grades. The data structure is as follows ...
>
> ID grade subject score
> A1 04 English 271
> A1 04 Mathematics 190
> A1 06 English 260
> A1 06 Mathematics 214
> A2 04 English .
> A2 04 Mathematics 165
> A2 06 English 187
> A2 06 Mathematics 193
> A3 04 English .
> A3 04 Mathematics .
> A3 06 English 216
> A3 06 Mathematics 265
>
> How would I create a subset of these data such that the subset only
> contains records for students with non-missing scores in a given subject
> on BOTH the 4th and the 6th grade tests?
>
> (b) In the same dataset, let us assume there are multiple records for
> some students on a single grade and subject. For example,
>
> A4 04 English 191
> A4 04 English 219
>
> Of these multiple records, how can I select entries with the HIGHER
> SCORE? (i.e., the lower scores have to be discarded).
>
> I'm sure there is a quickfire way to solve both so I'd be grateful if
> someone points me in the right direction in STATAspeak (9.1).
>