st: RE: Data manipulation query
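Suppose you focus on English.
You can count, for each student, the English records with
nonmissing scores, and keep the students who have two:

egen nEnglish = sum(strpos(subject, "English") > 0 & score < .), by(ID)
keep if nEnglish == 2

Or suppose you focus on Mathematics.
You can go

egen nMath = sum(strpos(subject, "Math") > 0 & score < .), by(ID)
keep if nMath == 2

The condition score < . matters: without it the count includes
records with missing scores, so a student such as A2 would be kept.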
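
For (b), you can get the highest score for each ID, grade and
subject, and keep only the records that match it:

egen max = max(score), by(ID grade subject)
keep if score == max

egen's max() ignores missing values. If duplicates are present,
do this before counting records for (a), as duplicates would
otherwise inflate the counts.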
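
To check the logic, here is a minimal sketch that rebuilds the example rows from the question, including the duplicate A4 records, and applies both steps. It rests on some assumptions: grade is entered as numeric and subject as a string, the variable names best and nEng are made up for illustration, and total() is used as the documented name for egen's sum() from Stata 9 on.

```stata
* Sketch only: example rows retyped from the question, plus the A4 duplicates.
* Assumes grade is numeric and subject is a string variable.
clear
input str2 ID byte grade str11 subject int score
"A1" 4 "English"     271
"A1" 4 "Mathematics" 190
"A1" 6 "English"     260
"A1" 6 "Mathematics" 214
"A2" 4 "English"     .
"A2" 4 "Mathematics" 165
"A2" 6 "English"     187
"A2" 6 "Mathematics" 193
"A3" 4 "English"     .
"A3" 4 "Mathematics" .
"A3" 6 "English"     216
"A3" 6 "Mathematics" 265
"A4" 4 "English"     191
"A4" 4 "English"     219
end

* (b) keep the highest score within each ID-grade-subject group.
* egen's max() ignores missing values; a group whose scores are all
* missing gets a missing maximum, so its records are kept here.
egen best = max(score), by(ID grade subject)
keep if score == best

* (a) count, for each student, the English records with nonmissing scores
egen nEng = total(subject == "English" & !missing(score)), by(ID)
keep if nEng == 2

list ID grade subject score, sepby(ID)
```

On these rows only A1 has a nonmissing English score in both grades, so the listing should show A1's four records. If you want the English records alone, add & subject == "English" to the last keep.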
Anirudh V. S. Ruhil, Ph.D. wrote
> I have two data manipulation questions:
>
> (a) Let's say I have a dataset of students' scores for 2 subjects when
> tested in the 4th grade, and their scores on similar tests taken in the
> 6th grade. Some students are lost between the 4th and 6th grade but some
> show up in both grades. The data structure is as follows ...
>
> ID  grade  subject      score
> A1  04     English      271
> A1  04     Mathematics  190
> A1  06     English      260
> A1  06     Mathematics  214
> A2  04     English      .
> A2  04     Mathematics  165
> A2  06     English      187
> A2  06     Mathematics  193
> A3  04     English      .
> A3  04     Mathematics  .
> A3  06     English      216
> A3  06     Mathematics  265
>
> How would I create a subset of these data such that the subset only
> contains records for students with non-missing scores in a given subject
> on BOTH the 4th and the 6th grade tests?
>
> (b) In the same dataset, let us assume there are multiple records for
> some students on a single grade and subject. For example,
>
> A4  04     English      191
> A4  04     English      219
>
> Of these multiple records, how can I select entries with the HIGHER
> SCORE? (i.e., the lower scores have to be discarded).
>
> I'm sure there is a quickfire way to solve both so I'd be grateful if
> someone points me in the right direction in STATAspeak (9.1).