[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
st: RE: Re: RE: Data manipulation query
Correct on my assumption. I was focusing on the
word HIGHER. My mistake.
New suggestion for my (b)
egen maxMath = max(score), by(ID)
I think that takes care of missings too.
> -----Original Message-----
> From: email@example.com
> [mailto:firstname.lastname@example.org]On Behalf Of Michael
> Sent: 12 January 2006 00:36
> To: email@example.com
> Subject: st: Re: RE: Data manipulation query
> I believe Nick's solution relies on there being only 2
> possible scores per
> subject, but the second part of your query indicates that
> there may be
> multiple scores for a given year. Here's another approach:
> starting with the 2nd question:
> * first, drop missing scores if more than 1 score
> bysort ID subject grade (score): drop if score==. & score<.
> * then drop lower scores when there's more than 1
> bysort ID subject grade (score): drop if _n<_N
> Now, assuming that grade only takes on the values of 4 and 6
> and you've
> eliminated duplicates within ID subject grade (using the code
> above), the
> following will flag cases where there are grades for both
> cases within each
> ID and subject, and then you can drop the observations that
> don't meet that
> bysort ID subject (grade): gen byte hasboth=score<. & score<.
> drop if hasboth==0
> Michael Blasnik
> ----- Original Message -----
> From: "Nick Cox" <firstname.lastname@example.org>
> To: <email@example.com>
> Sent: Wednesday, January 11, 2006 6:58 PM
> Subject: st: RE: Data manipulation query
> > (a)
> > Suppose you focus on English.
> > You can go
> > egen nEnglish = sum(strpos(subject, "English")), by(ID)
> > keep if nEnglish == 2
> > drop nEnglish
> > (b)
> > Given (a),
> > bysort ID (Math) : gen max = score
> > etc.
> > Nick
> > firstname.lastname@example.org
> > Anirudh V. S. Ruhil, Ph.D. wrote
> >> I have two data manipulation questions:
> >> (a) Lets says I have a dataset of students' scores for 2
> >> subjects when
> >> tested in the 4th grade, and their scores on similar tests
> >> taken in the 6th
> >> grade. Some students are lost between the 4th and 6th grade
> >> but some show
> >> up in both grades. The data structure is as follows ...
> >> ID grade subject score
> >> A1 04 English 271
> >> A1 04 Mathematics 190
> >> A1 06 English 260
> >> A1 06 Mathematics 214
> >> A2 04 English .
> >> A2 04 Mathematics 165
> >> A2 06 English 187
> >> A2 06 Mathematics 193
> >> A3 04 English .
> >> A3 04 Mathematics .
> >> A3 06 English 216
> >> A3 06 Mathematics 265
> >> How would I create a subset of these data such that the subset only
> >> contains records for students with non-missing scores in a
> >> given subject on
> >> BOTH the 4th and the 6th grade tests?
> >> (b) In the same dataset, let us assume there are multiple
> >> records for some
> >> students on a single grade and subject. For example,
> >> A4 04 English 191
> >> A4 04 English 219
> >> Of these multiple records, how can I select entries with the
> >> HIGHER SCORE?
> >> (i.e., the lower scores have to be discarded).
> >> I'm sure there is a quickfire way to solve both so I'd be
> grateful if
> >> someone points me in the right direction in STATAspeak (9.1).
* For searches and help try: