# st: RE: Re: RE: Data manipulation query

 From "Nick Cox" To Subject st: RE: Re: RE: Data manipulation query Date Thu, 12 Jan 2006 00:46:34 -0000

```Correct on my assumption. I was focusing on the
word HIGHER. My mistake.

New suggestion for my (b)

(b)
Given (a),
egen maxMath = max(score), by(ID)

I think that takes care of missings too.

Nick
n.j.cox@durham.ac.uk

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Michael
> Blasnik
> Sent: 12 January 2006 00:36
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Re: RE: Data manipulation query
>
>
> I believe Nick's solution relies on there being only 2
> possible scores per
> subject, but the second part of your query indicates that
> there may be
> multiple scores for a given year.  Here's another approach:
>
> starting with the 2nd question:
> * first, drop missing scores if more than 1 score
> bysort ID subject grade (score): drop if score==. & score[1]<.
> * then drop lower scores when there's more than 1
> bysort ID subject grade (score): drop if _n<_N
>
> Now, assuming that grade only takes on the values of 4 and 6
> and you've
> eliminated duplicates within ID subject grade (using the code
> above), the
> following will flag cases where there are grades for both
> cases within each
> ID  and subject, and then you can drop the observations that
> don't meet that
> criterion:
>
> bysort ID subject (grade): gen byte hasboth=score[1]<. & score[2]<.
> drop if hasboth==0
>
> Michael Blasnik
> michael.blasnik@verizon.net
>
>
> ----- Original Message -----
> From: "Nick Cox" <n.j.cox@durham.ac.uk>
> To: <statalist@hsphsun2.harvard.edu>
> Sent: Wednesday, January 11, 2006 6:58 PM
> Subject: st: RE: Data manipulation query
>
>
> > (a)
> >
> > Suppose you focus on English.
> >
> > You can go
> >
> > egen nEnglish = sum(strpos(subject, "English")), by(ID)
> > keep if nEnglish == 2
> > drop nEnglish
> >
> ><snip>
> > (b)
> > Given (a),
> > bysort ID (Math) : gen max = score[2]
> > etc.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > Anirudh V. S. Ruhil, Ph.D. wrote
> >
> >> I have two data manipulation questions:
> >>
> >> (a) Lets says I have a dataset of students' scores for 2
> >> subjects when
> >> tested in the 4th grade, and their scores on similar tests
> >> taken in the 6th
> >> grade. Some students are lost between the 4th and 6th grade
> >> but some show
> >> up in both grades. The data structure is as follows ...
> >>
> >> ID grade subject score
> >> A1 04 English 271
> >> A1 04 Mathematics 190
> >> A1 06 English 260
> >> A1 06 Mathematics 214
> >> A2 04 English .
> >> A2 04 Mathematics 165
> >> A2 06 English 187
> >> A2 06 Mathematics 193
> >> A3 04 English .
> >> A3 04 Mathematics .
> >> A3 06 English 216
> >> A3 06 Mathematics 265
> >>
> >> How would I create a subset of these data such that the subset only
> >> contains records for students with non-missing scores in a
> >> given subject on
> >> BOTH the 4th and the 6th grade tests?
> >>
> >> (b) In the same dataset, let us assume there are multiple
> >> records for some
> >> students on a single grade and subject. For example,
> >>
> >> A4 04 English 191
> >> A4 04 English 219
> >>
> >> Of these multiple records, how can I select entries with the
> >> HIGHER SCORE?
> >> (i.e., the lower scores have to be discarded).
> >>
> >> I'm sure there is a quickfire way to solve both so I'd be
> grateful if
> >> someone points me in the right direction in STATAspeak (9.1).
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```