Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Re: RE: Data manipulation query


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: RE: Data manipulation query
Date   Thu, 12 Jan 2006 00:46:34 -0000

Correct on my assumption. I was focusing on the 
word HIGHER. My mistake. 

New suggestion for my (b) 

(b)
Given (a),
egen maxMath = max(score), by(ID) 

I think that takes care of missings too. 

Nick 
n.j.cox@durham.ac.uk 

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Michael
> Blasnik
> Sent: 12 January 2006 00:36
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Re: RE: Data manipulation query
> 
> 
> I believe Nick's solution relies on there being only 2 
> possible scores per 
> subject, but the second part of your query indicates that 
> there may be 
> multiple scores for a given year.  Here's another approach:
> 
> starting with the 2nd question:
> * first, drop missing scores if more than 1 score
> bysort ID subject grade (score): drop if score==. & score[1]<.
> * then drop lower scores when there's more than 1
> bysort ID subject grade (score): drop if _n<_N
> 
> Now, assuming that grade only takes on the values of 4 and 6 
> and you've 
> eliminated duplicates within ID subject grade (using the code 
> above), the 
> following will flag cases where there are grades for both 
> cases within each 
> ID  and subject, and then you can drop the observations that 
> don't meet that 
> criterion:
> 
> bysort ID subject (grade): gen byte hasboth=score[1]<. & score[2]<.
> drop if hasboth==0
> 
> Michael Blasnik
> michael.blasnik@verizon.net
> 
> 
> ----- Original Message ----- 
> From: "Nick Cox" <n.j.cox@durham.ac.uk>
> To: <statalist@hsphsun2.harvard.edu>
> Sent: Wednesday, January 11, 2006 6:58 PM
> Subject: st: RE: Data manipulation query
> 
> 
> > (a)
> >
> > Suppose you focus on English.
> >
> > You can go
> >
> > egen nEnglish = sum(strpos(subject, "English")), by(ID)
> > keep if nEnglish == 2
> > drop nEnglish
> >
> ><snip>
> > (b)
> > Given (a),
> > bysort ID (Math) : gen max = score[2]
> > etc.
> >
> > Nick
> > n.j.cox@durham.ac.uk
> >
> > Anirudh V. S. Ruhil, Ph.D. wrote
> >
> >> I have two data manipulation questions:
> >>
> >> (a) Lets says I have a dataset of students' scores for 2
> >> subjects when
> >> tested in the 4th grade, and their scores on similar tests
> >> taken in the 6th
> >> grade. Some students are lost between the 4th and 6th grade
> >> but some show
> >> up in both grades. The data structure is as follows ...
> >>
> >> ID grade subject score
> >> A1 04 English 271
> >> A1 04 Mathematics 190
> >> A1 06 English 260
> >> A1 06 Mathematics 214
> >> A2 04 English .
> >> A2 04 Mathematics 165
> >> A2 06 English 187
> >> A2 06 Mathematics 193
> >> A3 04 English .
> >> A3 04 Mathematics .
> >> A3 06 English 216
> >> A3 06 Mathematics 265
> >>
> >> How would I create a subset of these data such that the subset only
> >> contains records for students with non-missing scores in a
> >> given subject on
> >> BOTH the 4th and the 6th grade tests?
> >>
> >> (b) In the same dataset, let us assume there are multiple
> >> records for some
> >> students on a single grade and subject. For example,
> >>
> >> A4 04 English 191
> >> A4 04 English 219
> >>
> >> Of these multiple records, how can I select entries with the
> >> HIGHER SCORE?
> >> (i.e., the lower scores have to be discarded).
> >>
> >> I'm sure there is a quickfire way to solve both so I'd be 
> grateful if
> >> someone points me in the right direction in STATAspeak (9.1).
> 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2021 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index