The Stata listserver

st: Data manipulation query

From   "Anirudh V. S. Ruhil"
Subject   st: Data manipulation query
Date   Wed, 11 Jan 2006 18:25:18 -0500

I have two data manipulation questions:

(a) Let's say I have a dataset of students' scores in 2 subjects when tested in the 4th grade, and their scores on similar tests taken in the 6th grade. Some students are lost between the 4th and 6th grades, but some show up in both. The data structure is as follows:

ID  grade  subject      score
A1  04     English      271
A1  04     Mathematics  190
A1  06     English      260
A1  06     Mathematics  214
A2  04     English      .
A2  04     Mathematics  165
A2  06     English      187
A2  06     Mathematics  193
A3  04     English      .
A3  04     Mathematics  .
A3  06     English      216
A3  06     Mathematics  265

How would I create a subset of these data such that the subset only contains records for students with non-missing scores in a given subject on BOTH the 4th and the 6th grade tests?
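
One possible approach to (a), as a rough sketch rather than a definitive solution: flag one non-missing record per ID, subject, and grade, count those flags within each ID-subject pair, and keep the pairs where the count is 2. It assumes the variables are named ID, grade, subject, and score as in the listing, that score is numeric, and that only grades 04 and 06 occur in the data. The helper variables ok, has_score, and n_grades are made up for illustration.

```stata
* Sketch only: ok, has_score and n_grades are hypothetical helper variables.
gen byte ok = !missing(score)

* Within each ID-subject-grade cell, sort so that a non-missing record
* (if there is one) comes last, then flag that single record.
bysort ID subject grade (ok): gen byte has_score = (ok[_N] == 1) & (_n == _N)

* Count the grades that have a usable score for each ID-subject pair.
by ID subject: egen n_grades = total(has_score)

* Keep ID-subject pairs scored in both grades (assumes only grades 04 and 06),
* dropping any leftover records whose own score is missing.
keep if n_grades == 2 & ok
drop ok has_score n_grades
```

On the sample listing this would keep both subjects for A1, only Mathematics for A2, and nothing for A3. Flagging a single record per grade means repeated records for the same grade and subject do not inflate the count.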

(b) In the same dataset, let us assume there are multiple records for some students on a single grade and subject. For example,

A4 04 English 191
A4 04 English 219

Of these multiple records, how can I select entries with the HIGHER SCORE? (i.e., the lower scores have to be discarded).
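
For (b), a minimal sketch under the same naming assumptions: sort each ID-grade-subject group so the highest score comes first and keep only that record. Stata sorts missing values after all numbers, so sorting on the negated score puts the highest non-missing score first and leaves missing scores last. The variable negscore is a hypothetical helper.

```stata
* Sketch only: negscore is a hypothetical helper variable.
* Missing scores stay missing after negation and therefore sort last.
gen negscore = -score

* Keep the first record per ID-grade-subject, i.e. the highest non-missing score.
* If every score in a group is missing, one missing record is retained.
bysort ID grade subject (negscore): keep if _n == 1
drop negscore
```

For the A4 example this would retain the record with score 219. If no other variables need to be preserved, collapse (max) score, by(ID grade subject) should give the same maxima in one line.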

I'm sure there is a quickfire way to solve both, so I'd be grateful if someone could point me in the right direction in STATAspeak (9.1).

By the way, the N is about 3 million, though the number of unique student IDs is about 250,000.

Thanks in advance.


Anirudh V. S. Ruhil, Ph.D.
Sr. Research Associate
Voinovich Center for Leadership and Public Affairs
Ohio University
Building 21, The Ridges
Athens, OH 45701-2979
Tel: 740.597.1949 | Fax: 740.597.3057