
Re: st: RE: Simple tab needed but multiple records + how do people learn Stata?

From   Joseph Wagner <>
Subject   Re: st: RE: Simple tab needed but multiple records + how do people learn Stata?
Date   Mon, 08 Oct 2007 09:41:41 -0400

Thanks, Kit, that worked, though I am embarrassed that the idea didn't come to me. Thanks also to Maarten and Carlo for your suggestions. As Nick pointed out, the answer was already in my mailbox earlier that day, in Kit's reply to Paul O'Brien's post; I had read that post but had forgotten about it by the time my little problem popped up. I should have searched more carefully.

I used to keep my own library of useful code: snippets, each prefaced by an explanation lifted from Statalist or from websites. I have gotten out of that habit over the past few years, as I moved to different kinds of work and used Stata less.

Nick Cox wrote:

Maarten is right. If data are like this

id  female  likes_cats  likes_dogs
--  ------  ----------  ----------
 1       0           0           0
 2       1           1           0
 3       0           1           0
...

in which each person is represented by only one observation (record), then it's easy to count how many people satisfy two (or indeed more) different conditions, e.g. -count if female & likes_cats & likes_dogs-.

Nor are indicators (dummy, logical or Boolean variables) essential, as we can always use explicit true-or-false conditions instead.
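A minimal sketch of both points, using the three rows of the table above plus a made-up fourth row so that the count is not zero. The `sex` variable, coded 1 = male and 2 = female, is a hypothetical alternative to the `female` indicator and does not come from the thread.

```stata
* toy data, one record per person: rows 1-3 as in the table, row 4 made up
clear
input id female likes_cats likes_dogs
1 0 0 0
2 1 1 0
3 0 1 0
4 1 1 1
end
* 0/1 indicators can be used directly as true-or-false conditions
count if female & likes_cats & likes_dogs
* hypothetical alternative coding without an indicator: 1 = male, 2 = female
gen sex = cond(female, 2, 1)
* an explicit condition does the same job as the indicator
count if sex == 2 & likes_cats & likes_dogs
```

Both -count- commands should report the same number, because `sex == 2` is true exactly where `female` is 1.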
This kind of structure is, I think, also assumed by Carlo Lazzaro in his posting in this thread. It is not the structure Joseph has, and it would be unnatural to force his dataset into a different structure, given the irregularity of dates that he presumably has. Hence Kit's proposal is closer to, indeed on, the mark.
What's more, this is essentially the same problem as the one posted by Paul O'Brien the same day, which had already been answered with code.

The class of problem is this:
1. There is some kind of grouping, most obviously into panel or longitudinal data. For concreteness, we'll talk about "panels" and remember that the idea is more general. (Indeed, no kind of time basis, regular or irregular, is essential here.)
2. Hence, multiple observations for each panel are likely.
3. Some question arises about panels that requires comparison of different observations.
4. For each observation, we can say whether it satisfies some condition. That gives a true-or-false result.
5. We need to summarise that true-or-false result over all observations in each panel. This can be done with -egen, by(<panelid>)- or -by <panelid>: egen- or -by <panelid>: gen-.
6. Then we need to combine information on different conditions using logical operators such as &, | and !.
7. Finally, we must count panels, not individual observations.
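The seven steps can be sketched in Stata along these lines, here with -egen, max()- doing the summarising of step 5 rather than a running sum. The toy data, the variable names `any_early`, `any_late` and `tag`, and the age thresholds borrowed from Joseph's question are assumptions for illustration only.

```stata
* toy long-format data: several x-ray records per patient (made up)
clear
input id age
1 18
1 35
2 20
2 22
3 31
4 24
4 40
4 50
end
* steps 4-5: per-record condition, summarised over each panel with max()
bysort id: egen any_early = max(inrange(age, 17, 25))
* exclude missing ages, since missing counts as greater than 30
by id: egen any_late = max(age > 30 & !missing(age))
* step 7: tag one record per panel so that panels, not records, are counted
egen tag = tag(id)
* step 6: combine the panel-level results with &
count if tag & any_early & any_late
```

In this toy data, patients 1 and 4 qualify, so the final -count- reports 2. Without the `tag` condition the same command would count their records (5) instead.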
Kit Baum wrote:
I think this should work, without the necessity of reshaping:

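* flag each x-ray taken at ages 17-25, or at ages over 30
bysort id: gen early = inrange(age, 17, 25)
* -age > 30- is also true when age is missing, so exclude missings
by id: gen late = age > 30 & !missing(age)
* on each patient's last record: were both flags ever true?
by id: gen both = cond(_n == _N, sum(early) & sum(late), .)
count if both == 1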

To test, first create some artificial data and then run the commands above:

clear
set obs 1000
gen id = mod(_n, 100) + 1
gen age = 40 * runiform()
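One way to check that the running-sum code counts patients rather than records is to compare it on simulated data against a second method. The sketch below generates data as above (with `runiform()`, the current name of Stata's uniform random-number function, and an arbitrary seed) and guards against missing ages; the -collapse- cross-check and the local `n_kit` are illustrative additions, not part of Kit's post.

```stata
* simulated data: 100 patients with 10 x-rays each
clear
set seed 2007
set obs 1000
gen id = mod(_n, 100) + 1
gen age = 40 * runiform()
* running-sum approach: result is stored on each patient's last record
bysort id: gen early = inrange(age, 17, 25)
by id: gen late = age > 30 & !missing(age)
by id: gen both = cond(_n == _N, sum(early) & sum(late), .)
count if both == 1
local n_kit = r(N)
* cross-check: reduce to one record per patient, then count there
preserve
collapse (max) early late, by(id)
count if early & late
assert r(N) == `n_kit'
restore
```

If the two methods disagreed, -assert- would stop with an error; because the data are random, no particular count is expected.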

Maarten Buis wrote:
This kind of problem usually becomes a lot easier when you first use
-reshape- to put the data into wide format.
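For comparison, here is a minimal sketch of the wide-format route on made-up data: number the x-rays within each patient, -reshape wide-, then loop over the resulting age variables. The names `xray`, `early` and `late` are hypothetical. With many x-rays per patient this creates many variables, which is one reason the long-format approach may be the more natural one here.

```stata
* toy long-format data (made up): one record per x-ray
clear
input id age
1 18
1 35
2 20
2 22
3 31
4 24
4 40
4 50
end
* number the x-rays within each patient to serve as reshape's j() index
bysort id: gen xray = _n
reshape wide age, i(id) j(xray)
* now one record per patient, with ages in age1, age2, age3
gen early = 0
gen late = 0
foreach v of varlist age* {
    replace early = 1 if inrange(`v', 17, 25)
    replace late = 1 if `v' > 30 & !missing(`v')
}
count if early & late
```

As in the long-format approach, patients 1 and 4 qualify here.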

Joseph Wagner wrote:

I have a dataset of x-ray records with multiple records per patient. The records consist of id, age and sex, and I need to know how many persons had an x-ray when they were between the ages of 17 and 25 AND when they were over 30.
Joseph H. Wagner, M.P.H.
Lifespan Health Research Center
Wright State University Boonshoft School of Medicine
3171 Research Blvd.
Kettering, OH 45420-4014

(937) 775-1494 (LHRC office)
(937) 775-1456 (fax)

Visit the Lifespan Health Research Center Home Page at:
