Statalist The Stata Listserver


Re: st: RE: 'sophisticated' subscripting

From   n j cox <>
Subject   Re: st: RE: 'sophisticated' subscripting
Date   Mon, 28 May 2007 16:44:11 +0100

Here is another way to do it. In some ways, it is worse
technique, but in other ways it shows some of the power of Stata.

bysort city : egen pop1700 = total(cond(year == 1700, pop, .))

What is going on here?

1. bysort city :

Stata must work within panels defined by -city-. We have to -sort- if we have not already sorted by -city-. With panel data, we have probably done that already, say as a side-effect of -tsset-, but it does no harm to specify the -sort-. You can say

bys city :

or

by city, sort:

or

sort city
by city:

I like what I wrote first, but it's a matter of taste only.

2. cond(year == 1700, pop, .)

If the year is 1700, I want to use the value of -pop-; otherwise, forget it, i.e. return missing.

3. egen pop1700 = total( )

-egen- should add up the results of the expression I just used -- within
panel, as explained in #1. I am assuming that 1700 occurs at most once within each panel. If there is an observation for 1700, only its value is used in the total, as it should be; missings are ignored. If there is no observation for 1700 in a panel, every value of the expression is missing, and the result should be missing too. Note, however, that -egen, total()- treats missings as 0 and so returns 0 in that case unless its -missing- option is specified.
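
A minimal worked sketch of the above on a small made-up panel. The data and the check variable -n1700- are hypothetical. It first checks the assumption that 1700 occurs at most once per city, then adds the -missing- option so that a city with no 1700 observation gets missing rather than 0.

```stata
* Hypothetical unbalanced panel: city 3 has no observation for 1700
clear
input city year pop
1 1650 10
1 1700 12
1 1750 15
2 1700 40
2 1800 55
3 1750 7
3 1800 9
end

* Check the assumption that 1700 occurs at most once within each city
bysort city : egen n1700 = total(year == 1700)
assert n1700 <= 1

* Spread each city's 1700 population to all its observations;
* -missing- makes an all-missing panel give missing, not 0
bysort city : egen pop1700 = total(cond(year == 1700, pop, .)), missing
list city year pop pop1700, sepby(city)
```

Cities 1 and 2 get 12 and 40 in every observation; city 3 stays missing.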

Note that this is _not_ equivalent to

bysort city : egen pop1700 = total(pop) if year == 1700

as that leaves missing values in every observation except those for 1700,
and is absolutely no gain over

gen pop1700 = pop if year == 1700

In fact, it is much less efficient.
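
To see the difference, here is a sketch on the same kind of made-up panel. The variable names -bad1700- and -simple1700- are purely illustrative. Assuming -pop- is never missing in 1700, the -if- version reproduces the plain -generate- exactly and leaves every other observation missing.

```stata
* Hypothetical unbalanced panel, as before
clear
input city year pop
1 1650 10
1 1700 12
1 1750 15
2 1700 40
2 1800 55
3 1750 7
3 1800 9
end

* With -if-, -egen- only fills observations where year == 1700
bysort city : egen bad1700 = total(pop) if year == 1700

* ... which is just what a plain -generate- gives
* (assumes pop is not missing in 1700; missing == missing is true in Stata)
gen simple1700 = pop if year == 1700
assert bad1700 == simple1700

* Every non-1700 observation is left missing
count if missing(bad1700)
```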

In various versions before Stata 9, -egen, total()- was called
-egen, sum()-.


Nick Cox

Precisely this problem was discussed just a few days ago. See
this post from 18 May:

Here it is again:

gen pop1700 = pop if year == 1700
bysort city (pop1700) : replace pop1700 = pop1700[1]
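
A commented sketch of this two-line approach, again on made-up data. It relies on Stata sorting numeric missing values after all non-missing values, so within each city the 1700 value, if present, ends up in the first observation; under -by:-, -pop1700[1]- refers to the first observation of the current city. A city with no 1700 observation is simply left with missing, with no extra option needed.

```stata
* Hypothetical unbalanced panel: city 3 has no observation for 1700
clear
input city year pop
1 1650 10
1 1700 12
1 1750 15
2 1700 40
2 1800 55
3 1750 7
3 1800 9
end

* Non-missing only in the 1700 observation
gen pop1700 = pop if year == 1700

* Within city, sort on pop1700 so any non-missing value comes first,
* then copy it to every observation of that city
bysort city (pop1700) : replace pop1700 = pop1700[1]

* Restore the usual panel order
sort city year
```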

Davide Cantoni

> I have an (unbalanced) panel of cities, with their respective
> populations over a series of years. Now I want to generate a new
> variable that gives me, for each city, the population of that city in
> 1700. Due to the unbalanced nature of the panel, it is NOT the case
> that 1700 is the first (or nth) observation within group, so that
> subscripting within groups is not going to help me in this case. Any
> suggestions?


© Copyright 1996–2014 StataCorp LP