
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: st: Conditional Variable means to new observation

From   Nickolas Lyell <>
To   "" <>
Subject   RE: st: Conditional Variable means to new observation
Date   Wed, 4 Sep 2013 15:44:42 -0400


I am still stuck with this problem.  I would like to make a new observation that encompasses Large counties.  I already have a dummy for large counties and would just like to sum all of each year's observations where Large County is equal to 1.

I want to do this so I can analyze all large counties the same way I am looking at each observation.  I don't plan on running any regressions on this dataset; I am merely creating new variables to help me understand the data. I am using Stata because the programmatic approach of a do-file makes sense to me when dealing with these very large files.

Could anyone please help?

Nicholas Lyell
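
A minimal sketch of one way to get yearly totals for large counties, in line with the advice quoted further down the thread to store such summaries as a column rather than as an extra row. The variable names (year, large, jobs) and the toy data are assumptions made for illustration; they do not come from the poster's dataset.

```stata
// hypothetical toy data: three counties observed in two years
clear
input county_id year large jobs
1 2011 1 100
1 2012 1 110
2 2011 1 200
2 2012 1 220
3 2011 0 10
3 2012 0 12
end

// yearly total over large counties, stored as a column;
// it is missing for counties where large == 0
bysort year : egen large_total = total(jobs) if large == 1
list, sepby(year)

// if a one-row-per-year summary is wanted, build it on a
// temporary copy instead of adding rows to the county data
preserve
keep if large == 1
collapse (sum) jobs, by(year)
list
restore
```

The preserve/restore pair leaves the county-level data untouched, so the summary table can be inspected (or saved to its own file) without mixing aggregate rows in with the county rows.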

-----Original Message-----
From: Nickolas Lyell 
Sent: Friday, August 30, 2013 9:28 AM
To: ''
Subject: RE: st: Conditional Variable means to new observation

I see, thank you.

Nicholas Lyell
Research Associate
National Association of Counties | NACo | 202.661.8820

-----Original Message-----
From: [] On Behalf Of Maarten Buis
Sent: Friday, August 30, 2013 9:25 AM
Subject: Re: st: Conditional Variable means to new observation

On Fri, Aug 30, 2013 at 3:05 PM, Nickolas Lyell wrote:
> I am looking to take a conditional mean (or sum) of a variable and include it as a new observation.
> For instance, I have data with several county indicators horizontally and county ids vertically.  I would like to take the mean growth rate (a variable) for only those counties that are Large (LgMdSm==2) and create a new observation that contains that value under the variable growth rate.

You almost never want to store those numbers as an extra row in your data. Stata takes the definition of a dataset very strictly, and rightly so: the rows are the units and the columns are characteristics of those units. All large counties together do not represent a new unit. However, the mean growth rate you want to compute is a characteristic shared by all counties that are "large", so that mean has to be stored as a column. Here are two ways of computing such a variable:

*------------------ begin example ------------------
// create some example data
clear
set obs 10
gen county_id = _n
gen LgMdSm = (_n > 5) + 1
gen growth = rnormal()

// first method
egen mean_growth = mean(growth) if LgMdSm == 2

// second method
bys LgMdSm : egen mean_growth2 = mean(growth)

// see the results
list
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* )
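
The two methods differ in which rows receive a value, which the sketch below tries to make visible. It rebuilds the same example data so it can be run on its own; the `set seed` line and its value are an addition for reproducibility, not part of the original example.

```stata
// rebuild the example data (seed value is arbitrary)
clear
set seed 12345
set obs 10
gen county_id = _n
gen LgMdSm = (_n > 5) + 1
gen growth = rnormal()

// method 1: mean over large counties only
egen mean_growth = mean(growth) if LgMdSm == 2

// method 2: mean within each size group
bysort LgMdSm : egen mean_growth2 = mean(growth)

// method 1 leaves the five counties with LgMdSm == 1 missing
count if missing(mean_growth)

// method 2 fills every row with its own group's mean
list county_id LgMdSm growth mean_growth mean_growth2, sepby(LgMdSm)

// the same group means as a compact table, without new variables
tabstat growth, by(LgMdSm) statistics(mean n)
```

If only the large-county mean is of interest, the first method is enough; the second is handier when every size group should carry its own mean.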

Hope this helps,

Maarten L. Buis
Reichpietschufer 50
10785 Berlin

*   For searches and help try:
