The Stata listserver

st: RE: duplicating values within one variable

From   "Nick Cox" <>
To   <>
Subject   st: RE: duplicating values within one variable
Date   Mon, 21 Nov 2005 11:47:03 -0000

This is a common problem and the good news is that
you have yet to discover the power of -by:-, the 
most Stataish of all Stata's features. 

You want something like 

gen test = employ if industry == <whatever> 
sort state county year test 
by state county year: replace test = test[1]

or (more concisely) 

gen test = employ if industry == <whatever> 
bysort state county year (test): replace test = test[1]

After the -sort-

sort state county year test 

the observation with non-missing -test- comes first 
within each -state county year- combination, because 
Stata sorts numeric missing values after all 
non-missing values. Then the others can be replaced by 
the first value in each block. 
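
To see this at work, here is a small sketch on made-up data. The 
variable names follow the question; the values, and the choice of 
industry 2 to stand in for <whatever>, are assumptions for 
illustration only. 

```stata
* Toy data: two counties in one state, three industries each.
* All values are invented for illustration.
clear
input state county year industry employ
1 1 2000 1 50
1 1 2000 2 70
1 1 2000 3 20
1 2 2000 1 10
1 2 2000 2 30
1 2 2000 3 40
end

* Employment for industry 2 only; missing everywhere else.
gen test = employ if industry == 2

* Within each block the non-missing value sorts first,
* so test[1] is the value to copy to every observation.
bysort state county year (test): replace test = test[1]

list, sepby(state county year)
```

This relies on there being at most one observation for the chosen 
industry in each -state county year- block. A block with no such 
observation is left with -test- missing throughout. 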

A Mickey and Minnie tutorial on -by:- is at

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        Q1/02   SJ 2(1):86-102                                   (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N

and several FAQs, particularly on data management, give further examples. 
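
As a small taste of what _n and _N do under -by:-, here is a sketch 
on made-up data. The new variable names (seq, nind, first) are 
hypothetical and the values are invented. 

```stata
* Toy data: two counties in one state, three industries each.
clear
input state county year industry employ
1 1 2000 1 50
1 1 2000 2 70
1 1 2000 3 20
1 2 2000 1 10
1 2 2000 2 30
1 2 2000 3 40
end

* Under by:, _n is the observation number within the block ...
bysort state county year (industry): gen seq = _n

* ... and _N is the number of observations in the block.
by state county year: gen nind = _N

* Subscripts are also interpreted within the block.
by state county year: gen first = employ[1]

list, sepby(state county year)
```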


Gregor Franz
> How can I make observations in a variable take on the value of a 
> specific observation within this variable? For example, I have 
> employees by industry in each county in each state for several years. 
> I want to create a variable that is equal to employees in industry = x 
> for each county and state by year for all observations. If I type 
> gen test = employees if industry == x, I only get one observation each 
> year, by county and state, but I want the rest of the (now missing) 
> observations in variable 'test' (which in the original variable 
> 'employees' take on the values by different industries) to take on 
> the value of industry x. 
> So in the end all observations for the variable 'test' would take on 
> the value of industry county state and year.
