
RE: st: AW: egen(mean or suchlike) for a string variable

From   "Nick Cox" <>
To   <>
Subject   RE: st: AW: egen(mean or suchlike) for a string variable
Date   Thu, 8 Oct 2009 17:23:15 +0100

Let's underline that this can all be done with strings. There is no need to resort to -encode- or otherwise to convert to numeric. 

Missing, i.e. empty, strings sort first. Thus after -input- and -trim()-, Martin's code can be slimmed to 

bys year Prof (Uni) : replace Uni = Uni[_N] if missing(Uni) 

-- without any need for an extra variable. 

However, there is no check here for different non-missing values within groups of -year Prof-. 
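
One possible way to add such a check is sketched below. The example data and the variable names -differs- and -conflict- are made up for illustration; this is not part of Martin's code. The idea is to flag any group in which some non-missing value of -Uni- disagrees with the last one, and to impute only in groups that are unambiguous.

```stata
* Hypothetical example data: "K Evert" has two different universities in 1991
clear
input year str10 Uni str10 Prof
1990 "Harvard" "S Smith"
1990 ""        "S Smith"
1991 ""        "K Evert"
1991 "Oxford"  "K Evert"
1991 "Yale"    "K Evert"
end

* Within each group empty strings sort first, so Uni[_N] is the last
* non-missing value (if the group has one). Flag non-missing
* observations that disagree with it.
bysort year Prof (Uni) : gen byte differs = !missing(Uni) & Uni != Uni[_N]

* Spread the flag to every observation in the group
bysort year Prof (differs) : gen byte conflict = differs[_N]

* Inspect conflicting groups before imputing anything
list if conflict, noobs sepby(year Prof)

* Impute only where the group is unambiguous
bysort year Prof (Uni) : replace Uni = Uni[_N] if missing(Uni) & !conflict
```

With these made-up data, the missing value for "S Smith" becomes "Harvard", while the "K Evert" group is flagged and left alone for manual inspection.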

In the same territory, note that -egen, mode()- takes string arguments as well as numeric, so can be used for imputation. However, the direct route that Martin exemplifies has many advantages. 
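
A minimal sketch of the -egen, mode()- route follows, again with made-up data and a hypothetical new variable -Uni_mode-. By default -mode()- ignores missing values and returns missing when there is a tie; the -minmode-, -maxmode- and -nummode()- options change how ties are handled.

```stata
* Hypothetical example data
clear
input year str10 Uni str10 Prof
1990 "Harvard" "S Smith"
1990 "Harvard" "S Smith"
1990 "Yale"    "S Smith"
1990 ""        "S Smith"
1991 ""        "K Evert"
1991 "Oxford"  "K Evert"
end

* Most frequent non-missing value within each year-professor group
bysort year Prof : egen Uni_mode = mode(Uni)

* Fill in only the empty strings; existing values are left untouched
replace Uni = Uni_mode if missing(Uni)

list, noobs sepby(year Prof)
```

Unlike the -Uni[_N]- approach, this picks the most common value rather than the alphabetically last one, which may be preferable when a group contains the occasional stray entry.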


Martin Weiss


inp year str10(Uni Prof)
1990  Harvard   " S Smith"
1990   ""      "S Smith"
1990  UCLA      "P Williams"
1990  Yale       " K John"
1991   ""        "K Evert"
1991  Oxford     "K Evert"
1991  ""         "K Evert"
end

replace Uni=trim(Uni)
replace Prof=trim(Prof)

gen byte nonmiss=!mi(Uni)

//replace with last obs
bys year Prof (nonmiss): /* 
*/ replace Uni=Uni[_N]  /* 
*/ if nonmiss==0

l, noo sepby(year Prof)

joe j

Thanks. (Your suggestion helped me create a variable that takes a
numeric value, instead of the university name; this is definitely an

This is what the data look like:

Year  University  Professor

1990  Harvard     S Smith
1990  ---------   S Smith
1990  UCLA        P Williams
1990  Yale        K John

1991  ---------   K Evert
1991  Oxford      K Evert

What I want is to replace the missing names above, in 1990 with
Harvard and in 1991 with Oxford.

On Thu, Oct 8, 2009 at 11:59 AM, Martin Weiss <> 

> You should turn the string into a numeric variable via -encode-. Then
> can go to work. Also provide an excerpt of your data and show what you
> want to happen to them...

joe j

> In my data I have a string variable "University", which lists
> university names. In some years the names are missing. Two other
> variables I have are "Professor" and "Year". The same "Professor" and
> "University" can occur multiple times in a year.
> The problem I have is that there are quite a few University names that
> are missing. What I want to do is to replace as many missing
> University names as possible, by assuming that: when a professor is
> linked to a university at least once in a year, she is linked to the
> same university during that year - so the missing university name when
> her name occurs again in the same year can be replaced (why there are
> missing university names is a complicated story:)).

> I tried the following in Stata (it's foolish, I know):
>  bysort year professor: egen University_all=mean(University)
> But I get the warning "type mismatch".
