
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: AW: egen(mean or suchlike) for a string variable
Date: Thu, 8 Oct 2009 17:23:15 +0100

Let's underline that this can all be done with strings. There is no need to resort to -encode- or otherwise to convert to numeric.

Missing, i.e. empty, strings sort first. Thus after -input- and -trim()-, Martin's code can be slimmed to

    bys year Prof (Uni) : replace Uni = Uni[_N] if missing(Uni)

without any need for an extra variable.

However, there is no check here for different non-missing values within groups of -year Prof-.

In the same territory, note that -egen, mode()- takes string arguments as well as numeric, so can be used for imputation. However, the direct route that Martin exemplifies has many advantages.

Nick
n.j.cox@durham.ac.uk

Martin Weiss

*************
    clear*
    inp year str10(Uni Prof)
    1990 Harvard " S Smith"
    1990 "" "S Smith"
    1990 UCLA "P Williams"
    1990 Yale " K John"
    1991 "" "K Evert"
    1991 Oxford "K Evert"
    1991 "" "K Evert"
    end

    replace Uni=trim(Uni)
    replace Prof=trim(Prof)
    compress

    gen byte nonmiss=!mi(Uni)

    //replace with last obs
    bys year Prof (nonmiss): /*
    */ replace Uni=Uni[_N] /*
    */ if nonmiss==0

    l, noo sepby(year Prof)
*************

joe j

Thanks. (Your suggestion helped me create a variable that takes a numeric value, instead of the university name; this is definitely an improvement.)

This is how the data looks like:

    Year   University   Professor
    1990   Harvard      S Smith
    1990   ---------    S Smith
    1990   UCLA         P Williams
    1990   Yale         K John
    1991   ---------    K Evert
    1991   Oxford       K Evert

What I want is to replace the missing names above, in 1990 with Harvard and in 1991 with Oxford.

On Thu, Oct 8, 2009 at 11:59 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
> You should turn the string into a numeric variable via -encode-. Then -egen-
> can go to work. Also provide an excerpt of your data and show what you want
> to happen to them...

joe j
> In my data I have a string variable "University", which lists
> university names. In some years the names are missing. Two other
> variables I've are "Professor" and "Year". The same "Professor" and
> "University" can occur multiple times in a year.
>
> The problem I have is that there are quite a few University names that
> are missing. What I want to do is to replace as many missing
> University names as possible, by assuming that: when a professor is
> linked to a university at least once in a year, she is linked to the
> same university during that year - so the missing university name when
> her name occurs again in the same year can be replaced (why there are
> missing university names is a complicated story:)).
> I tried the following in Stata (it's foolish, I know):
>
> bysort year professor: egen University_all=mean(University)
>
> But I get the warning "type mismatch".

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
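
Two points in Nick's reply come without code: the missing check for different non-missing values within groups of -year Prof-, and the use of -egen, mode()- on a string variable. What follows is only a rough sketch of how they might look, assuming the variables year, Uni and Prof from Martin's example, with -trim()- already applied. The names differ and Uni_mode are hypothetical and do not appear in the thread.

```stata
* Assumes year, Uni, Prof exist as in Martin's example, with Uni and Prof trimmed.
* differ and Uni_mode are illustrative names, not from the original thread.

* 1. Flag groups holding more than one distinct non-missing Uni.
*    Empty strings sort first, so Uni[_N] is non-missing whenever any value is.
bysort year Prof (Uni): gen byte differ = Uni != Uni[_N] & !missing(Uni)
bysort year Prof (differ): replace differ = differ[_N]

* Inspect any conflicting groups before filling anything in.
list year Prof Uni if differ, sepby(year Prof)

* 2a. Fill in only where the group is unambiguous.
bysort year Prof (Uni): replace Uni = Uni[_N] if missing(Uni) & !differ

* 2b. Alternative: take the most frequent non-missing name per group.
*     With tied modes, -egen, mode()- returns missing unless minmode,
*     maxmode or nummode() is specified.
egen Uni_mode = mode(Uni), by(year Prof)
replace Uni = Uni_mode if missing(Uni)
```

Steps 2a and 2b are alternatives. Applied in sequence as written, 2b only touches the groups that 2a skipped because they were flagged as conflicting, so whether to run it at all depends on whether the most frequent name is an acceptable guess for those groups.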

**Follow-Ups**:
- **Re: st: AW: egen(mean or suchlike) for a string variable** *From:* joe j <joe.stata@gmail.com>

**References**:
- **st: egen(mean or suchlike) for a string variable** *From:* joe j <joe.stata@gmail.com>
- **Re: st: AW: egen(mean or suchlike) for a string variable** *From:* joe j <joe.stata@gmail.com>
- **AW: st: AW: egen(mean or suchlike) for a string variable** *From:* "Martin Weiss" <martin.weiss1@gmx.de>

