
From: "Martin Weiss" <martin.weiss1@gmx.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: AW: st: AW: egen(mean or suchlike) for a string variable
Date: Fri, 9 Oct 2009 16:49:54 +0200

Jeph's and Eva's http://www.stata-journal.com/article.html?article=dm0039 may also be useful for Joe...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Friday, 9 October 2009 16:49
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: AW: egen(mean or suchlike) for a string variable

You should check for different spellings and similar inconsistencies: differences in upper and lower case, or extra leading, internal, or trailing spaces.

See http://www.stata.com/support/faqs/data/diff.html for some techniques for identifying inconsistencies.

Nick
n.j.cox@durham.ac.uk

joe j

Thank you, Nick, for complementing Martin's advice. I do find a slight difference in outcomes from the two procedures (maybe about 0.1% in a sample of close to a million, so I can't immediately tell why this is so; perhaps due to the complications you allude to). Good to know also about -egen, mode()-.

On Thu, Oct 8, 2009 at 6:23 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> Let's underline that this can all be done with strings. There is no
> need to resort to -encode- or otherwise to convert to numeric.
>
> Missing, i.e. empty, strings sort first. Thus after -input- and
> -trim()-, Martin's code can be slimmed to
>
>     bys year Prof (Uni) : replace Uni = Uni[_N] if missing(Uni)
>
> without any need for an extra variable.
>
> However, there is no check here for different non-missing values
> within groups of -year Prof-.
>
> In the same territory, note that -egen, mode()- takes string arguments
> as well as numeric, so can be used for imputation. However, the direct
> route that Martin exemplifies has many advantages.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Martin Weiss
>
> *************
> clear*
>
> inp year str10(Uni Prof)
> 1990 Harvard " S Smith"
> 1990 "" "S Smith"
> 1990 UCLA "P Williams"
> 1990 Yale " K John"
> 1991 "" "K Evert"
> 1991 Oxford "K Evert"
> 1991 "" "K Evert"
> end
>
> replace Uni=trim(Uni)
> replace Prof=trim(Prof)
> compress
>
> gen byte nonmiss=!mi(Uni)
>
> //replace with last obs
> bys year Prof (nonmiss): /*
> */ replace Uni=Uni[_N] /*
> */ if nonmiss==0
>
> l, noo sepby(year Prof)
> *************
>
> joe j
>
> Thanks. (Your suggestion helped me create a variable that takes a
> numeric value, instead of the university name; this is definitely an
> improvement.)
>
> This is what the data look like:
>
> Year  University  Professor
>
> 1990  Harvard     S Smith
> 1990  ---------   S Smith
> 1990  UCLA        P Williams
> 1990  Yale        K John
>
> 1991  ---------   K Evert
> 1991  Oxford      K Evert
>
> What I want is to replace the missing names above, in 1990 with
> Harvard and in 1991 with Oxford.
>
> On Thu, Oct 8, 2009 at 11:59 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
>> You should turn the string into a numeric variable via -encode-. Then
>> -egen- can go to work. Also provide an excerpt of your data and show
>> what you want to happen to them...
>
> joe j
>
>> In my data I have a string variable "University", which lists
>> university names. In some years the names are missing. Two other
>> variables I have are "Professor" and "Year". The same "Professor" and
>> "University" can occur multiple times in a year.
>>
>> The problem I have is that quite a few University names are missing.
>> What I want to do is to replace as many missing University names as
>> possible, by assuming that when a professor is linked to a university
>> at least once in a year, she is linked to the same university
>> throughout that year, so the missing university name when her name
>> occurs again in the same year can be replaced (why there are missing
>> university names is a complicated story :)).
>>
>> I tried the following in Stata (it's foolish, I know):
>>
>>     bysort year professor: egen University_all=mean(University)
>>
>> But I get the warning "type mismatch".

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
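
Nick's caveat in the thread, that the one-line fill does not check for different non-missing values within groups of -year Prof-, could be addressed along the following lines. This is only a sketch under stated assumptions: it uses the variable names from Martin's example (year, Prof, Uni), assumes -trim()- has already been applied, and introduces two helper variables, multi and anyconflict, that are hypothetical names not used anywhere in the thread.

```stata
* Sketch only: multi and anyconflict are hypothetical helper variables.
* Within each year-Prof group, sort by Uni (empty strings sort first) and
* mark non-missing values that differ from the last value in the group.
bysort year Prof (Uni): gen byte multi = !missing(Uni) & Uni != Uni[_N]

* Spread the flag to every observation in the group.
bysort year Prof: egen byte anyconflict = max(multi)

* Inspect the conflicting groups before deciding how to resolve them.
list year Prof Uni if anyconflict, sepby(year Prof)

* Fill in missing names only where the group is unambiguous.
bysort year Prof (Uni): replace Uni = Uni[_N] if missing(Uni) & !anyconflict

* Remove the helper variables.
drop multi anyconflict
```

Groups flagged by anyconflict are left untouched, so any spelling or case inconsistencies they reveal can be resolved by hand before the fill is rerun.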


