From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: levelsof for many categories without sorting
Date: Wed, 7 Sep 2005 15:44:02 +0100

See also Roger Newson's -sencode- on SSC, which is designed
for an overlapping problem.

Nick
n.j.cox@durham.ac.uk

Nick Cox

> Note for anyone interested:
>
> -levelsof- as implemented in Stata 9 differs
> subtly from -levels- as added to Stata 8
> during its lifetime.
>
> That aside, I am very surprised at Iwan's
> report that -levelsof- reports categories
> according to their order of occurrence in the data.
> That contradicts not just the help file, but
> also the code as I read it (and for that matter
> as I wrote it, originally). StataCorp would like
> to see evidence, I am sure. I suspect Iwan's
> impression is mistaken, but I am not sure why
> it arises.
>
> The general problem to which -levelsof- is
> one solution is discussed in
>
> http://www.stata.com/support/faqs/data/foreach.html
>
> A fairly general strategy for going through all
> possible levels
>
> * according to their order of first occurrence
> * in the data
>
> is as follows.
> (This circumvents problems arising when -levelsof-
> cannot cope.)
>
> Suppose we have an identifier, say -id-.
>
> First generate an observation number:
>
>     gen long obs = _n
>
> Now we sort by -id-, breaking ties by
> -obs-. The first observation in each block
> then carries information on first occurrence.
> We copy the observation number of first
> occurrence to each other occurrence of the same id.
>
>     bysort id (obs) : replace obs = obs[1]
>
> Now we tag ids from 1 to whatever, according
> to first occurrence:
>
>     bysort obs : gen group = _n == 1
>     replace group = sum(group)
>
> Those familiar with -egen, group()- may
> recognise the basic idea here.
>
> Now the number of groups is identifiable from
>
>     su group, meanonly
>     local max = r(max)
>
> Typically then you loop over groups:
>
>     forval i = 1/`max' {
>         ...
>     }
>
> Within that loop, a look-up technique to
> get the identifier concerned is, for
> a numeric identifier:
>
>     su id if group == `i', meanonly
>
> All identifiers in each group are the same,
> so it matters little whether we pick up
> the minimum, the mean or the maximum:
>
>     local which = r(min)
>
> will do, for example.
>
> If the identifier -id- is a string variable, a little
> more work is needed. Outside the loop,
>
>     replace obs = _n
>
> Inside the loop,
>
>     su obs if group == `i', meanonly
>     local which = id[`r(min)']
>
> Nick
> n.j.cox@durham.ac.uk
>
> Barankay, Iwan
>
> > I find the command "levelsof" very useful to cut down the
> > time on loops when I run through the categories of a variable
> > (e.g. the location_ids of a large survey).
> >
> > What I also like is that the local macro generated by
> > levelsof is - so it seems to me - still in the order in which
> > the categories appear in the data and is not sorted, which is
> > needed at times (even though the hlp file of levelsof says
> > otherwise). Usually, when a list is entered into a local, it
> > is then sorted.
> >
> > The problem of course is that there are constraints on
> > levelsof when it hits the character limit.
> >
> > My question is:
> >
> > What can I use instead of levelsof for (i) a large number of
> > categories, to avoid the character constraint, which (ii)
> > also keeps the categories in the order in which they appear
> > in the data and does not sort them?
> >
> > (i) is much more important than (ii), but if someone has an
> > elegant solution for (ii) I would love to hear of it.
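For reference, here is a minimal, self-contained sketch that puts the steps quoted above together in one do-file and runs both look-ups (numeric and string) inside the loop. The toy data, the string variable -place-, and the local -order- are hypothetical, made up only for illustration; the loop body just displays each identifier where real work would go. One small departure from the quoted code: -group- is stored as long rather than the default float.

```stata
* Minimal sketch: loop over the levels of an identifier in order of
* first occurrence, without -levelsof-. Toy data are hypothetical.
clear
input id
30
10
30
20
10
end
* hypothetical string identifier mirroring -id-
gen place = "loc" + string(id)

* observation number in the original order
gen long obs = _n

* copy the observation number of first occurrence to every copy of each id
bysort id (obs) : replace obs = obs[1]

* number the ids 1, 2, ... by order of first occurrence
* (long rather than the default float, so large counts stay exact)
bysort obs : gen long group = _n == 1
replace group = sum(group)

su group, meanonly
local max = r(max)

* outside the loop: reset -obs- to the current observation number
* so that it can be used as a subscript for the string identifier
replace obs = _n

local order
forval i = 1/`max' {
    * numeric identifier: every value in the group is the same
    su id if group == `i', meanonly
    local which = r(min)

    * string identifier: subscript with the first row of the group
    su obs if group == `i', meanonly
    local name = place[`r(min)']

    display "group `i': id = `which', place = `name'"
    local order `order' `which'
}
```

Run as a do-file, this should visit the identifiers in the order 30, 10, 20, the order in which they first appear in the data, rather than the sorted order 10, 20, 30 that -levelsof- returns.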
