
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: levelsof for many categories without sorting
Date: Wed, 7 Sep 2005 15:12:16 +0100

Note for anyone interested: -levelsof- as implemented in Stata 9 differs subtly from -levels- as added to Stata 8 during its lifetime.

That aside, I am very surprised at Iwan's report that -levelsof- reports categories according to their order of occurrence in the data. That contradicts not just the help file, but also the code as I read it (and for that matter as I wrote it, originally). StataCorp would like to see evidence, I am sure. I suspect Iwan's impression is mistaken, but I am not sure why it arises.

The general problem to which -levelsof- is one solution is discussed in

http://www.stata.com/support/faqs/data/foreach.html

A fairly general strategy for going through all possible levels *according to their order of first occurrence* in the data is as follows. (This circumvents problems arising when -levelsof- cannot cope.)

Suppose we have an identifier, say -id-. First generate an observation number:

    gen long obs = _n

Now we sort by -id-, breaking ties by -obs-. The first observation in each block then carries information on first occurrence. We copy the observation number of first occurrence to each other occurrence of the same id.

    bysort id (obs) : replace obs = obs[1]

Now we tag ids from 1 to whatever, according to first occurrence:

    bysort obs : gen group = _n == 1
    replace group = sum(group)

Those familiar with -egen, group()- may recognise the basic idea here.

Now the number of groups is identifiable from

    su group, meanonly
    local max = r(max)

Typically then you loop over groups:

    forval i = 1/`max' {
        ...
    }

Within that loop, a look-up technique to get the identifier concerned is, for a numeric identifier:

    su id if group == `i', meanonly

All identifiers in each group are the same, so it matters little whether we pick up the minimum, the mean or the maximum:

    local which = r(min)

will do, for example.

If the identifier -id- is a string variable, a little more work is needed.
Outside the loop,

    replace obs = _n

Inside the loop,

    su obs if group == `i', meanonly
    local which = id[`r(min)']

Nick
n.j.cox@durham.ac.uk

Barankay, Iwan
>
> I find the command "levelsof" very useful to cut down the
> time on loops when I run through the categories of a variable
> (e.g. the location_ids of a large survey).
>
> What I also like is that the local macro generated by
> levelsof is, so it seems to me, still in the order in which
> the values appear in the data and is not sorted, which is needed
> at times (even though the help file of levelsof says
> otherwise). Usually, when a list is entered into a local, it is
> then sorted.
>
> The problem of course is that there are constraints on
> levelsof when it hits the character limit.
>
> My question is:
>
> What can I use instead of levelsof for (i) a large number of
> categories, to avoid the character constraint, but which (ii)
> also keeps the categories in the order in which they appear in
> the data and does not sort them?
>
> (i) is much more important than (ii), but if someone has an
> elegant solution for (ii) I would love to hear of it.
>
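For anyone who wants to try the first-occurrence strategy from the reply end to end, here is a minimal sketch on made-up data. The five-observation dataset and the string variable -sid- are assumptions added purely for illustration; -obs-, -group-, `max' and `which' follow the names used in the reply. It is not meant as the only way to do this.

```stata
* Toy data (hypothetical): ids first occur in the order 30, 10, 20
clear
input id
30
10
30
20
10
end

* A string copy of the identifier, only to illustrate the string look-up
gen str8 sid = "loc" + string(id)

* Observation number, then spread the first-occurrence number within each id
gen long obs = _n
bysort id (obs) : replace obs = obs[1]

* Number the ids 1, 2, ... by order of first occurrence
bysort obs : gen group = _n == 1
replace group = sum(group)

su group, meanonly
local max = r(max)

* Numeric identifier: any of min, mean, max gives the id of the group
forval i = 1/`max' {
    su id if group == `i', meanonly
    local which = r(min)
    di "group `i' has id `which'"
}

* String identifier: look up by current observation number instead
replace obs = _n
forval i = 1/`max' {
    su obs if group == `i', meanonly
    local which = sid[`r(min)']
    di "group `i' has sid `which'"
}
```

Note that after these steps the data are sorted by first occurrence rather than in their original order, and the original -obs- has been overwritten. If the original order matters later, keep a second copy of _n before starting.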

