Stata | FAQ: Going through groups in order of first occurrence


How do I go through the groups of a variable in order of their first occurrence in the dataset?

Title		Going through groups in order of first occurrence
Author		Nicholas J. Cox, Durham University, UK

Suppose that you wish to do something for each of several groups of your data, but in the order of their first occurrence in your dataset. That stipulation limits the use of levelsof or egen, group(), which work through groups in ascending order of their values and ignore the current sort order of the data. For concreteness, imagine panel data with an identifier variable id. We want analyses to respect the order of first occurrence of id.

For example:

     +----+
     | id |    
     |----|
  1. |  5 |
  2. |  1 |
  3. |  4 |
  4. |  1 |
  5. |  1 |
     |----|
  6. |  2 |
  7. |  2 |
  8. |  2 |
  9. |  4 |
 10. |  4 |
     |----|
 11. |  4 |
 12. |  1 |
 13. |  5 |
 14. |  4 |
 15. |  1 |
     +----+

We have variable id in this initial order. We want to go through all the values of id in the order 5, 1, 4, 2.
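If you want to follow along, one way to create the example data is sketched below. Typing the values with input would work just as well; the word() trick is simply compact. The sketch also shows the problem: levelsof reports the distinct values in ascending order, 1 2 4 5, rather than 5 1 4 2.

```stata
clear
set obs 15
* the 15 values of id listed above, in their original order
generate byte id = real(word("5 1 4 1 1 2 2 2 4 4 4 1 5 4 1", _n))
* levelsof ignores the current sort order and lists distinct values in ascending order
levelsof id, local(levels)
display "`levels'"
```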

Order of occurrence in the data is encapsulated in the set of observation numbers, so we put those in a variable:

        . generate long obs = _n

Now we sort by id, breaking ties by obs. The first observation in each block, defined by a value of id, then carries information on first occurrence. We copy the observation number of first occurrence to each other occurrence of the same id.

        . by id (obs), sort: replace obs = obs[1]
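On the example data, the result of this step can be checked with a short sketch (the data-entry line is just a convenient way to recreate the example). Listing the first observation of each block of id should show obs holding the position of first occurrence: 1 for id 5, 2 for id 1, 3 for id 4, and 6 for id 2.

```stata
clear
set obs 15
generate byte id = real(word("5 1 4 1 1 2 2 2 4 4 4 1 5 4 1", _n))
generate long obs = _n
* sort by id, breaking ties by original position; copy the first position to the whole block
by id (obs), sort: replace obs = obs[1]
* show one line per block of id
list id obs if id != id[_n-1], noobs
```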

Now we number the groups 1, 2, 3, and so on, according to first occurrence:

        . by obs, sort: gen byte group = _n == 1
        . replace group = sum(group)
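Because obs now holds positions of first occurrence, and those positions sort in exactly the order we want, the same numbering should also come from applying egen, group() to obs rather than to id. The sketch below checks that the two routes agree on the example data; altgroup is just an illustrative name.

```stata
clear
set obs 15
generate byte id = real(word("5 1 4 1 1 2 2 2 4 4 4 1 5 4 1", _n))
generate long obs = _n
by id (obs), sort: replace obs = obs[1]
by obs, sort: generate byte group = _n == 1
replace group = sum(group)
* egen, group() numbers the distinct values of obs 1, 2, ... in ascending order,
* which is the order of first occurrence of id
egen altgroup = group(obs)
assert altgroup == group
tabulate id group
```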

Those familiar with egen, group() may recognize the basic idea here. The number of groups is then the maximum of group:

        . summarize group, meanonly 
        . local max = r(max)

Typically, you then loop over the groups:

        . forvalues i = 1/`max' {
        .         * do something for group `i' here
        . }
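As a concrete stand-in for "something for each group", the sketch below counts the observations in each group. On the example data the groups come out in order of first occurrence, with 2, 5, 5, and 3 observations for ids 5, 1, 4, and 2. The choice of count as the task, and the local counts, are illustrative only.

```stata
clear
set obs 15
generate byte id = real(word("5 1 4 1 1 2 2 2 4 4 4 1 5 4 1", _n))
generate long obs = _n
by id (obs), sort: replace obs = obs[1]
by obs, sort: generate byte group = _n == 1
replace group = sum(group)
summarize group, meanonly
local max = r(max)
* illustrative task: count the observations in each group, in order of first occurrence
local counts
forvalues i = 1/`max' {
    quietly count if group == `i'
    display "group `i': " r(N) " observations"
    local counts `counts' `r(N)'
}
```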

There is one common need we should mention. As we cycle over the groups within the loop, we often wish to display the identifier of the current group. Recall that group maps each distinct value of id, taken in order of first occurrence in the data, to one of the integers 1, 2, 3, and so on. For a numeric identifier, one way to look up the current identifier within the loop is

        . summarize id if group == `i', meanonly

All values of id in each group are the same, so it matters little whether we pick up the minimum, the mean, or the maximum. Typing

        . local which = r(min)
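Putting the look-up inside the loop, the sketch below should recover the identifiers of the example data in order of first occurrence, 5 1 4 2. The local order is an illustrative name, used only to collect the results.

```stata
clear
set obs 15
generate byte id = real(word("5 1 4 1 1 2 2 2 4 4 4 1 5 4 1", _n))
generate long obs = _n
by id (obs), sort: replace obs = obs[1]
by obs, sort: generate byte group = _n == 1
replace group = sum(group)
summarize group, meanonly
local max = r(max)
local order
forvalues i = 1/`max' {
    * every value of id in group `i' is the same, so r(min) is the identifier
    summarize id if group == `i', meanonly
    local which = r(min)
    display "group `i' is id `which'"
    local order `order' `which'
}
```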

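will do, for example. However, for a string identifier, we need to work a little harder. Outside the loop, before it starts, we reset obs to the current observation numbers (nothing is lost, because group already carries the information on first occurrence):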

        . replace obs = _n

Inside the loop, we type

        . summarize obs if group == `i', meanonly 
        . local which = id[`r(min)']

That is, because id is a string variable, summarize cannot give us its minimum or maximum. Instead, we feed the observation numbers to summarize so that we can work out where to look for the identifier's string value. This relies on the data not being re-sorted inside the loop. (Here and in the earlier uses of summarize, the meanonly option makes the calculations as fast as possible.)
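The same example with a string identifier might look like the sketch below. The three-letter codes are invented for illustration, standing in for the numeric ids 5, 1, 4, and 2; note that the order of first occurrence, FRA DEU ITA AUT, is not alphabetical order. The local order is again just an illustrative collector.

```stata
clear
set obs 15
* hypothetical string codes: FRA = 5, DEU = 1, ITA = 4, AUT = 2 in the numeric example
generate str3 id = word("FRA DEU ITA DEU DEU AUT AUT AUT ITA ITA ITA DEU FRA ITA DEU", _n)
generate long obs = _n
by id (obs), sort: replace obs = obs[1]
by obs, sort: generate byte group = _n == 1
replace group = sum(group)
summarize group, meanonly
local max = r(max)
* outside the loop: obs now holds the current observation numbers
replace obs = _n
local order
forvalues i = 1/`max' {
    * find where group `i' starts, then read the string identifier from that observation
    summarize obs if group == `i', meanonly
    local which = id[`r(min)']
    display "group `i' is id `which'"
    local order `order' `which'
}
```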
