
# Re: st: Clustering by school year

From: David Kantor
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Clustering by school year
Date: Sat, 23 Oct 2010 22:39:16 -0400

At 10:03 PM 10/23/2010, Jose A wrote:

> Would clustering by school year be as simple as generating a variable school_year = school identifier * year, and then using this new variable as the cluster?

Just from a practical standpoint, this could work, provided that school_identifier is numeric (preferably an integer). But you also need to ensure that the mapping from pairs of school_identifier and year to the resulting number is one-to-one. That is, no two distinct pairs of school_identifier and year should map to the same value. Say that you have school_identifiers 200 and 201, and your years are 2000 and 2010. You would have
```
2000 * 201 = 402000
2010 * 200 = 402000
```
thus, a many-to-one mapping.

You need to inspect your set of years and school_identifiers to see if something like this would happen.
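
One way to carry out this inspection in Stata is sketched below. It assumes the variables are named school_identifier and year, as in the question; the names school_year, pair_tag, and n_pairs are made up for illustration, and this is only one of several ways to do the check.

```stata
* Sketch: does the product send two distinct (school, year) pairs to one value?
* Assumes numeric variables school_identifier and year are in memory.

* double stores integers exactly up to 2^53, so the product is not rounded
gen double school_year = school_identifier * year

* Mark exactly one observation per distinct (school_identifier, year) pair
egen byte pair_tag = tag(school_identifier year)

* For each product value, count how many distinct pairs produce it
bysort school_year: egen n_pairs = total(pair_tag)

* Any pair listed here shares its product value with another pair
list school_identifier year school_year if n_pairs > 1 & pair_tag
```

If the final -list- shows nothing, the product is one-to-one for the data at hand and can serve as the cluster variable.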
If this situation arises, then you need some other scheme. You could extract the unique pairs of school_identifier and year that occur in the data. (Or, if you need to be more general, obtain the sets of years and school_identifiers separately and form their cross-product; see -help cross-.) With this set, -gen long clusterid = _n-; then save it, and later merge your analysis file with this file.
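
The lookup-file scheme just described might look roughly like this. The file names analysis.dta and cluster_lookup.dta are hypothetical, and -merge m:1- is the syntax of Stata 11 and later; treat it as a sketch rather than the only way to do it.

```stata
* Sketch of the lookup-file approach; both file names are hypothetical.
use analysis.dta, clear

* Reduce the data to one row per (school_identifier, year) pair that occurs
keep school_identifier year
duplicates drop

* Number the pairs consecutively and save the lookup file
sort school_identifier year
gen long clusterid = _n
save cluster_lookup.dta, replace

* Attach clusterid to every observation of the analysis file
use analysis.dta, clear
merge m:1 school_identifier year using cluster_lookup.dta
assert _merge == 3    // every observation should find its pair
drop _merge
```

If no separate lookup file is needed, -egen long clusterid = group(school_identifier year)- numbers the pairs in one step. This is an alternative not mentioned in the post; note that by default it leaves clusterid missing where either variable is missing.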
HTH
--David

----

P.S., there is another numeric-based solution: either,
```
k1 * school_identifier + year
```
or
```
k2 * year + school_identifier
```
where k1 or k2 are strategically chosen constants:
```
k1 > max(year)
k2 > max(school_identifier)
```
One possibility is
```
10000 * school_identifier + year
```
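
The numeric scheme from the P.S. can be sketched in Stata as follows, taking the suggested constant 10000. It assumes school_identifier and year are non-negative integers; the variable names school_year and pair_id are illustrative, and the final check uses -egen group()-, which is not part of the original suggestion.

```stata
* Sketch of the P.S. scheme: 10000 * school_identifier + year.
* Assumes school_identifier and year are non-negative integers.

* The constant must exceed every year value
summarize year, meanonly
assert r(min) >= 0 & r(max) < 10000

* Use double: the default float type is exact only up to 16,777,216
gen double school_year = 10000 * school_identifier + year

* Check the mapping: all observations sharing a school_year value
* must belong to the same (school_identifier, year) pair
egen long pair_id = group(school_identifier year)
bysort school_year (pair_id): assert pair_id[1] == pair_id[_N]
```

With the example values from above, the pair (201, 2000) becomes 2012000 and the pair (200, 2010) becomes 2002010, so the two pairs no longer collide.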