


Re: st: Clustering by school year

From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: Clustering by school year
Date   Sat, 23 Oct 2010 22:39:16 -0400

At 10:03 PM 10/23/2010, Jose A wrote:
Would clustering by school year be as simple as generating a variable school_year = school identifier * year, and then using this new variable as the cluster?

Just from a practical standpoint, this could work, provided that school_identifier is numeric (preferably an integer). But you also need to ensure that the values you get constitute a one-to-one mapping from (school_identifier, year) pairs to the resulting number. That is, no two distinct pairs of school_identifier and year should map to the same value. Say that you have school_identifiers 200 and 201, and your years are 2000 and 2010. You would have,
2000 * 201 = 402000
2010 * 200 = 402000
-- thus, a many-to-one mapping.

You need to inspect your set of years and school_identifiers to see if something like this would happen.
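
As a rough illustration of that inspection, here is a small sketch, written in Python rather than Stata purely to check the arithmetic. The function name product_collisions and the sample pairs are hypothetical; the pairs are the ones from the example above. It groups the distinct (school_identifier, year) pairs by their product and reports any product shared by more than one pair.

```python
from collections import defaultdict

def product_collisions(pairs):
    """Map each product school*year to the distinct pairs producing it,
    keeping only products that are shared by more than one pair."""
    by_product = defaultdict(set)
    for school, year in pairs:
        by_product[school * year].add((school, year))
    return {p: sorted(s) for p, s in by_product.items() if len(s) > 1}

# Hypothetical data: the two schools and two years from the example above.
pairs = [(200, 2000), (200, 2010), (201, 2000), (201, 2010)]
collisions = product_collisions(pairs)
print(collisions)
```

An empty result would mean the product happens to be safe for that particular set of pairs; any entry means it is not.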

If this situation arises, then you need some other scheme. You could extract the unique pairs of school_identifier and year that occur in the data. (Or, if you need to be more general, obtain the sets of years and school_identifiers separately and form the cross-product; see -help cross-.) In the resulting dataset, -gen long clusterid = _n-; save it, and later merge your analysis file to this file on school_identifier and year.
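
The unique-pairs scheme can be sketched as follows, again in Python only to show the logic. The names assign_cluster_ids and observations are hypothetical. Numbering the sorted distinct pairs from 1 plays the role of clusterid = _n, and the dictionary lookup stands in for the later merge.

```python
def assign_cluster_ids(pairs):
    """Number the distinct (school, year) pairs 1, 2, 3, ... in sorted order."""
    unique_pairs = sorted(set(pairs))
    return {pair: i for i, pair in enumerate(unique_pairs, start=1)}

# Hypothetical observations; the first pair occurs twice.
observations = [(200, 2010), (201, 2000), (200, 2010), (201, 2010)]
cluster_id = assign_cluster_ids(observations)

# Attach the id to each observation (the analogue of the merge step).
ids = [cluster_id[obs] for obs in observations]
print(ids)
```

Because each id is assigned to exactly one distinct pair, this mapping is one-to-one by construction, whatever the values of the identifiers and years.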


P.S. There is another numeric-based solution: either
 k1 * school_identifier + year
or
 k2 * year + school_identifier

where k1 or k2 is a strategically chosen constant (assuming non-negative values):
 k1 > max(year)
 k2 > max(school_identifier)

One possibility is
 10000 * school_identifier + year
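
To see why such a constant gives a one-to-one mapping, here is a minimal sketch in Python, assuming non-negative integer years below k1. The function names encode and decode are hypothetical. Since 0 <= year < k1, integer division and remainder by k1 recover the original pair, so two different pairs cannot share a code.

```python
def encode(school, year, k1=10000):
    """Combine school and year into one integer; requires 0 <= year < k1."""
    if not (0 <= year < k1):
        raise ValueError("year must satisfy 0 <= year < k1")
    return k1 * school + year

def decode(code, k1=10000):
    """Recover (school, year) from a code produced by encode."""
    return divmod(code, k1)

# The two pairs that collided under plain multiplication now get distinct codes.
a = encode(200, 2010)
b = encode(201, 2000)
print(a, b, decode(a), decode(b))
```

In Stata the result would need a storage type wide enough to hold these integers exactly, in line with the -gen long- used above.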

