Re: st: string variable

From   "Austin Nichols" <>
Subject   Re: st: string variable
Date   Tue, 13 Nov 2007 09:49:01 -0500

There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply:
  egen g=group(id)
and keep numeric and string identifiers is not clear, perhaps, but
suppose we want:
  list g
to produce correct-looking identifiers, for whatever reason.  Then the
idea of my posted approach is correct, though the details are
not--there is a missing -if- condition and -labmask- will not work
here.  But a solution from first principles is easy,  I think:

clear
loc N 500
set obs `N'
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
*-encode- won't work if N too great
*encode id, gen(numid)
*(nor will -labmask- apparently)
gen numid=real(id)
gen strid=id if mi(numid)
egen g=group(strid)
su numid, meanonly
replace numid=r(max)+g if mi(numid)
levelsof strid, loc(vals)
foreach v of loc vals {
 su numid if strid=="`v'", meanonly
 la def numid `r(max)' "`v'", modify
 }
la val numid numid
codebook numid
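
A quick check of the result, as a sketch only: assuming the code above ran
to completion, the lines below confirm that numid uniquely identifies
observations and that the labels reproduce the original string ids. The
variable name check is hypothetical, introduced just for this test.

*numid should uniquely identify observations
isid numid
*decode yields the label text, or "" where a value has no label
decode numid, gen(check)
*non-numeric ids should come back from the labels unchanged
assert check==id if !mi(strid)
*numeric ids should equal their own numeric value
assert numid==real(id) if mi(strid)
drop check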

On 11/13/07, Nick Cox <> wrote:
> Austin is right that -egen, group()- will assign integers
> 1 up. But if -encode- won't play at assigning labels because
> there are too many distinct values, then I don't think -labmask-
> (or even -egen, group()- with the -label- option) will help
> either.
> I am still puzzled at the original question. On the face of
> it the variable in question is some kind of identifier. It
> is difficult to see any sense in which it is better off as
> a numeric variable. If there are thousands of distinct values
> it would be no use for any kind of modelling, so far as I can imagine.
> Nick