RE: st: string variable

Subject   RE: st: string variable
Date   Tue, 13 Nov 2007 18:10:40 +0100

Thanks Nick and Austin.
I will try the way that Austin suggests.

Anyway, id is an important variable when you use panel data, because you need
a numeric variable in iis(id) to set your panel.
Thanks again.

Quoting Nick Cox:

I agree that -egen, group()- will get you numeric identifiers
even if you have to give up on the labels. Thanks for your information
on -xtreg-, which raises a question for StataCorp: why this insistence?

The issue with -encode- is a limit on the number of labels allowed.
That limit bites whatever side you try to scale the mountain from.


Austin Nichols

There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply:
  egen g=group(id)
and keep numeric and string identifiers is not clear, perhaps, but
suppose we want:
  list g
to produce correct-looking identifiers, for whatever reason.  Then the
idea of my posted approach is correct, though the details are
not--there is a missing -if- condition and -labmask- will not work
here.  But a solution from first principles is easy,  I think:

loc N 500
set obs `N'
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
*-encode- won't work if N too great
*encode id, gen(numid)
*(nor will -labmask- apparently)
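* purely numeric ids convert directly; ids containing letters become missing
gen numid=real(id)
* keep the string form only where the conversion failed
gen strid=id if mi(numid)
egen g=group(strid)
su numid, meanonly
* number the remaining string ids above the largest numeric id
replace numid=r(max)+g if mi(numid)
levelsof strid, loc(vals)
* label each newly assigned code with its original string id
foreach v of loc vals {
 su numid if strid=="`v'", meanonly
 la def numid `r(max)' "`v'", modify
}
la val numid numid
codebook numid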

On 11/13/07, Nick Cox wrote:
Austin is right that -egen, group()- will assign integers
from 1 up. But if -encode- won't play at assigning labels because
there are too many distinct values, then I don't think -labmask-
(or even -egen, group()- with the -label- option) will help.

I am still puzzled at the original question. On the face of
it the variable in question is some kind of identifier. It
is difficult to see any sense in which it is better off as
a numeric variable. If there are thousands of distinct values
it would be no use for any kind of modelling, so far as I can imagine.

*   For searches and help try:

Catia Nicodemo
