Statalist



RE: st: string variable


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: string variable
Date   Tue, 13 Nov 2007 15:10:09 -0000

I agree that -egen, group()- will get you numeric identifiers 
even if you have to give up on the labels. Thanks for your information
on -xtreg-, which raises a question for StataCorp: why this insistence? 

The issue with -encode- is a limit on the number of labels allowed. 
That limit bites whatever side you try to scale the mountain from. 
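
A minimal sketch of the -egen, group()- route mentioned above, on made-up
data. The variables id, x and y and the "firm" prefix are assumptions for
illustration only, not anything from the original question:

```stata
clear
set seed 12345
set obs 200
* hypothetical string identifier: 50 panels of 4 observations each
gen id = "firm" + string(ceil(_n/4))
gen x = rnormal()
gen y = 2*x + rnormal()
* integer codes 1 up, in the sort order of id; the string id is kept
egen long g = group(id)
* the numeric g identifies the panels; id stays available for listing
xtset g
xtreg y x, fe
```

No value labels are created here, so the limit on labels never comes into
play; the price is that -list g- shows bare integers rather than the
original strings.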

Nick 
[email protected] 

Austin Nichols

Nick--
There are several applications, e.g. -xtreg, i(id)-, where a numeric
id is required (for no apparent reason, but required nonetheless).
Why we cannot simply:
  egen g=group(id)
and keep numeric and string identifiers is not clear, perhaps, but
suppose we want:
  list g
to produce correct-looking identifiers, for whatever reason.  Then the
idea of my posted approach is correct, though the details are
not--there is a missing -if- condition and -labmask- will not work
here.  But a solution from first principles is easy,  I think:

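clear
* build 500 test ids: "1".."500", with a letter appended to ids 65-90
loc N 500
set obs `N'
g id=string(_n)
replace id=id+char(_n) in 65/90
codebook id
*-encode- won't work if N too great
*encode id, gen(numid)
*(nor will -labmask- apparently)
* purely numeric ids convert directly; the rest become missing
gen numid=real(id)
gen strid=id if mi(numid)
* number the non-numeric ids 1 up, then shift them above the largest numeric id
egen g=group(strid)
su numid, meanonly
replace numid=r(max)+g if mi(numid)
* label only the shifted values with their original strings
levelsof strid, loc(vals)
foreach v of loc vals {
 su numid if strid=="`v'", meanonly
 la def numid `r(max)' "`v'", modify
 }
la val numid numid
codebook numid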
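
One way to check the result, as a sketch only: it assumes the code above
has just been run in the same session, so that id and numid exist. The
variable name check is made up for this example.

```stata
* continues from the code above: id and numid must already exist
* labelled values decode to their original strings; unlabelled ones to ""
decode numid, gen(check)
* unlabelled values are the purely numeric ids, so print the number itself
replace check = string(numid) if check == ""
* every observation should now reproduce its original string id
assert check == id
```

If the -assert- passes silently, each value of numid displays as the
original identifier, whether through a value label or as the number itself.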


On 11/13/07, Nick Cox <[email protected]> wrote:
> Austin is right that -egen, group()- will assign integers
> 1 up. But if -encode- won't play at assigning labels because
> there are too many distinct values, then I don't think -labmask-
> (or even -egen, group()- with the -label- option) will help
> either.
>
> I am still puzzled at the original question. On the face of
> it the variable in question is some kind of identifier. It
> is difficult to see any sense in which it is better off as
> a numeric variable. If there are thousands of distinct values
> it would be no use for any kind of modelling, so far as I can imagine.
>
> Nick
> [email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



