Statalist The Stata Listserver



Re: RE: st: Re: Creating a unique identifier from a string and byte variable


From   "Gauri Khanna" <[email protected]>
To   [email protected]
Subject   Re: RE: st: Re: Creating a unique identifier from a string and byte variable
Date   Tue, 03 Apr 2007 10:48:48 +0000

Dear Mr. Cox,

Thank you for your response.

I just looked at the manual again and I had mistakenly placed the commas as you correctly pointed out. So the -concat- option works.

The -group- option, however, does not run to completion because my computer stalls (it has been giving me problems). For the moment, I will stick with -concat-.

Regards,

Gauri




From: n j cox <[email protected]>
Reply-To: [email protected]
To: [email protected]
Subject: Re: RE: st: Re: Creating a unique identifier from a string and byte variable
Date: Tue, 03 Apr 2007 10:54:37 +0100

Various problems here:

1. Sergiy advised using -string(caseid)-. I think he meant
-real(caseid)-.
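
A minimal sketch of point 1, on made-up data in which caseid holds digits only. That is an assumption: the caseid values listed further down contain embedded spaces, for which -real()- would be expected to return missing, so this illustrates the function mix-up rather than a fix for Gauri's data. Storing the result as double is an added precaution, since a float holds integers exactly only up to 16,777,216.

```stata
* toy data (hypothetical values, not Gauri's)
clear
input str15 caseid byte bidx
"123" 1
"123" 2
"124" 1
end

* string() expects a numeric argument, so string(caseid) fails with r(109);
* real() converts a string that holds a number into a numeric value
generate double my_id = real(caseid)*1000 + bidx

* check that the new variable identifies observations uniquely
isid my_id
list caseid bidx my_id
```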

2. Gauri had syntax errors with -egen, concat()-:

>egen idchild = concat(caseid, bidx)
>invalid syntax
>r(198);

>. egen idchild = concat(childcase, bidx)
>invalid syntax
>r(198);

-egen, concat()- is designed to be able to handle both string
and numeric variables. That is half the point. The help makes
this explicit:

------------------------------------ extract from -egen- help
concat(varlist) [, format(%fmt) decode maxlength(#) punct(pchars)]
may not be combined with by. It concatenates varlist to produce a
string variable. Values of string variables are unchanged. Values of
numeric variables are converted to string as is, or converted using a
format under option format(%fmt), or decoded under option decode, in
which case maxlength() may also be used to control the maximum label
length used. By default, variables are added end-to-end:
punct(pchars) may be used to specify punctuation, such as a space,
punct(" "), or a comma, punct(,).
------------------------------------

So, mixing string and numeric variables is never an issue for -egen,
concat()-. The real problem is different. The argument of -concat()-
is a varlist. Gauri has commas inside a varlist. That is illegal.
The logic is simple: Often you want to type

<command> <varlist> , <options>

Stata has to be clear where the varlist finishes and the options begin.
Hence the rule: no commas in varlists. See help on -varlist-.

Now, you might say, "Oh well, -egen- should know that I couldn't
possibly want an option in that place." Well, programmers could
have indulged you here, at some cost in complexity of code, but
I didn't when I first wrote -concat()-, and StataCorp followed in
this regard when they adopted it. And once you start letting people
get away with commas in varlists in some places, they just get
confused.
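
A minimal sketch of the corrected call, on a few made-up observations shaped like the listing in the original question. The variable names come from the thread; the "#" separator is only one possible choice for -punct()-.

```stata
* toy data (hypothetical rows in the style of the original listing)
clear
input str15 caseid byte bidx
"2116 56 4" 1
"2116 56 4" 2
"2117 62 3" 1
end

* names inside a varlist are separated by spaces;
* the comma appears only once, before the options
egen idchild = concat(caseid bidx), punct("#")

list caseid bidx idchild
```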

3. Another problem is that Gauri is overlooking advice.
-search identifier, faq- points clearly to an FAQ and that
in turn recommends -egen, group()- as a solution to this need.

FAQ . . . . . . . . . . . . . . . . . . . . . . Creating group identifiers
3/01 How do I create individual identifiers numbered
from 1 upwards?
http://www.stata.com/support/faqs/data/group.html
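
A minimal sketch of the -egen, group()- route, again on made-up rows rather than the real data. -isid- is used here only as a check that the result identifies observations uniquely.

```stata
* toy data (hypothetical rows in the style of the original listing)
clear
input str15 caseid byte bidx
"2116 56 4" 1
"2116 56 4" 2
"2117 62 3" 1
end

* group() numbers each distinct combination of caseid and bidx from 1 upwards
egen idchild = group(caseid bidx)

* exits with an error if idchild does not uniquely identify the observations
isid idchild
list caseid bidx idchild
```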

Nick
[email protected]

Gauri Khanna
------------------------------------------------------------------------
Thanks for the prompt reply. I tried both options, and the first one worked.
I checked for -duplicates report my_id- and there are none, so indeed a
combination of "caseid" and "bidx" works. So my problem is solved but I
still wanted to show what went wrong with the second command.

The second option

>In your example bidx is somewhere between 0 and 999, so one can:
>gen my_id=string(caseid)*1000+bidx
>which will create a number identifier
>You must check that caseid*1000 still can be stored completely (without
>loss) in Stata.

gave the following error:

gen my_id=string(caseid)*1000+bidx
type mismatch
r(109);

I don't think it is possible to multiply a string variable by a number.
(Variable bidx only has two values, 1 and 2.)
------------------------------------------------------------------------

Sergiy Radyakin
-------------------------------------------------------------------------
>you can always create a string identifier from variables of different
>types.
>In your example:
>gen my_id=caseid+"#"+string(bidx)
>Notice that symbol "#" separates the two sources, which resolves the
>frequent problem:
>caseid=123 bidx=4 => my_id=1234
>caseid=12 bidx=34 => my_id=1234
>Use any symbol instead of "#" which is not in your identifiers.
>
>Another technique can be used when one of the ids is of low dimension.
>In your example bidx is somewhere between 0 and 999, so one can:
>gen my_id=string(caseid)*1000+bidx
>which will create a number identifier
>You must check that caseid*1000 still can be stored completely (without
>loss) in Stata.
>
>Notice that the identifiers will be "unique" (as you requested) only if a
>combination of caseid and bidx is unique.
>
>Hint: use -compress- to reduce the types to simpler ones, e.g. long --> byte
>(if possible).
--------------------------------------------------------------------------
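
The separator point above can be sketched on the two pairs from the example. The data are the illustrative values quoted there, with caseid held as a string; the variable names plain and sep are made up for this sketch.

```stata
* the two pairs from the example: (123, 4) and (12, 34)
clear
input str15 caseid byte bidx
"123" 4
"12"  34
end

* without a separator, two different pairs collapse to the same string
generate plain = caseid + string(bidx)

* with a separator that occurs in neither variable, they stay distinct
generate sep = caseid + "#" + string(bidx)

duplicates report plain
duplicates report sep
```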

Gauri Khanna
--------------------------------------------------------------------------
>>I am using cross sectional data with around 31,000 observations. I would
>>like to create a unique identifier called "idchild" composed of two
>>variables: caseid(string variable) and bidx(byte). I have described the
>>variables below and listed them as well (observations, 29 & 30, 37 & 38,
>>960 & 961 have the same caseid's but different bidx's).
>>
>>des caseid
>>
>>              storage  display     value
>>variable name   type   format      label      variable label
>>-------------------------------------------------------------------------------
>>caseid          str15  %15s                   case identification
>>
>>. des bidx
>>
>>              storage  display     value
>>variable name   type   format      label      variable label
>>-------------------------------------------------------------------------------
>>bidx            byte   %8.0g                  birth column number
>>
>>. list caseid bidx
>>
>>      +-------------------+
>>      |     caseid   bidx |
>>      |-------------------|
>>   1. |   2 1 66 4      1 |
>>   2. |   2 1 66 7      1 |
>>   3. |   2 1 93 7      1 |
>>   4. |  2 1 111 4      1 |
>>   5. |   2 1 147 2      1 |

< snip NJC>

>> 959. |  2116 52 4      1 |
>> 960. |  2116 56 4      1 |
>>      |-------------------|
>> 961. |  2116 56 4      2 |
>> 962. |  2116 60 3      1 |
>> 963. |  2116 84 4      1 |
>> 964. | 2116 112 4      1 |
>> 965. |  2117 10 2      1 |
>>      |-------------------|
>> 966. |  2117 26 5      1 |
>> 967. |  2117 50 5      1 |
>> 968. |  2117 54 8      1 |
>> 969. |  2117 58 4      1 |
>> 970. |  2117 62 3      1 |
>>      |-------------------|
>> 971. |  2117 62 3      2 |
>> 972. |  2117 86 5      1 |
>> 973. |  2117 86 6      1 |
>> 974. | 2117 130 2      1 |
>> 975. | 2117 134 2      1 |
>>--Break--
>>
>>
>>I tried the following :
>>
>>egen idchild = concat(caseid, bidx)
>>invalid syntax
>>r(198);
>>
>>I realise that I am trying to concatenate two different *types* of
>>variables and so I then tried the following:
>>
>>decode bidx, gen(childbidx)
>>bidx not labeled
>>r(182);
>>
>>Then I tried changing the caseid variable:
>>
>>encode caseid, gen(childcase)
>>
>>. des childcase
>>
>>              storage  display     value
>>variable name   type   format      label      variable label
>>-------------------------------------------------------------------------------
>>childcase       long   %15.0g      childcase  case identification
>>
>>. egen idchild = concat(childcase, bidx)
>>invalid syntax
>>r(198);
>>
>>How can I create a unique idchild? I am using Stata 9.2.
------------------------------------------------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/