Statalist



Re: st: Data management question


From   Michael Hanson <[email protected]>
To   [email protected]
Subject   Re: st: Data management question
Date   Thu, 16 Jul 2009 10:11:26 -0400

I agree with Nick, who beat me to this very recommendation. One small addition: -trim()- removes spaces only at the beginning and end of a string, not within it, so Nick's suggestion below would count any interior spaces as if they too were "X"s. To make your count variable (which I call "n") robust to that possibility, you could instead use:

gen n = length(subinstr(x, " ", "", .))

where "x" is your original string variable. Ideally you could make this approach more robust still by using Stata's regular expression functions, but I have not yet found a clean way to remove _all_ characters of a given class (for example, the whitespace characters matched by [:space:] in regex notation) in one step: -regexr()- replaces only the _first_ match with a blank, not every match. Suggestions on this point would be welcome.
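One possible workaround is sketched below. It assumes Stata 9 or later. Because -regexr()- replaces only the first match, the sketch applies it repeatedly until no match remains. The example data, the class [^A-Za-z] (anything that is not an ASCII letter) and the new variable names xletters and n are illustrative assumptions, not something proposed in the thread.

```stata
* Hypothetical example data: string variable x with stray characters
clear
input str12 x
"XXXX"
"XX X"
"X-X"
end

* Copy x, then strip non-letters one match per pass,
* since regexr() replaces only the first match each time
generate xletters = x
quietly count if regexm(xletters, "[^A-Za-z]")
while r(N) > 0 {
    quietly replace xletters = regexr(xletters, "[^A-Za-z]", "")
    quietly count if regexm(xletters, "[^A-Za-z]")
}

* Count the letters that remain
generate n = length(xletters)
list x n

* In Stata 14 or later, ustrregexra() replaces every match in one call:
* generate n14 = ustrlen(ustrregexra(x, "[^A-Za-z]", ""))
```

The loop stops once -count- finds no observation that still contains a non-letter, so it runs at most as many passes as the longest run of unwanted characters.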

Also, if you want a truly robust solution, you should decide whether letters other than "X" should be counted. Again, clever use of a regular expression ought to help here, but I have not yet figured out how to get Stata to deliver on this potential.
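If only "X" should be counted, you can avoid regular expressions altogether. Compare the string's length before and after deleting every "X" with -subinstr()-; the difference is the number of "X"s. This is a minimal sketch with a hypothetical variable x and made-up values. Wrapping x in -upper()- so that lowercase "x" is counted too is an assumption about what the data need.

```stata
* Hypothetical example data
clear
input str12 x
"XXXX"
"XX X"
"xXaX"
end

* Number of X's = length before minus length after deleting every X;
* upper() makes lowercase x count as well
generate nx = length(upper(x)) - length(subinstr(upper(x), "X", "", .))
list x nx
```

Any other character (a space, a different letter) survives the deletion and therefore does not add to the count.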

Hope this helps,
Mike


On Jul 16, 2009, at 8:39 AM, Nick Cox wrote:

As a footnote, observe that if the issue is counting "X"s in values of
-kisses- such as "XX", "XXXXX", etc., then

gen nkisses = length(kisses)

will be a more direct and efficient solution so long as the values
contain no other characters. -length(trim(kisses))- will protect
against accidental leading and trailing spaces.
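The two suggestions can give different answers, and a small sketch shows where. The variable name kisses follows Nick's post, but the values are invented. -trim()- handles leading and trailing blanks, while an interior blank is still counted unless every blank is removed with -subinstr()-.

```stata
* Hypothetical example data built with replace so blanks are kept exactly
clear
set obs 3
generate str10 kisses = ""
replace kisses = "XX" in 1
replace kisses = "  XXX " in 2
replace kisses = "XX XX" in 3

* trim() drops only leading/trailing blanks, so interior blanks still count
generate n_trim = length(trim(kisses))
* subinstr() with . as its last argument removes every blank
generate n_nospace = length(subinstr(kisses, " ", "", .))
list kisses n_trim n_nospace
```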

Nick

Susan Olivia wrote:

Thanks Tirthankar,

Using Nick Cox's command is way more efficient. My earlier
attempts were very inefficient.

Tirthankar Chakravarty wrote:

There are probably many ways of doing this, but here is
one way using Nick Cox's -egenmore- package (from SSC;
Nick Winter is credited as the author of the -noccur()-
function used here):

clear all
set obs 100
* test data: observation i holds i copies of "x"
generate crosses = ""
local cross "x"
forvalues i = 1/100 {
      quietly replace crosses = "`cross'" in `i'
      local cross "`cross'x"
}
// ssc install egenmore, replace
egen noccur = noccur(crosses), string("x")
summarize noccur

On Thu, Jul 16, 2009 at 2:26 AM, Susan Olivia
<[email protected]> wrote:

I have a variable (say, number of days) that is a string
variable. Its values look like XXXX (the number of Xs
denotes the number of days).

I would like to create a numeric version of this variable
(i.e. 4 crosses = 4). Is there an easy way to do this in
Stata? I tried the -encode- and -destring- commands, but
they didn't do what I was after.




© Copyright 1996–2024 StataCorp LLC