



st: RE: removing characters from string-formatted variables mixed in with numeric-formatted variables


From   "Cohen, Elan" <cohened@upmc.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: removing characters from string-formatted variables mixed in with numeric-formatted variables
Date   Fri, 22 Jun 2012 15:28:20 +0000

Doug,

I believe the following one-liner should work for you:

destring *, replace ignore("'")
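A minimal sketch of why this copes with the mix of types, using a made-up toy dataset (the variables a and b are hypothetical, not from the survey): -destring- leaves variables that are already numeric alone, printing a note and moving on, and ignore("'") strips the single quotes before converting the rest.

```stata
* Toy data: a is a string with quoted numbers, b is already numeric
clear
input str4 a b
"'-9'" 1
"'1'"  2
"'2'"  3
end

* Convert every string variable, dropping single quotes along the way;
* b is skipped with an "already numeric" note rather than an error
destring *, replace ignore("'")

* Check the resulting storage types and values
describe a b
list
```

Note that ignore() removes the listed characters wherever they appear in a value, so it is worth inspecting a few variables afterwards.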

HTH,

- Elan


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Doug Hess
Sent: Friday, June 22, 2012 11:03
To: statalist@hsphsun2.harvard.edu
Cc: Doug Hess
Subject: st: removing characters from string-formatted variables mixed in with numeric-formatted variables

Hello,

I imported into Stata from text files a data set of survey responses
for a large national survey. Many of the variables have single quotes
around numeric values. For instance, a variable may include the values
'-9', '1', '2' instead of simply -9, 1, 2.  However, not every
variable includes these characters for numeric values. (Not sure why!)
Thus, Stata formats some variables as string and some as numeric
during the import (using the import "text data from a spreadsheet"
menu). However, the order of the variables is not strings first,
numeric second. It's all hodgepodge.

I want to remove all the stray single quote marks. So, after poking
around on Statalist I tried using the -replace- command, the
-subinstr- function, and a loop:

local abc = "control bedrms region smsa metro3 lmed lmeda lmedb fmr"
/* Note I truncated this list, there are dozens of variables in the
dataset I wish to clean up. */
    foreach varname of local abc {
	replace `varname'=subinstr(`varname',"'","",.)
	destring `varname', replace
	}

However, this loop stops when it runs into a variable formatted as
numeric. Given that there are dozens of these variables, I don't want
to use the -order- command one by one to put the string variables
first (or last). Is there a way to use the format of the variables
with -if- to limit the -order- command or -replace- command? Or other
ideas?
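One way to condition on the variable type inside the loop, sketched here with the same truncated variable list as above, is to test each variable with -confirm string variable- and only run -replace- and -destring- when the test passes. This is an illustrative sketch, not a tested solution for this dataset.

```stata
local abc "control bedrms region smsa metro3 lmed lmeda lmedb fmr"
foreach varname of local abc {
    * confirm sets a nonzero return code if the variable is not a string;
    * capture suppresses the error so the loop keeps going
    capture confirm string variable `varname'
    if _rc == 0 {
        * strip every single quote, then convert to numeric
        replace `varname' = subinstr(`varname', "'", "", .)
        destring `varname', replace
    }
}
```

Alternatively, -ds, has(type string)- leaves the names of all string variables in r(varlist), which can be looped over without typing the list by hand.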

Thank you. (Note: I subscribe to the list's digest mode, so cc'ing me
on any responses would be helpful.)

Doug
douglasrhess@gmail.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



