st: Turning text pages into indicators


From   Jen Zhen <jenzhen99@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Turning text pages into indicators
Date   Wed, 8 Aug 2012 15:01:18 +0200

Dear Statalisters,

(1) I'd like to create indicators showing whether a string variable
contains at least one of several words.
I know I can check whether it contains one specific word with - gen
indicator=regexm(string,"word1") -, but can I also cover several words
in a single command this way?
I tried - gen indicator=regexm(string,"word1" "word2") - and - gen
indicator=regexm(string,"word1" | "word2") -, and neither worked.
Is there another way to do this?
I know I could also generate a separate indicator for each word and
then sum them, but since I have many words and many strings to
cover, that would be inefficient.
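A minimal sketch of one way to do (1), not taken from any reply in this thread: -regexm()- takes a regular expression, so the alternation operator | goes inside a single pattern string ("word1|word2") rather than between separate strings. The variable name -text-, the example words, and the local macro names are hypothetical. The pattern is built from a list of words with the -subinstr- extended macro function, so a long word list needs only one -gen-.

```stata
clear
* Small example dataset (hypothetical text)
input str60 text
"The cat sat on the mat"
"A DOG barked all night"
"Nothing relevant here"
end

* Words to search for (hypothetical example words)
local words cat dog bird

* Turn "cat dog bird" into the regular expression "cat|dog|bird"
local pattern : subinstr local words " " "|", all

* regexm() returns 1 if any alternative matches anywhere in the string;
* lower() makes the match case-insensitive
gen byte indicator = regexm(lower(text), "`pattern'")
list text indicator
```

Note that this matches substrings, so "cat" also matches "concatenate", and any word containing regular-expression special characters (such as . or +) would need to be escaped before it goes into the pattern.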

(2) My texts are long (half a page to a full page each), so I
presumably can't read an entire page into a single string variable
and then apply (1) to it.
Do I need to split the text beforehand, say in Excel, or is there a way
to read all of the text into Stata and then split it into as many
variables as necessary (but no more)?
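One approach that may sidestep the string-length problem in (2), sketched here under stated assumptions rather than as a definitive answer: instead of splitting a page into many variables, read each line of the text file into its own observation, flag matching lines with -regexm()-, and then flag the page if any line matched. The file contents and the names -fh-, -page-, -line-, -hit-, and -page_hit- are hypothetical. The sketch assumes the pre-Stata 13 limit of 244 characters per string variable, so any longer line would be truncated.

```stata
clear
* Create a small stand-in for one page of text (hypothetical content);
* a tempfile is deleted automatically when the do-file ends
tempfile page
file open fh using "`page'", write text replace
file write fh "The first line of the page mentions a cat." _n
file write fh "The second line is about something else." _n
file close fh

* Read each line of the file into its own observation.
* Before Stata 13, string variables were limited to 244 characters,
* so any longer line would be truncated here.
infix str244 line 1-244 using "`page'", clear

* Flag each line, then flag the whole page if any of its lines matched
gen byte hit = regexm(lower(line), "cat|dog")
egen byte page_hit = max(hit)
list
```

With several pages stacked in one dataset, a page identifier variable together with -bysort- would give one flag per page instead of one for the whole file.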

Thanks so much and best regards,
JZ

