Pooja Gupta <pgupta@worldbank.org> wrote,
> I am doing quantitative analyses of a set of variables using stata. one of
> my variables has multiple alphanumeric characters that are not seperated by
> commas. for eg, the first five observations of the variable are
>
> 1. ABC
> 2. ABCEG
> 3. BDEGHI
> 4. ACDFGI
> 5. AHI
>
> can a write a code which allows me to do a tabulation of each of these
> alphabets (i.e., how many As, how many B, how many C and so on) ?
Tom Steichen <steicht@rjrt.com> has supplied one answer and that may be
exactly what Pooja wants.
Alternatively, perhaps Pooja would get the desired tabulation from a dataset in
which the first three observations were "A", "B", and "C", the next five
observations "A", "B", "C", "E", "G", and so on. Then Pooja could just type
-tabulate response-.
Let us assume that the variable containing "ABC", "ABCEG", ..., in Pooja's
current data is called answers. We will also assume that Pooja has
a variable named id in the dataset which uniquely identifies the observations.
Step 1: Determine maximum length of answers
--------------------------------------------
. gen length = length(answers)
. summarize length
I will assume that the maximum value of length is 9 in what follows.
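If you would rather not read the maximum off the output by hand, -summarize-
also leaves it behind in r(max). A minimal sketch, assuming you are working
interactively or within a single do-file so that the local macro is still
defined when you need it (maxlen is an arbitrary name, not anything Stata
requires):

. quietly summarize length
. local maxlen = r(max)
. display `maxlen'

You could then type `maxlen' wherever 9 appears in the following step.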
Step 2: Turn variable answers into nine variables, response1 -- response9
--------------------------------------------------------------------------
. forvalues i=1(1)9 {
.         gen str1 response`i' = substr(answers, `i', 1)
. }
Step 3: Convert data from wide form to long
--------------------------------------------
. reshape long response, i(id)
Step 4: Perform tabulation
---------------------------
. tabulate response
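To try the four steps on the five observations Pooja listed, something like
the following do-file should work. It is only a sketch under a few
assumptions of mine: the id values 1 to 5 and the str6 storage type are made
up for the example, the length variable is called len rather than length,
the j() variable is named position (without j(), -reshape- calls it _j), and
the loop stops at 6 because the longest of the five strings has six
characters.

    clear
    input id str6 answers
    1 "ABC"
    2 "ABCEG"
    3 "BDEGHI"
    4 "ACDFGI"
    5 "AHI"
    end

    * Step 1: length of each string (the maximum here is 6)
    generate len = length(answers)
    summarize len

    * Step 2: one single-character variable per position
    forvalues i = 1/6 {
        generate str1 response`i' = substr(answers, `i', 1)
    }

    * Step 3: one observation per id and position
    reshape long response, i(id) j(position)

    * Step 4: count each letter
    tabulate response

With these five observations, A should be tabulated 4 times and F once, with
23 letters in all. The empty strings created for answers shorter than six
characters are treated as missing, so -tabulate- leaves them out.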
-- Bill
wgould@stata.com