



Re: st: stacking unique values of several variables under one new variable


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: stacking unique values of several variables under one new variable
Date   Mon, 25 Feb 2013 08:44:37 +0000

For "unique" read "distinct".

My code is very similar to Maarten's but I will post it nevertheless.

If it's as simple as your example implies then you can do this:

. gen long obs = _n

. split technology , p(,)
variables created as string:
technology1  technology2

. local k = r(nvars)

. expand `k'
(4 observations created)

. forval j = 1/`k' {
  2.     bysort obs : replace technology = technology`j'[1] if _n == `j'
  3. }
(2 real changes made)
(4 real changes made)

. drop if missing(technology)
(2 observations deleted)

. replace technology = trim(technology)
(2 real changes made)

. drop technology?

. duplicates drop technology, force

Duplicates in terms of technology

(1 observation deleted)

. list

     +-------------------+
     |  technology   obs |
     |-------------------|
  1. | Monoclonals     1 |
  2. |    Vaccines     2 |
  3. |    Adjuvant     3 |
  4. |     Vaccine     3 |
  5. |  Combinchem     4 |
     +-------------------+

Here's the code in one block:

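* tag each original record
gen long obs = _n
* one new string variable per comma-separated piece
split technology , p(,)
local k = r(nvars)
* one copy of each record per piece
expand `k'
forval j = 1/`k' {
    * the j-th copy of each record takes the j-th piece
    bysort obs : replace technology = technology`j'[1] if _n == `j'
}
* records with fewer pieces leave empty copies
drop if missing(technology)
* remove spaces left around the pieces
replace technology = trim(technology)
drop technology?
* keep one observation per distinct value
duplicates drop technology, force
list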

Notes: Treating "Vaccines" and "Vaccine", and any similar variants, as
the same value will need extra code.
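
A minimal sketch of that extra code, assuming that "Vaccine" should be
recorded as "Vaccines" (that is a judgement about the data, not something
Stata can work out). It would need to run after the trim and before
-duplicates drop-:

```stata
* assumption: "Vaccine" and "Vaccines" denote the same technology
replace technology = "Vaccines" if technology == "Vaccine"
duplicates drop technology, force
```

Each further pair of variants would need its own -replace- along the same
lines.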

Maarten's code assumes that the separator is always ", ". I don't
assume that a space always follows the comma, so I have to trim spaces
afterwards.
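
The same stacking can also be sketched with -reshape long- in place of
the -expand- loop. This is an alternative route, not the code above, and
it rests on the same assumption of comma-separated values in a string
variable called technology. The names obs and piece are arbitrary:

```stata
gen long obs = _n
split technology , p(,)
* the name technology must be free for reshape to create the stacked variable
drop technology
reshape long technology, i(obs) j(piece)
replace technology = trim(technology)
drop if missing(technology)
duplicates drop technology, force
list technology obs
```

Here -reshape- does the work of -expand- and the loop, at the cost of
dropping the original variable first.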

Nick

On Mon, Feb 25, 2013 at 6:15 AM, James Bernard <jamesstatalist@gmail.com> wrote:

> I have been struggling with the following. I would appreciate your help.
>
> I have a variable ("Technology") that indicates the type(s) of technology
> for each record. I want to aggregate the unique values of this
> variable under one new variable, say, called "Type":
>
>
> Technology
> -------------------------
> Monoclonals
> Vaccines
> Adjuvant, Vaccine
> Combinchem, Monoclonals
>
> Now, I want to create a variable that stores unique values:
>
> Type
> -----------
> Monoclonals
> Vaccines
> Adjuvant,
> Combinchem
>

