
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Creating an id variable from one of each (string) observations in 6 variables


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Creating an id variable from one of each (string) observations in 6 variables
Date   Thu, 20 Mar 2014 10:17:38 +0000

This could mean various things.

I recommend against the word "unique" here. Even permissive
dictionaries and style guides define "unique" as occurring once only.
Despite that, people using software often use "unique" to mean
"distinct"; in that case I'd argue for the latter word.

My guess is that some of this usage can be attributed to the Unix
command -uniq-, which reduces a set of values to a subset in which
each value occurs just once.

For more on this point and, more positively, a bundle of related ideas, see

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

http://www.stata-journal.com/sjpdf.html?articlenum=dm0042

and the -distinct- command it introduces.
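
As a minimal sketch of the "first principles" idea, distinct values can be
counted by tagging the first observation in each group. The variable names
(answer, first, once) and the toy values are illustrative assumptions, not
taken from the question or the article.

```stata
* Toy data: hypothetical values for illustration only
clear
input str12 answer
"health"
"taxes"
"health"
"environment"
"health"
end

* Tag the first observation within each group of identical values
bysort answer: gen byte first = (_n == 1)

* Number of distinct values: one tagged observation per group
count if first

* Number of values that occur once only ("unique" in the strict sense)
bysort answer: gen byte once = (_N == 1)
count if once
```

Here the first count is 3 (distinct values) and the second is 2 (values
occurring once only), which is exactly the difference between "distinct" and
"unique".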

Also, see -groups- (SSC), the tabulation command -tabm- in -tab_chi-
(SSC), -mrtab- (SJ) and the -egen- function -group()-. For example,

egen which = group(II????)

will assign identical responses to the six questions to the same
identifier value.
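
A small sketch of how this behaves, assuming variables named as in the
question (only two of the six are used here) and hypothetical toy values. The
-missing- option is an addition on my part: without it, -group()- leaves the
identifier missing for any observation with a missing value in any of the
variables.

```stata
* Toy data: two of the six variables, hypothetical values
clear
input str10 II1994 str10 II1998
"jobs"   "jobs"
"family" "culture"
"jobs"   "jobs"
""       "health"
end

* The wildcard II???? matches every variable named II plus four characters.
* With -missing-, empty strings are treated like any other value.
egen which = group(II????), missing

* Show the data grouped by identifier, then how often each pattern occurs
sort which
list, sepby(which)
tabulate which
```

Observations 1 and 3 of the toy data get the same identifier because their
responses agree on every variable. Note that this identifies distinct
combinations across the variables jointly, which is only one reading of the
question.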

Nick
[email protected]


On 20 March 2014 10:05, Jonas Klarin <[email protected]> wrote:
> Dear all,
>
> I have election survey data from six points in time. At each point, a couple of thousand different people answered a question. The answers are stored as text in six string variables. I would like to create an id variable containing each unique answer from the six variables and then count the number of times each unique answer is recorded for every time period. In other words, I would like to know how many times the respondents replied e.g. Sysselsä for every time period (variable). Can someone help me with this?
>
> The data looks like this (IIxxxx are the variable names):
>
> II1994        II1998        II2002        II2006        II2010        II1991
> Sysselsä      Sysselsä      .             Sysselsä      .             .
> Familjep      Kulturfrågor  .             .             .             .
> Sveriges      Sjukvård      Hälso- o      Miljö/miljöv  .             .
> .             .             Sjukvård/sju  .             .             .
> .             .             Sjukvård      Äldrevår      Skatter/arbe  .
> …             …             …             …             …             …
>
> etc.
>
>
> Kind regards,
> Jonas Klarin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


