Re: st: RE: Use 2 variables to gen 10 new variables

From	daniel klein <[email protected]>
To	[email protected]
Subject	Re: st: RE: Use 2 variables to gen 10 new variables
Date	Thu, 28 Jul 2011 11:22:26 +0200

Jonathan,

this still seems to be the same problem as in
http://www.stata.com/statalist/archive/2011-07/msg00868.html and
earlier in http://www.stata.com/statalist/archive/2011-07/msg00718.html.

Nick has already pointed out that this whole thing seems very ad hoc,
and I guess it is very error-prone, as I mentioned before. I think you
really need to think about (i) the concepts of a dataset, variables,
observations, and frequencies and, of course, (ii) the underlying
problem. If you tell us exactly _why_ you want to do what you are
asking for, someone might come up with a more convenient way to do it.

I would like to demonstrate what I mean by "think about concepts of
variables and frequencies" and "error-prone". Consider your own example:

. tab  q5a

        q5a |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         12           *           *
          2 |         72           *           *
          3 |         29           *           *
          4 |         22           *           *
          5 |         67           *           *
------------+-----------------------------------
      Total |        202      100.00


. tab  q5b

        q5b |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         22           *           *
          2 |        109           *           *
          4 |         37           *           *
          5 |         18           *           *
------------+-----------------------------------
      Total |        186      100.00


Putting these into one variable, holding only two values, as you want
to do, you will get

. tab  new_q5_1

   new_q5_1 |      Freq.     Percent        Cum.
------------+-----------------------------------
         12 |          1           *           *
         22 |          1           *           *
------------+-----------------------------------
      Total |          2      100.00

. tab  new_q5_3

   new_q5_3 |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          1           *           *
         29 |          1           *           *
------------+-----------------------------------
      Total |          2      100.00

As you see, since Stata sorts the values when tabulating, in new_q5_1
the first row corresponds to the frequency of group a, while in
new_q5_3 the first row (i.e. value 0) is the frequency of group b
(answering "3"), because no one in q5b gave that answer.

This is highly confusing, and you will probably not be able to tell
which value corresponds to which group.
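
One less error-prone layout, sketched here only as an illustration, is
to keep the frequencies in long form with an explicit group identifier,
so that nothing depends on how the values happen to sort. The variable
names group, answer and freq are hypothetical (they are not in your
data); the counts are copied from the two tables above.

```stata
* Minimal sketch: one row per group and answer, with the count in freq.
* group, answer and freq are made-up names used for illustration only.
clear
input str1 group answer freq
"a" 1  12
"a" 2  72
"a" 3  29
"a" 4  22
"a" 5  67
"b" 1  22
"b" 2 109
"b" 4  37
"b" 5  18
end

* Two-way table of answer by group, using the counts as frequency weights.
* Each column is labelled with its group, so a and b cannot be mixed up.
tabulate answer group [fweight = freq]
```

In such a table the empty cell for answer 3 in group b sits in the
column labelled b, rather than floating to the top of a sorted list.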

Best
Daniel