
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Quintiles


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Quintiles
Date   Thu, 9 Aug 2012 11:38:35 +0100

If I read this correctly, Leonardo agrees that exactly equal
frequencies may be impossible with -xtile- but wants to appear to do
it exactly by subterfuge, using weights.

This can be done:

. sysuse auto
. xtile qmpg = mpg, n(5)
. tab qmpg

5 quantiles |
     of mpg |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         18       24.32       24.32
          2 |         17       22.97       47.30
          3 |         13       17.57       64.86
          4 |         12       16.22       81.08
          5 |         14       18.92      100.00
------------+-----------------------------------
      Total |         74      100.00

. bysort qmpg : gen w = 1/_N

. tabstat w , by(qmpg)  s(n sum)

Summary for variables: w
     by categories of: qmpg (5 quantiles of mpg)

    qmpg |         N       sum
---------+--------------------
       1 |        18         1
       2 |        17         1
       3 |        13         1
       4 |        12         1
       5 |        14         1
---------+--------------------
   Total |        74         5
------------------------------
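
As a sketch of how those weights might then be used (assuming analytic
weights are acceptable for the purpose at hand), tabulating with w
should report each quintile at 20 percent:

. tabulate qmpg [aweight = w]

The underlying observations are unchanged; only the weighted shares
are equalised.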

However, why is exact equality such a big deal here? Why coarsen when
you have quantitative information to hand?

See also the thread gathered in
http://www.stata.com/statalist/archive/2012-06/msg01193.html on how
-xtile- on a negated version of a variable may (or may not) work
better.
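
A minimal sketch of that comparison on the auto data (negmpg and qneg
are names made up here purely for illustration):

. sysuse auto, clear
. xtile qmpg = mpg, n(5)
. gen negmpg = -mpg
. xtile qneg = negmpg, n(5)
. replace qneg = 6 - qneg
. tabulate qmpg qneg

Reversing the codes with 6 - qneg puts both versions on the same 1 to
5 scale, so any off-diagonal cells in the cross-tabulation show where
tied values were assigned differently.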

Nick

On Thu, Aug 9, 2012 at 9:16 AM, Maarten Buis <[email protected]> wrote:
> On Wed, Aug 8, 2012 at 9:44 PM, Leonardo Jaime Gonzalez Allende wrote:
>> I was not planning to cut a person or household into several parts. The question was about a possible adjustment to the weight factor when a sample observation falls on the cut point of a quintile.
>>
>> If I sort the households of a sample by their incomes, a household "x" could represent 300 households, but the cumulative frequency of the population at that point is, e.g., 20.02%.
>>
>> My question was whether there is an efficient way (command) to duplicate the observation and adjust the weight factor as follows:
>>
>> the same household "xa" now represents 280 households and the cumulative frequency of the population is now, e.g., exactly 20% (leaving it in the first quintile).
>
> What kind of weight did you have in mind: aweights, pweights,
> iweights, fweights? Weighting can be a remarkably tricky issue. There
> are many ways such a procedure could go wrong, and I don't know if
> there is a way to get it right. Anyhow, I cannot imagine a situation
> where such an effort would be worth the cost (but that may just as
> well say something about a lack of imagination on my part). I would
> just live with the fact that the discrete nature of the number of
> observations leads to slight variations in group size.
>
> Did you look at the possibility that ties (different people reporting
> exactly the same income) are the source of differences in group size?
> In theory, such ties should be pretty rare for a (semi-)continuous
> variable like income. However, in practice respondents tend to round
> their answers, making such ties a lot more common.
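
On the question of ties raised above, a quick check might look like
the following sketch. It uses mpg in the auto data as a stand-in for
income, and nties is a name invented here for illustration:

. sysuse auto, clear
. duplicates report mpg
. bysort mpg : gen nties = _N
. tabulate mpg if nties > 1

Values of nties above 1 flag observations that share their value with
at least one other observation, which is where -xtile- cannot split a
group to equalise frequencies.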