Statalist



st: Missing cases after contract with large weight


From   "Friedrich Huebler" <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Missing cases after contract with large weight
Date   Thu, 20 Dec 2007 16:32:05 -0500

I discovered that -contract- returns missing frequencies when a frequency
weight with large values is used. What is the explanation for this, and
is -contract- designed to work this way?

Below are two examples for -contract- with weights. In the first
example, the weight is not too large and the -contract-ed dataset
contains the same frequencies as those reported by -tabulate-.

. sysuse auto, clear
. replace weight = weight * 10000
. tab rep78 [fw=weight], m

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 | 62,000,000        2.77        2.77
          2 |268,300,000       12.01       14.78
          3 |989,700,000       44.29       59.08
          4 |516,600,000       23.12       82.20
          5 |255,500,000       11.43       93.63
          . |142,300,000        6.37      100.00
------------+-----------------------------------
      Total | 2234400000      100.00

. contract rep78 [fw=weight]
. clist

        rep78         _freq
  1.        1      62000000
  2.        2     268300000
  3.        3     989700000
  4.        4     516600000
  5.        5     255500000
  6.        .     142300000
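The result of -contract- with a frequency weight can be modelled outside Stata. The Python sketch below is only an illustration and not Stata's implementation. It creates one entry per distinct value of the variable, treating missing (`None` here, standing in for Stata's ".") as its own category as in the listing above, and sums the weights in each category. The function name `contract_fw` and the five observations are hypothetical toy data, not the auto dataset.

```python
from collections import defaultdict

def contract_fw(values, weights):
    """Hypothetical model of -contract var [fw=w]-: one entry per distinct
    value (missing included), with _freq equal to the sum of the weights."""
    freq = defaultdict(int)
    for v, w in zip(values, weights):
        freq[v] += w  # Python ints are arbitrary precision, so sums are exact
    return dict(freq)

# Hypothetical toy data: rep78-like codes, None stands in for Stata's "."
rep78 = [3, 3, 4, None, 3]
weight = [2_930, 3_350, 2_640, 3_250, 4_080]

print(contract_fw(rep78, weight))
```

Because Python integers never overflow, this model has no upper limit on `_freq`. Any limit seen in Stata must therefore come from how the result is stored.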

In the second example, the weight variable is multiplied by 100,000
instead of 10,000. When -contract- is used with the larger weight, some
frequencies are missing from the -contract-ed dataset.

. sysuse auto, clear
. replace weight = weight * 100000
. tab rep78 [fw=weight], m

     Repair |
Record 1978 |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |620,000,000        2.77        2.77
          2 | 2683000000       12.01       14.78
          3 | 9897000000       44.29       59.08
          4 | 5166000000       23.12       82.20
          5 | 2555000000       11.43       93.63
          . | 1423000000        6.37      100.00
------------+-----------------------------------
      Total |22344000000      100.00

. contract rep78 [fw=weight]
. clist

        rep78         _freq
  1.        1     620000000
  2.        2             .
  3.        3             .
  4.        4             .
  5.        5             .
  6.        .    1423000000

Perhaps it is a coincidence, but the frequencies that appear in the
-tabulate- output and are missing from the -contract- output all exceed
the largest value that can be stored in a variable of storage type long,
2,147,483,620, while the two frequencies that are reported (620,000,000
and 1,423,000,000) are below it. Can -contract- be modified to allow
larger weights?
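The hypothesis can be checked with a small Python model of storing each weighted frequency in a Stata long. The range constants are Stata's documented limits for the long storage type. The rule that an out-of-range value becomes missing (`None` here, printed as ".") is the assumption being tested. It is not taken from -contract-'s source code. The helper name `store_as_long` is hypothetical.

```python
# Documented range of Stata's long storage type; the top of the 32-bit
# range is reserved for missing-value codes, hence 2,147,483,620.
STATA_LONG_MAX = 2_147_483_620
STATA_LONG_MIN = -2_147_483_647

def store_as_long(value):
    """Hypothetical model: an in-range value is kept, anything else becomes missing (None)."""
    if STATA_LONG_MIN <= value <= STATA_LONG_MAX:
        return value
    return None

# Weighted frequencies of rep78 from the second -tabulate- above
freqs = {1: 620_000_000, 2: 2_683_000_000, 3: 9_897_000_000,
         4: 5_166_000_000, 5: 2_555_000_000, ".": 1_423_000_000}

stored = {k: store_as_long(v) for k, v in freqs.items()}
for k, v in stored.items():
    print(k, "." if v is None else v)

# A double represents every integer up to 2**53 exactly, so these sums would fit in one.
assert all(float(v) == v for v in freqs.values())
```

Run on the frequencies from the second -tabulate-, the model reproduces the pattern in the second -clist- listing: only the categories 1 and "." keep their counts.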

Thanks,

Friedrich


