
RE: st: Quantum mechanics in Stata


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Quantum mechanics in Stata
Date   Tue, 15 Nov 2011 19:34:19 +0000

This particular example can be fudged by bumping up 

eps

to 

eps + epsilon(1)

More checking needed. 

Nick 
n.j.cox@durham.ac.uk 

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 15 November 2011 12:35
To: 'statalist@hsphsun2.harvard.edu'
Subject: RE: st: Quantum mechanics in Stata

Interesting and telling example. 

In this case, the exact proportions are multiples of 1/64 = 0.125^2 and some of the rounded percents are in error by the _maximum_ amount possible. The inequalities in the code are weak, so this is in principle allowed for, but in practice there could still be a precision problem. For example, 

. di %21x 34.38/100  - 0.005/100
+1.6000000000001X-002

. di %21x 34.375/100
+1.6000000000000X-002

So, this is a bug! I'll think about how to fix it. Perhaps recast everything as an integer calculation. 

Nick 
n.j.cox@durham.ac.uk 
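
One way the integer recasting suggested above might look, as a minimal sketch rather than a tested fix. The name quantum_int() and its arguments are hypothetical and not part of the posted code. Percents are passed as integers in units of the last reported digit (34.38 becomes 3438), together with the matching scale (100 percent becomes 10000), so the rounding half-width is exactly half a unit. A sample size n is consistent when some whole count i satisfies (2p - 1) * n <= 2 * scale * i <= (2p + 1) * n, and that can be checked with mod() on integers that doubles hold exactly.

```mata
mata:

// Sketch only: integer-arithmetic variant of quantum() (hypothetical name)
// p     : percents scaled to integers, e.g. 34.38 -> 3438
// scale : the same scaling applied to 100 percent, e.g. 10000
real scalar quantum_int(real vector p, real scalar scale)
{
        real scalar n
        real vector work

        // drop missing values elementwise
        work = select(p, p :< .)
        n = 1

        while (1) {
                // a multiple of 2 * scale must lie in
                // [(2p - 1) * n, (2p + 1) * n], an interval of width 2n
                if (all(mod((2 :* work :+ 1) :* n, 2 * scale) :<= 2 * n)) {
                        return(n)
                }
                n++
        }
}

end
```

For the age table in this thread the call would be

    . mata : quantum_int((313, 781, 625, 1406, 2813, 3438, 313, 313), 10000)

and by hand arithmetic this should return 64: at n = 64 every percent passes, with 3.13, 28.13 and 34.38 sitting exactly on the boundary, while no smaller n passes both 6.25 and 7.81.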


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Seed, Paul
Sent: 15 November 2011 11:18
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Quantum mechanics in Stata

Dear Nick, 

A very nice idea when it works, but I am afraid I hit a snag 
with a quantum of 64.  Not easily solved, I suspect.

Thinking the problem lay with the rounding, I tried various alternatives 
to 3.13, without getting the right answer.
And 5008 isn't even a multiple of 64.

 

*********************************
. tab age

        age |      Freq.     Percent        Cum.
------------+-----------------------------------
          6 |          2        3.13        3.13
          7 |          5        7.81       10.94
          8 |          4        6.25       17.19
          9 |          9       14.06       31.25
         10 |         18       28.13       59.38
         11 |         22       34.38       93.75
         12 |          2        3.13       96.88
         13 |          2        3.13      100.00
------------+-----------------------------------
      Total |         64      100.00

. mata : quantum(( 3.13 , 7.81, 6.25, 14.06, 28.13, 34.38, 3.13, 3.13  )/100, 0.005/100)
  5008


. di 5008/64
78.25

. mata : quantum(( 3.12 , 7.81, 6.25, 14.06, 28.12, 34.38, 3.12, 3.12  )/100, 0.005/100)
  5008

. mata : quantum(( 3.12 , 7.81, 6.25, 14.06, 28.12, 34.39, 3.12, 3.12  )/100, 0.005/100)
  3329



On Mon, Nov 14, 2011 at 3:05 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> The heading may have drawn in some who expected something different, or put off others who deleted a message that looked well outside their own familiar territory. However, the problem is classical and real: how to estimate the number of quanta, meaning the sample size, behind some data when that number has been suppressed.
>
> There is a question at the end.
>
> Sometimes you are presented with a table of proportions or percents, but the total sample size has been suppressed. Naturally, we all know this is poor practice, at best a lack of thought and at worst an attempt to deceive. The reason I just got interested in this will itself be suppressed to protect the guilty, but I suspect no more than the first.
>
> I recollected a discussion in Becker, R.A., Chambers, J.M., Wilks, A.R. 1988. The new S language. Pacific Grove, CA: Wadsworth & Brooks/Cole on pp.272ff. They report a survey from a magazine in which percents favouring five vendors were 14.6, 12.2, 12.2, 7.3, 7.3. Here the repeated ties and the occurrence of 14.6 as twice 7.3 are consistent with a small sample size. So, what is the smallest sample size consistent with such percents? The main idea is just that rounding to a certain resolution (here 0.1) means a maximum error of half that, so we look for the smallest size that would fit with such rounding. (If percents have themselves been calculated or copied incorrectly or otherwise massaged, then clearly the problem becomes much more difficult.)
>
> There is S code on p.273 which can be translated into Mata. Becker and friends write code for proportions.
>
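> mata:
>
> // 1.0.0 NJC 14 November 2011
> // smallest sample size n consistent with proportions y rounded to within eps
> real scalar quantum(real vector y, real scalar eps) {
>        real scalar n
>        real vector work, i
>        // drop missing values elementwise; missing(y) returns a count, not a vector
>        work = select(y, y :< .)
>        n = 1
>
>        while (1) {
>                // nearest whole counts if the sample size were n
>                i = round(n :* work)
>                // weak inequalities: each count must lie within the rounding error
>                if (all((((work :- eps) :* n) :<= i) :&
>                        (((work :+ eps) :* n) :>= i))) {
>                        return(n)
>                }
>                n++
>        }
> }
>
> end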
>
> Their example with this code (and with their S code too) gives 41 as an answer.
>
> . mata :
> : pc = (.146,.122,.073)
>
> : quantum(pc, 0.0005)
>  41
>
> : percent = (14.6,12.2,7.3)
>
> : quantum(percent/100, 0.0005)
>  41
>
> : quantum(percent/100, 0.05/100)
>  41
>
> Clearly we should test it out on other cases too.
>
> . sysuse auto
> (1978 Automobile Data)
>
> . tab rep78
>
>     Repair |
> Record 1978 |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          1 |          2        2.90        2.90
>          2 |          8       11.59       14.49
>          3 |         30       43.48       57.97
>          4 |         18       26.09       84.06
>          5 |         11       15.94      100.00
> ------------+-----------------------------------
>      Total |         69      100.00
>
> . mata : quantum((2.9, 11.59, 43.48, 26.09, 15.94)/100, 0.005/100)
>  69
>
> So, this may be amusing, or even useful. (As said, you get the smallest consistent sample size: all integer multiples of that are also consistent with a given percent breakdown.)
>
> It seems too simple not to be much more widely known than it appears to be. Concerns about data quality and research abuses cross disciplines, so there should be a corresponding scattered literature.
>
> I know of an earlier discussion in Wallis, W.A. and Roberts, H.V. 1956. Statistics: a new approach. Glencoe, IL: Free Press.
>
> I also know of the rather different paper
>
> Kendall, D.G. 1974. Hunting quanta. Philosophical Transactions of the Royal Society of London. Series A 276: 231-266.
>
> What am I missing? (Cameron?)
>
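
The rounding argument in the quoted message can be checked by hand for a single percent. As a small illustration, assuming only that 7.3 was rounded to 0.1 (so the true proportion lies between 0.0725 and 0.0735), the lines below list, for each possible count i, the sample sizes n for which i/n falls in that interval. The names P and S are illustrative: the percent and the scale as integers, so that the bounds are ratios of whole numbers.

```mata
mata:
// 7.3 percent reported to 0.1, i.e. 73 on a scale of 1000
P = 73
S = 1000
for (i = 1; i <= 3; i++) {
        // n must satisfy 2*S*i/(2*P + 1) <= n <= 2*S*i/(2*P - 1)
        lo = ceil(2 * S * i / (2 * P + 1))
        hi = floor(2 * S * i / (2 * P - 1))
        printf("i = %g: n from %g to %g\n", i, lo, hi)
}
end
```

The ranges for i = 1 and i = 2 come out empty (the lower limit exceeds the upper), and i = 3 leaves only n = 41, which is why no sample smaller than 41 can fit the Becker, Chambers and Wilks example.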
