
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: upper limit on fweights? overflowing into missing values?


From   Richard Williams <[email protected]>
To   [email protected], [email protected]
Subject   Re: st: upper limit on fweights? overflowing into missing values?
Date   Thu, 01 Aug 2013 15:26:52 -0500

I still don't understand what the fweights are supposed to represent, i.e., what is an observation in these data? If it is the dollar value of the portfolio, you could simply measure the value in millions of dollars rather than dollars. Or, if it is the number of shares of stock, it could be measured in thousands of shares rather than shares. If you can be clear on what an observation is, that might help.

Like Nick, I could also see where aweights might be right. According to the docs, "Analytic aweights are typically appropriate when you are dealing with data containing averages. For instance, you have average income and average characteristics on a group of people. The weighting variable contains the number of persons over which the average was calculated (or a number proportional to that amount)." aweights might be used for things like states of the United States, or maybe even countries. The world population is in the 7 billion range, so if you had one record per country with things like, say, average income, the aweights could be the population size. Stata should be able to handle that fine, even though it couldn't handle 7 billion records, one for every person in the world. If, say, you have 100 portfolios with a total of 100 billion shares of stock, and for each portfolio you have the average value of a share of stock, along with other characteristics of the portfolio (e.g. how it is managed), aweights would sound right to me.

I agree that the documentation should be better, and I am glad that Stata says it is going to work on it. But this seems like a wildly esoteric problem to me. How many people have 4 billion cases? I don't think many do. I can see how this one has slipped through the cracks for the 25+ years Stata has been around.

And in this case, I am not sure that you have 4 billion cases either. Again, if you can clarify what an observation is, that may help. If it is something like dollar value, that doesn't really strike me as being cases, but even if it is, it seems easy enough to rescale into millions or thousands or whatever in order to make the problem manageable.
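Rescaling is harmless for the correlation coefficient itself because a weighted Pearson correlation depends only on the relative sizes of the weights. The minimal Python sketch below is an illustration, not Stata's implementation; the helper `weighted_corr` and the portfolio numbers are hypothetical. It shows that dividing every weight by 1,000,000, or normalizing the weights to sum to the number of observations (as Stata does with aweights), leaves r unchanged. What does depend on the scale of fweights is the implied sample size, and hence any significance levels.

```python
import math


def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with positive weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    # Any common factor in w cancels between numerator and denominator.
    return sxy / math.sqrt(sxx * syy)


# Hypothetical portfolios: two characteristics and a size in dollars.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
dollars = [3_000_000_000, 5_000_000_000, 2_000_000_000, 4_000_000_000]

r_dollars = weighted_corr(x, y, dollars)
# Same weights measured in millions of dollars.
r_millions = weighted_corr(x, y, [d / 1_000_000 for d in dollars])
# aweight-style normalization: weights rescaled to sum to the number of rows.
n = len(dollars)
r_normalized = weighted_corr(x, y, [d * n / sum(dollars) for d in dollars])

print(r_dollars, r_millions, r_normalized)
```

All three values agree up to floating-point rounding, which is why rescaling the portfolio sizes into thousands or millions changes nothing about the point estimate.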


At 01:35 PM 8/1/2013, László Sándor wrote:
Thanks, Nick.

Then maybe I have a terrible understanding of what aweights are. My
larger portfolios are not simply more precisely priced, they are,
well, larger. I think that enters a pwcorr calculation differently,
though maybe not.

On semantics: I think an observation is anchored in the actual data in
Stata. But whether the weighting is sensible should not depend on
whether my dollar-by-dollar comparison uses larger numbers than an
investor-by-investor comparison. And I definitely disagree with the
notion that the current (undocumented) limits are fine because no one
would have this many "observations." Yes, no one would have this many
lines in Stata, but fweights are exactly there to talk about larger
populations than the aggregates in the data, and the dollar values can
easily get this large, even without "genetics." I would push back on
the notion that monetary amounts are not populations/observations, and
that it is therefore fine for Stata to silently overflow when it
encounters them.

So let's root for more documentation soon.

On Tue, Jul 30, 2013 at 8:54 AM, Nick Cox <[email protected]> wrote:
> On the contrary, it seems to me that "what is an observation?" is more
> than semantic here: it is the nub of the issue!
>
> It's your problem but this sounds to me like a case for analytic
> weights. The use of frequency weights is also suspect unless the
> weights are integers (without artifice or rounding).
>
> As I've said or implied in earlier posts, this all should be a bit
> better documented.
> Nick
> [email protected]
>
>
> On 30 July 2013 13:34, László Sándor <[email protected]> wrote:
>> Thanks, Richard.
>>
>> Stata tech support got back to me and suggested something similar:
>> that some operations with fweights do overflow with such large
>> weights, others don't. I am not sure whether we should call it a
>> hard-coded restriction on some number somewhere or simply a
>> consequence of the C implementation of -mf_quadcross- or something.
>>
>> I think I tried to describe my use case: I wanted to calculate stats
>> on portfolios, and it makes sense to weight by the size of them. As
>> pwcorr does not allow iweights, and pweights and aweights do something
>> completely different, I thought I'd use fweights. It blows up unless I
>> rescale the portfolios into thousands, millions or billions.
>>
>> Not a big deal, but Stata's (non-existent) error message, help and
>> documentation were not exactly helpful in resolving this. StataCorp
>> says they will address this.
>>
>> I think what an observation is is a semantic issue here, not very
>> helpful. Is an entire portfolio "one observation" or a single share in
>> each, or each dollar behind each? I am not sure this should matter
>> either for us or for Stata.
>>
>> Best,
>>
>> Laszlo
>>
>> On Mon, Jul 29, 2013 at 9:53 AM, Richard Williams
>> <[email protected]> wrote:
>>> Just to sum up my current thinking/guesses on this:
>>>
>>> * the maximum number of observations in Stata is 2,147,483,647
>>> * Nonetheless, fweighted data sets can have more observations than that
>>> * However, not all routines will work when the fweighted data has more than
>>> 2,147,483,647 cases. You can do some simple descriptive things, but you
>>> can't do more complicated things like regression or correlations.
>>> * As to why that is, I am guessing that some routines have the 2,147,483,647
>>> limit hardcoded in. Or, maybe there just isn't enough precision to handle
>>> calculations when the N is larger than that.
>>> * Given that most people don't have more than 2,147,483,647 cases (and even
>>> if they did, their computer memory couldn't handle them) StataCorp probably
>>> hasn't spent a lot of time worrying about this.
>>> * Still, an added sentence or two in the fweights documentation or elsewhere
>>> warning about limits might be a good idea.
>>>
>>> I am curious what the original author is doing that requires analyzing 4
>>> billion+ cases. Some sort of genetic research maybe? I've certainly never
>>> heard of any kind of survey research having an N that large.
>>>
>>>
>>>
>>> At 06:53 PM 7/28/2013, Nick Cox wrote:
>>>>
>>>> This is interesting, but in principle I don't see that Stata's limit
>>>> on # of observations has any bearing on how big frequency weights can
>>>> be. I can imagine people wanting to use frequency weights to subvert
>>>> the limit on number of observations.
>>>>
>>>> A different point is that if there is a limit on how big weights can
>>>> be it should be documented e.g. at -help limits-.
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 29 July 2013 00:46, Richard Williams <[email protected]>
>>>> wrote:
>>>> > According to -help limits-, the maximum number of observations is
>>>> > 2,147,483,647. Your weights give you more than 4 billion cases, well above
>>>> > that. Further, the help also says that this is a theoretical maximum; memory
>>>> > availability will certainly impose a smaller maximum.
>>>> >
>>>> > On my computer, I specified [fw = 1073741823] on the pwcorr command and
>>>> > it ran. Then I specified [fw = 1073741824] and it did not run. These numbers
>>>> > put you just below and just above the maximum number of cases that Stata
>>>> > allows.
>>>> >
>>>> > So in short, it appears that your fweighted cases can't exceed the 2
>>>> > billion+ that Stata allows, and memory restrictions may hold you to even
>>>> > less than that.
>>>> >
>>>> > Also, you probably need to specify that the fweight variable is type
>>>> > long, e.g.
>>>> >
>>>> > input y x long fw
>>>> >
>>>> > Sent from my iPad
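The threshold found above can be checked with simple arithmetic, assuming the test was run on the two-observation example from the original post, so that the total weight is twice the per-observation weight. Whether -pwcorr- actually keeps the weighted N in a 32-bit signed integer is a guess made in this thread, not something StataCorp has confirmed; the block only shows that the two test values bracket 2^31 - 1, the observation limit quoted from -help limits-.

```latex
\begin{aligned}
2^{31} - 1 &= 2{,}147{,}483{,}647 && \text{maximum number of observations}\\
2 \times 1{,}073{,}741{,}823 &= 2{,}147{,}483{,}646 \le 2^{31} - 1 && \text{ran}\\
2 \times 1{,}073{,}741{,}824 &= 2{,}147{,}483{,}648 = 2^{31} > 2^{31} - 1 && \text{did not run}\\
2 \times 2{,}147{,}483{,}621 &= 4{,}294{,}967{,}242 && \text{original example, over 4 billion}
\end{aligned}
```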
>>>> >
>>>> > On Jul 27, 2013, at 12:36 PM, László Sándor <[email protected]> wrote:
>>>> >
>>>> >> Hi,
>>>> >> If you care, here is an example that silently produces missing values.
>>>> >> I notified Stata Support.
>>>> >>
>>>> >> input y x fw
>>>> >> 2 1 2147483621
>>>> >> 1 2 2147483621
>>>> >> end
>>>> >> de
>>>> >> pwcorr y x [fw=fw]
>>>> >> exit
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >> Laszlo
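One detail of the reproducible example may matter independently of any limit on weights: `input y x fw` creates fw with Stata's default storage type, float, which is 4-byte IEEE single precision and holds integers exactly only up to 16,777,216. The Python sketch below is not Stata code; it uses the standard-library `struct` module to mimic single-precision storage, and the `LONG_MAX` constant is taken from Stata's -help data_types- (the largest storable long is 2,147,483,620). Whether this rounding contributes to the missing values from -pwcorr- is not established in the thread.

```python
import struct


def as_float32(value):
    """Round value to IEEE 754 single precision (the 4-byte format of
    Stata's -float- storage type) and return it as a Python float."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


requested = 2147483621            # weight typed in the example above
stored = as_float32(requested)    # value a float variable would keep
print(int(stored))                # 2147483648, i.e. 2**31
print(int(stored) - requested)    # 27

# Floats hold every integer exactly only up to 2**24 = 16,777,216.
print(as_float32(16_777_217))     # 16777216.0

# Largest value a Stata -long- can store, per -help data_types-.
LONG_MAX = 2_147_483_620
print(requested > LONG_MAX)       # True: one above the long maximum
```

If that reading is right, the weight 2,147,483,621 is one more than a long can hold, so declaring fw as long would not help either; a double stores it exactly.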
>>>> >>
>>>> >> On Sun, Jul 21, 2013 at 5:08 PM, Nick Cox <[email protected]> wrote:
>>>> >>> I'd suggest documenting your problems with a reproducible example and
>>>> >>> sending it to Stata tech support.
>>>> >>>
>>>> >>>
>>>> >>> Nick
>>>> >>> [email protected]
>>>> >>>
>>>> >>>
>>>> >>> On 21 July 2013 21:55, László Sándor <[email protected]> wrote:
>>>> >>>> Hi,
>>>> >>>> in Stata/MP 12.1 I am getting missing values when using -pwcorr- with
>>>> >>>> -fweights-, though the feature works fine with other data or if I
>>>> >>>> scale my weights down. Is it possible to simply have fweights that
>>>> >>>> are too large, e.g. if they cannot be of type -long- anymore?
>>>> >>>>
>>>> >>>> If so, why doesn't Stata warn me about this?
>>>> >>>>
>>>> >>>> I vaguely remember some Statalist or Stata blog discussion of this,
>>>> >>>> but I could not even find it with Google, and Stata still did not warn me?
>>>> >>>>
>>>> >>>> Actually, why didn't Stata complain that I did not have integer
>>>> >>>> fweights if obviously the variable wasn't of type byte, int or long?
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>>
>>>> >>>> Laszlo
>>>> >>>>
>>>> >>>> *
>>>> >>>> *   For searches and help try:
>>>> >>>> *   http://www.stata.com/help.cgi?search
>>>> >>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> >>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>>>
>>> -------------------------------------------
>>> Richard Williams, Notre Dame Dept of Sociology
>>> OFFICE: (574)631-6668, (574)631-6463
>>> HOME:   (574)289-5227
>>> EMAIL:  [email protected]
>>> WWW:    http://www.nd.edu/~rwilliam