
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Contract/Collapse Combination

From   Lucas <>
Subject   Re: st: Contract/Collapse Combination
Date   Tue, 22 May 2012 09:51:06 -0700


A composite 6-digit identifier is not a problem.  I indicated that I
did not think it possible to make such an identifier for each cell of
a 15-way crosstab.  So, we are not disagreeing.
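
For readers following the identifier question, here is a minimal sketch of the two points at issue. It is not from the thread: the variables x1-x3 are hypothetical stand-ins for the crosstab variables, and the large value is made up. It shows that a float stops holding integers exactly above 16,777,216 (2^24), and that -egen-'s group() function is one possible way to number the observed cells without building an arithmetic composite at all.

```stata
* Sketch only: x1-x3 are hypothetical stand-ins for the crosstab variables.
clear
set obs 1000
set seed 12345
gen byte x1 = runiform() < 0.5
gen byte x2 = runiform() < 0.5
gen byte x3 = 1 + floor(3 * runiform())

* A float keeps integers exactly only up to 16,777,216 (2^24).
* Storing a larger illustrative value as float rounds it; double does not.
gen float  big_float  = 123456789
gen double big_double = 123456789
display %12.0f big_float[1]     // shows 123456792, not 123456789
display %12.0f big_double[1]    // shows 123456789

* group() numbers the combinations that actually occur as 1, 2, ..., G,
* so the identifier stays small however many variables define the cells.
egen long cell = group(x1 x2 x3)
summarize cell
```

With 15 variables the same call would simply list all 15 names inside group(); the identifier then counts observed cells rather than encoding each variable's value in a digit.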

I don't think contract is buggy.  I think a simple (conceptually,
perhaps not computer "programmingly") extension of contract to allow
multiple (or at least 2) frequency counts would be a good idea if
possible.  It would also be consistent with the Stata-proposed solution
of addressing slow estimation on big data by collapsing the data and
using frequency counts.
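
As an illustration of what "two frequency counts" could look like with existing commands, the following is a rough sketch using -collapse- rather than -contract-. All names are assumptions: x1-x3 stand in for the crosstab variables, y1 and y2 for the two dichotomous variables, and the data are simulated. It is not the example Brendan posted.

```stata
* Sketch only: simulated data, hypothetical variable names.
clear
set obs 1000
set seed 2012
gen byte x1 = runiform() < 0.5
gen byte x2 = runiform() < 0.5
gen byte x3 = 1 + floor(3 * runiform())
gen byte y1 = runiform() < 0.3
gen byte y2 = runiform() < 0.6

* A constant of 1 summed within each cell gives the cell size;
* summing a 0/1 variable gives its count of 1's in that cell.
gen byte one = 1
collapse (sum) n=one n_y1=y1 n_y2=y2, by(x1 x2 x3)
list, sepby(x1)
```

Because all three counts come out of a single pass over the data, no later merge is involved. One limitation to keep in mind: -collapse- only returns combinations that occur in the data, so empty cells are not listed.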

I won't alert stata--they are listening anyway, and they can easily
come back at me and say I should get more memory.  And, of course, I'd
agree.  But, still, we'd be left with a command seemingly within
whispering distance of providing a general solution to a common task,
but not going that final distance.

Thanks, though.

On Tue, May 22, 2012 at 9:37 AM, Nick Cox <> wrote:
> The solution here of producing a composite identifier looks likely to fail. You are putting a very big number into a -float- variable and expect to retain every last bit of precision. See
> for why that is a bad idea.
> As for the rest, you seem to be claiming that -contract- is buggy. That is important if true, and you should send in a report containing incontrovertible evidence to Stata tech-support.
> Nick
> Lucas
> Brendan,
> My original note indicated exactly the solution you propose, of doing
> it twice and merging.  But this is incredibly risky, because there is
> no way to assure every combination appears in both files.  Even the
> "zero" option apparently cannot assure this.  Believe me, I tried this
> with about 6 variables, and the file sizes do not equate across
> runs--not to mention that one has to be pretty certain everything is
> sorted exactly right.  I do not know *why* the problem occurred, but it
> did.  Perhaps the file is so big that problems emerge that do not
> exist for smaller datasets (e.g., sorted cases fall out of sorts, as
> it were).
> At any rate, my response was to make an id based on the 6 variables:
> gen id=(x1*100000)+(x2*10000)+. . .+(x6) ;
> This works for 6 dichotomous variables; it will not work for 15
> variables of various types, because the id# will exceed the largest
> value allowed in stata.
> THUS, it seems a more general solution is needed, that does not
> require a later merge.
> As for your collapse example, it is unclear, as you start with data
> that is already collapsed.  The problem is the data is not collapsed,
> and the aim is to get it into the collapsed form.
> On Tue, May 22, 2012 at 7:50 AM, Brendan Halpin <> wrote:
>> On Tue, May 22 2012, Lucas wrote:
>>> Is there a way to use the contract command and obtain frequencies for
>>> TWO variables rather than just ONE?  A corollary question would be, Is
>>> there a way to use the contract command and obtain the count of 1's on
>>> TWO separate dichotomous variables?
>> That is what my example achieves, though using -collapse- instead of
>> -contract-.
>> Another way of doing it would be to separate the data by entercol, and
>> -contract- or -collapse- it twice, once for entercol==1 and once for
>> entercol==0, and then merge the resulting files by the 15 crosstab
>> variables.
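
The split-and-merge route described above can be sketched as follows. This is an illustration under assumptions, not code from the thread: the data are simulated, x1 and x2 stand in for the 15 crosstab variables, and n0 and n1 are made-up names for the two frequency variables. The point of the sketch is that -merge 1:1- keeps combinations found in only one file and flags them in _merge, so cells that appear in one file but not the other are kept rather than dropped.

```stata
* Sketch only: simulated data, hypothetical variable names.
clear
set obs 1000
set seed 2012
gen byte x1 = 1 + floor(4 * runiform())
gen byte x2 = 1 + floor(5 * runiform())
gen byte entercol = runiform() < 0.4

* Frequencies for entercol==1, saved to a temporary file.
preserve
contract x1 x2 if entercol == 1, freq(n1)
tempfile ones
save `ones'
restore

* Frequencies for entercol==0, then match on the cell variables.
contract x1 x2 if entercol == 0, freq(n0)
merge 1:1 x1 x2 using `ones'

* Cells present in only one file come through with a missing count;
* treat those as zero and inspect _merge to see how many there were.
replace n0 = 0 if missing(n0)
replace n1 = 0 if missing(n1)
tabulate _merge
```

Since -merge- (Stata 11 and later) matches on the key variables itself, the result does not depend on the two files being the same length or in the same sort order.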
