
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Contract/Collapse Combination


From   Lucas <[email protected]>
To   [email protected]
Subject   Re: st: Contract/Collapse Combination
Date   Tue, 22 May 2012 09:51:06 -0700

Nick,

A composite 6-digit identifier is not a problem.  I indicated I did
not think it possible to make such an identifier for each cell of
a 15-way crosstab.  So, we are not disagreeing.

I don't think contract is buggy.  I think a conceptually simple (if
perhaps not simple to program) extension of contract to allow
multiple (or at least 2) frequency counts would be a good idea if
possible, and consistent with the Stata-proposed solution of
addressing slow estimation on big data by collapsing the data and
using frequency counts.

I won't alert Stata--they are listening anyway, and they can easily
come back at me and say I should get more memory.  And, of course, I'd
agree.  But, still, we'd be left with a command seemingly within
whispering distance of providing a general solution to a common task,
but not going that final distance.

Thanks, though.
Sam
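
A minimal sketch of the single-pass approach discussed in this thread, using -collapse- rather than an extended -contract-. The names are assumptions: x1-x15 stand for the 15 crosstab variables (assumed adjacent in variable order, so the range works), entercol is the indicator mentioned in the thread, and y is a hypothetical second 0/1 variable. It is an illustration under those assumptions, not a tested solution for the original data.

```stata
* Sketch only; variable names are assumptions, not from the original data.
* x1-x15: crosstab variables; entercol, y: 0/1 indicators (y is hypothetical).
gen byte one = 1

* One row per observed combination of x1-x15, with three counts per row:
*   n       = number of observations in the cell
*   n_y     = number of observations with y == 1
*   n_enter = number of observations with entercol == 1
collapse (sum) n = one n_y = y n_enter = entercol, by(x1-x15)

* Optional: an integer cell identifier that needs no digit arithmetic
egen long cell = group(x1-x15)
```

Because all counts come from one -collapse-, no later merge is needed. Only combinations that actually occur in the data appear in the result; empty cells are not created.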

On Tue, May 22, 2012 at 9:37 AM, Nick Cox <[email protected]> wrote:
> The solution here of producing a composite identifier looks likely to fail. You are putting a very big number into a -float- variable and expecting to retain every last bit of precision. See
>
> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>
> for why that is a bad idea.
>
> As for the rest, you seem to be claiming that -contract- is buggy. That is important if true, and you should send in a report containing incontrovertible evidence to Stata tech-support.
>
> Nick
> [email protected]
>
> Lucas
>
> Brendan,
>
> My original note described exactly the solution you propose, of doing
> it twice and merging.  But this is incredibly risky, because there is
> no way to ensure that every combination appears in both files.  Even the
> "zero" option apparently cannot ensure this.  Believe me, I tried this
> with about 6 variables, and the file sizes did not match across
> runs--not to mention that one has to be pretty certain everything is
> sorted exactly right.  I do not know *why* the problem occurred, but it
> did; perhaps the file is so big that problems emerge that do not exist
> for smaller datasets (e.g., sorted cases fall out of sorts, as it were).
>
> At any rate, my response was to make an id based on the 6 variables:
>
> gen id=(x1*100000)+(x2*10000)+. . .+(x6) ;
>
> This works for 6 dichotomous variables; it will not work for 15
> variables of various types, because the id# will exceed the largest
> value allowed in Stata.
>
> THUS, it seems a more general solution is needed, one that does not
> require a later merge.
>
> As for your collapse example, it is unclear, as you start with data
> that is already collapsed.  The problem is the data is not collapsed,
> and the aim is to get it into the collapsed form.
>
> On Tue, May 22, 2012 at 7:50 AM, Brendan Halpin <[email protected]> wrote:
>> On Tue, May 22 2012, Lucas wrote:
>>
>>> Is there a way to use the contract command and obtain frequencies for
>>> TWO variables rather than just ONE?  A corollary question would be, Is
>>> there a way to use the contract command and obtain the count of 1's on
>>> TWO separate dichotomous variables?
>>
>> That is what my example achieves, though using -collapse- instead of
>> -contract-.
>>
>> Another way of doing it would be to separate the data by entercol, and
>> -contract- or -collapse- it twice, once for entercol==1 and once for
>> entercol==0, and then merge the resulting files by the 15 crosstab
>> variables.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


