Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Lucas <lucaselastic@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Contract/Collapse Combination
Date: Tue, 22 May 2012 09:51:06 -0700

Nick,

A composite 6-digit identifier is not a problem. I indicated I did not think it possible to make such an identifier for each cell of 15-way crosstab. So, we are not disagreeing.

I don't think contract is buggy. I think a simple (conceptually, perhaps not computer "programmingly") extension of contract to allow multiple (or at least 2) frequency counts seems a good idea if possible, and consistent with the stata-proposed solution of addressing slow estimation on big data with collapsing data and using frequency counts.

I won't alert stata--they are listening anyway, and they can easily come back at me and say I should get more memory. And, of course, I'd agree. But, still, we'd be left with a command seemingly within whispering distance of providing a general solution to a common task, but not going that final distance.

Thanks, though.
Sam

On Tue, May 22, 2012 at 9:37 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> The solution here of producing a composite identifier looks likely to fail. You are putting a very big number into a -float- variable and expect to retain every last bit of precision. See
>
> http://blog.stata.com/2012/04/02/the-penultimate-guide-to-precision/
>
> for why that is a bad idea.
>
> As for the rest, you seem to be claiming that -contract- is buggy. That is important if true, and you should send in a report containing incontrovertible evidence to Stata tech-support.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Lucas
>
> Brendan,
>
> My original note indicated exactly the solution you propose, of doing
> it twice and merging. But this is incredibly risky, because there is
> no way to assure every combination appears in both files. Even the
> "zero" option apparently cannot assure this. Believe me, I tried this
> with about 6 variables, and the file sizes do not equate across
> runs--not to mention that one has to be pretty certain everything is
> sorted exactly right. I do not know *why* the problem occurred, it
> occurred, and perhaps it is that the file is so big, that problems
> emerge that do not exist for smaller datasets (e.g., sorted cases fall
> out of sorts, as it were).
>
> At any rate, my response was to make an id based on the 6 variables:
>
> gen id=(x1*10000)+(x2*1000)+. . .+(x6) ;
>
> This works for 6 dichotomous variables; it will not work for 15
> variables of various types, because the id# will exceed the largest
> value allowed in stata.
>
> THUS, it seems a more general solution is needed, that does not
> require a later merge.
>
> As for your collapse example, it is unclear, as you start with data
> that is already collapsed. The problem is the data is not collapsed,
> and the aim is to get it into the collapsed form.
>
> On Tue, May 22, 2012 at 7:50 AM, Brendan Halpin <brendan.halpin@ul.ie> wrote:
>> On Tue, May 22 2012, Lucas wrote:
>>
>>> Is there a way to use the contract command and obtain frequencies for
>>> TWO variables rather than just ONE? A corollary question would be, Is
>>> there a way to use the contract command and obtain the count of 1's on
>>> TWO separate dichotomous variables?
>>
>> That is what my example achieves, though using -collapse- instead of
>> -contract-.
>>
>> Another way of doing it would be to separate the data by entercol, and
>> -contract- or -collapse- it twice, once for entercol==1 and once for
>> entercol==0, and then merge the resulting files by the 15 crosstab
>> variables.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
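
For readers following the thread, here is a minimal sketch of the single-pass -collapse- route mentioned above, which gives two counts per cell without a later merge. It is not Brendan's actual example, which is not reproduced in this message. The simulated data, the variable names x1-x15, one, n, n_enter, n_noenter and id, and the assumption that entercol is a 0/1 variable with no missing values are all illustrative choices.

```stata
* Simulated, uncollapsed data (hypothetical): 15 dichotomous crosstab
* variables x1-x15 plus a 0/1 variable entercol.
clear
set obs 1000
set seed 12345
forvalues j = 1/15 {
    gen byte x`j' = runiform() < 0.5
}
gen byte entercol = runiform() < 0.3

* A cell identifier that does not rely on packing digits into one number:
* -egen, group()- numbers the observed cells 1, 2, 3, ...
egen long id = group(x1-x15)

* One pass over the data: summing a constant 1 gives the cell frequency,
* and summing the 0/1 variable gives the count of 1's in that cell.
gen byte one = 1
collapse (sum) n = one n_enter = entercol, by(id x1-x15)

* The count of 0's follows by subtraction, so no second run or merge is needed.
gen long n_noenter = n - n_enter

list in 1/5
```

A second dichotomous variable could be counted the same way by adding another `newvar = var` pair after `(sum)`. On the identifier itself: a -float- holds integers exactly only up to 16,777,216 (2^24), while a -double- is exact up to 2^53 (about 9.0e15), so a digit-packed id for 15 dichotomous variables would need at least `gen double`. Note that `x1-x15` as a varlist assumes those variables are stored contiguously in the dataset.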