Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Nirina F <fstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: -collapsetofile-
Date: Fri, 28 Feb 2014 17:07:05 -0500

Great! Was looking for something like this one.

On Fri, Feb 28, 2014 at 2:00 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> To get stuff on SSC, you just need to email Kit Baum with the files.
> But as he announced very recently he is away from base right now.
>
> http://repec.org/bocode/s/sscsubmit.html gives full details.
>
> Nick
> njcoxstata@gmail.com
>
>
> On 28 February 2014 18:51, Andrew Maurer <Andrew.Maurer@qrm.com> wrote:
>> Thanks for the reference, David.
>>
>> Looking at xcollapse.do, it internally does a preserve/save/restore. The whole idea of -collapsetofile- is to save the data without doing a preserve/restore. It looks like the intended purpose of -xcollapse- is to add features to collapse, while the purpose of -collapsetofile- is to save a file faster. (-collapsetofile-, at the moment, does far less than collapse - I still need to spend some time reading through the syntax-parsing portion of collapse to allow syntax like (sum) x1 = y x2 = z...)
>>
>> It looks like I need a RePEc account to post to SSC, if I'm understanding this. I'm looking into it now.
>>
>> Andrew Maurer
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: Friday, February 28, 2014 12:47 PM
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: -collapsetofile-
>>
>> Using SSC as a medium for distributing user-written programs is naturally entirely optional for user-programmers.
>>
>> It is however I think germane that SSC requires provision of help files as part of a minimum standard for inclusion of packages.
>>
>> Similarly, providing help files would help people to understand exactly what these programs do and help Andrew get good feedback from anyone interested.
>>
>> (If I am missing the help files, please do flag where they are.)
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 28 February 2014 18:24, Jorge Eduardo Pérez Pérez <jorge_perez@brown.edu> wrote:
>>> Thanks Andrew, this looks useful.
>>>
>>> Why not submit the code to SSC to make it easier for users to install
>>> this directly from Stata?
>>>
>>>
>>> --------------------------------------------
>>> Jorge Eduardo Pérez Pérez
>>> Graduate Student
>>> Department of Economics
>>> Brown University
>>>
>>>
>>> On Fri, Feb 28, 2014 at 1:19 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> -save- is part of the executable
>>>>
>>>> . which save
>>>> built-in command: save
>>>>
>>>> and so its code is not accessible to users.
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 28 February 2014 18:06, Andrew Maurer <Andrew.Maurer@qrm.com> wrote:
>>>>> Hi Statalist,
>>>>>
>>>>> I've written a pair of programs, -collapsetofile- and -recover-, to allow users to "collapse" data to a file without destroying the dataset like -collapse- does. I don't know if anyone else will have use for this, but it will save me a lot of computer time when dealing with large datasets. I would be very interested if anyone has any input or comments on how to improve coding efficiency / style (the code is still a bit rough).
>>>>>
>>>>> ado file (collapsetofile.ado): http://codepad.org/DcwtvDEb
>>>>> ado file (recover.ado): http://codepad.org/csZhQvb0
>>>>> sthlp file (collapsetofile.sthlp): http://codepad.org/AsKC79uK
>>>>>
>>>>> The biggest improvement would come from being able to save directly to a .dta. I assume that this would require either:
>>>>> 1) looking at the format/header/footer of stata dtas in clear text and fwrite()'ing it from mata, and/or
>>>>> 2) looking at the source for a command like save and just copying that (is the source for -save- available?)
>>>>>
>>>>> Before writing this I found myself waiting for hours when graphing summary statistics of large datasets with sequences of:
>>>>>
>>>>> use fulldata // this could be >10gb
>>>>> preserve
>>>>> collapse (sum) thisvar thatvar, by(byvar1 byvar2)
>>>>> ... some data manipulation
>>>>> twoway line...
>>>>> restore
>>>>>
>>>>> preserve
>>>>> collapse (sum) anothervar yetanothervar, by(byvar3)
>>>>> ... some data manipulation
>>>>> twoway line...
>>>>> restore
>>>>>
>>>>> ...
>>>>>
>>>>> preserve
>>>>> collapse (sum) more vars, by(byvar10)
>>>>> ... some data manipulation
>>>>> twoway line...
>>>>> restore
>>>>>
>>>>> For a 20gb dataset with 10 graphs, that makes 10 preserves/restores * 20gb = 200gb written/read to disk. -collapsetofile- writes just the collapsed data to be graphed to a file with no other disk reads/writes:
>>>>>
>>>>> use fulldata
>>>>> collapsetofile (sum) thisvar thatvar using dataforgraph1, by(byvar1 byvar2)
>>>>> collapsetofile (sum) anothervar yetanothervar using dataforgraph2, by(byvar3)
>>>>> ...
>>>>> collapsetofile (sum) more vars, by(byvar10)
>>>>>
>>>>> recover dataforgraph1, clear
>>>>> ... some data manipulation
>>>>> twoway line...
>>>>> ...
>>>>> recover dataforgraph2, clear
>>>>> ... some data manipulation
>>>>> twoway line...
>>>>> ...
>>>>>
>>>>> Thanks to Nick Cox for mentioning the importance of saving characteristics/metadata with the dataset.
>>>>> Thanks to Sergiy Radyakin for making me realize that I could never write a mata program that would compute stats "by" variables as fast as stata's -_mean- in -collapse-, since stata's built-in C code can take advantage of parallelization, while mata code cannot.
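
The collapse-without-preserve idea described above can also be approximated with built-in commands on Stata 16 or later, using frames. The following is only a sketch under stated assumptions: frames postdate this thread, this is not how -collapsetofile- works internally, and the dataset (auto), the variable names, and the frame name (graph1) are placeholders chosen for illustration.

```stata
* Sketch only: assumes Stata 16 or later (frames did not exist in 2014).
* The shipped auto dataset stands in for "fulldata".
sysuse auto, clear

* Copy only the variables needed for the first graph into a new frame.
* graph1 is a hypothetical frame name.
frame put price weight foreign rep78, into(graph1)

* Collapse inside that frame; the data in the default frame are untouched.
frame graph1 {
    collapse (sum) price weight, by(foreign rep78)
    save dataforgraph1, replace
}
frame drop graph1

* The full dataset is still in memory, with no preserve/restore.
describe, short
```

Unlike preserve/restore, this never writes the full dataset to disk, but it does hold a copy of the selected variables in memory until the frame is dropped, so it trades disk I/O for memory.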
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

**References**:

- **st: -collapsetofile-** *From:* Andrew Maurer <Andrew.Maurer@qrm.com>
- **Re: st: -collapsetofile-** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: -collapsetofile-** *From:* Jorge Eduardo Pérez Pérez <jorge_perez@brown.edu>
- **Re: st: -collapsetofile-** *From:* Nick Cox <njcoxstata@gmail.com>
- **RE: st: -collapsetofile-** *From:* Andrew Maurer <Andrew.Maurer@qrm.com>
- **Re: st: -collapsetofile-** *From:* Nick Cox <njcoxstata@gmail.com>
