Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[no subject]

1. To be taken seriously, publication of user-written commands
requires publication of help files.

2. Existing names, meaning those attached to existing commands in
Stata or made public through the Stata Journal or SSC or accessible
websites, are to be considered the property of StataCorp or the
program authors. So, you should use new names. Not doing so runs the
risk of confusing many, to say the least. (As above, no help files are
available to document your acknowledgments readably. Documenting what
a program does within the code is natural to programmers, but
manifestly the typical Stata user doesn't expect to have to read the
code.)

3. It would have been courteous to inform existing program authors
privately before publicly advertising "improved versions" of their
programs. In my case you should feel free to publish improved versions
of my programs under different names and with help files.

Nick

On Fri, Dec 7, 2012 at 4:34 PM, James Sams <[email protected]> wrote:
> I keep changing some user-written commands to suit my purposes or fix things
> that have broken over the years and thought I'd contribute these back.
> However, some peer review may be a good idea before tracking down the
> individual authors and trying to get the changes committed.
>
> Here is a summary of what I have right now:
>
>   * collapse_preserve_label.do: preserve variable and value labels of
>     same-named variables when using collapse. I believe StataCorp has an FAQ
>     that outlines this program.
>
>   * gzfile.ado: provide ability to interact with gzipped dta files using
>     modern syntax of Stata's various file commands (save, use, append, merge).
>     Derived from gzsave.
>
>   * indexesof.ado: a variant of levelsof to skirt around macro length issues
>     and provide the index within the dataset of each unique value.
>
>   * insheet2.ado: a more reliable insheet, uses replace_dquotes.py.
>
>   * labmask.ado: an update to the original labmask to be faster.
>     Depends on indexesof.
>
>   * replace_dquotes.py: Replaces double quotes in csv files with another
>     character, e.g. a pipe ('|'), so that Stata's insheet does not corrupt the
>     input.  Assumes there are no |'s in the original data. Replace all |'s in
>     all string variables back to double quotes to restore original data. The
>     character used is printed to stdout.
>
>   * unique.ado: edited unique command from ssc to accept a compound if stmt.
>
> You can check out the files and future updates/additions at my bitbucket
> repository: https://bitbucket.org/james.sams/statafiles/
>
> There are no help files, but the commands are well documented within each
> source file.
>
>
> A couple examples of what I've changed:
>
> An example of a performance improvement is labmask.ado, which is derived from
> Nick Cox's labmask. On somewhat larger datasets (a couple of million
> observations with thousands of unique value/label pairs), this version runs in a
> few seconds rather than multiple hours. It also does not require the creation
> of any new variables, just a couple of mata vectors; so, it does not increase
> memory usage much at all.
>
> insheet breaks for me, and others I provide support for, constantly. Between
> truncating data, misinterpreting column breaks, and not using double by
> default, I think insheet should be used more conservatively than most may
> expect given the apparent simplicity of the command, especially since a lot of
> these errors are silent and are not easy to catch.
>
> I wrote insheet2/replace_dquotes.py to try to be a catch-all place to put all
> the necessary guards for insheet, to be used without second thought. I'm not
> 100% sure that I've caught everything, but it has worked for me on all the
> datasets that have failed with insheet, with the exception of one-observation
> files that do not have a header, which Stata still interprets as having 0
> observations without the 'nonames' argument.
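
For readers who want a concrete picture of the quote-replacement step described for replace_dquotes.py above, here is a minimal sketch in Python. It is not the script from the repository: the function name, the candidate characters, and the in-memory streams are all assumptions made for illustration. It swaps every double quote for a placeholder character that does not occur in the input and reports which character was used, so the swap can be reversed after import.

```python
import io

# Hypothetical list of placeholder characters, tried in order.
CANDIDATES = "|~^`"


def replace_dquotes(src, dst, candidates=CANDIDATES):
    """Copy text from src to dst, replacing each double quote.

    The placeholder is the first candidate character that does not
    already occur in the input, so the replacement can be undone later.
    Returns the placeholder that was used.
    """
    text = src.read()
    for ch in candidates:
        if ch not in text:
            break
    else:
        # Every candidate already appears in the data.
        raise ValueError("no unused placeholder character available")
    dst.write(text.replace('"', ch))
    return ch


if __name__ == "__main__":
    # Small in-memory example instead of a real csv file.
    src = io.StringIO('id,note\n1,the "big" one\n')
    dst = io.StringIO()
    used = replace_dquotes(src, dst)
    print(used)            # character to turn back into double quotes
    print(dst.getvalue())
```

After importing the cleaned file, the reported character would be replaced by a double quote in every string variable to restore the original values, as the description above says.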
