Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Improved commands, sample implementations. Any interest?


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: Improved commands, sample implementations. Any interest?
Date   Fri, 7 Dec 2012 19:11:38 +0000

The key here is the word _additional_. If a program already is
r-class, other r-class results can be useful.

That said, if you do this, my advice is to put the programs in what
-adopath- calls PERSONAL so that your next version of Stata can see
them too. Naturally, Jeph is almost certainly doing this already, but
it's worth emphasis.

Nick

On Fri, Dec 7, 2012 at 7:04 PM, Jeph Herrin <[email protected]> wrote:
> On this topic (or close to it), I routinely modify SSC and StataCorp .ado
> files, but exclusively to include additional -return- statements. E.g., to the
> very handy -levelsof.ado- I added
>
>   return local nvals `"`nvals'"'
>
> so I could capture the number of items in the returned macro without
> counting it up myself.
>
> On the one hand, it seems like a very "passive" modification to return a
> macro that the author has been kind enough to work out the contents of and
> -display- for me to see. On the other hand, I am still reluctant enough
> about fiddling with others' code that I nonetheless change the name (e.g., to
> -mylevelsof.ado-).
>
> All of which is to say, I would encourage StataCorp and especially SSC
> authors to be liberal with -return-ing calculated values.
>
> cheers,
> Jeph
>
>
>
> On 12/7/2012 12:16 PM, Nick Cox wrote:
>>
>> Some etiquette for working with user-written programs has long since
>> been suggested at
>>
>> http://www.stata.com/support/faqs/resources/statalist-faq/#relation
>>
>> I quote the most relevant part
>>
>> "In practice, you can probably take anything published in either
>> medium and modify it as you will—especially if you do that
>> privately—but publicly we recommend that, unless you are the original
>> author, you change the name of the program, take all blame for any
>> limitations your changes produce, and imply that a suitably large
>> portion of the credit for the program belongs to the original
>> authors."
>>
>> From that and other considerations my own suggestion is that
>>
>> 1. Publication of user-written commands requires publication of help
>> files to be taken seriously.
>>
>> 2. Existing names, meaning those attached to existing commands in
>> Stata or made public through the Stata Journal or SSC or accessible
>> websites, are to be considered the property of StataCorp or the
>> program authors. So, you should use new names. Not doing so runs the
>> risk of confusing many, to say the least. (As above, no help files are
>> available to document your acknowledgments readably. Documenting what
>> a program does within the code is natural to programmers, but
>> manifestly the typical Stata user doesn't expect to have to read the
>> code.)
>>
>> 3. It would have been courteous to inform existing program authors
>> privately before publicly advertising "improved versions" of their
>> programs. In my case you should feel free to publish improved versions
>> of my programs under different names and with help files.
>>
>> Nick
>>
>> On Fri, Dec 7, 2012 at 4:34 PM, James Sams <[email protected]> wrote:
>>>
>>> I keep changing some user-written commands to suit my purposes or fix
>>> things that have broken over the years and thought I'd contribute these
>>> back. However, some peer review may be a good idea before tracking down
>>> the individual authors and trying to get the changes committed.
>>>
>>> Here is a summary of what I have right now:
>>>
>>>    * collapse_preserve_label.do: preserve variable and value labels of
>>>      same-named variables when using collapse. I believe StataCorp has
>>>      an FAQ that outlines this program.
>>>
>>>    * gzfile.ado: provide ability to interact with gzipped dta files using
>>>      modern syntax of Stata's various file commands (save, use, append,
>>>      merge). Derived from gzsave.
>>>
>>>    * indexesof.ado: a variant of levelsof to skirt around macro length
>>>      issues and provide the index within the dataset of each unique value.
>>>
>>>    * insheet2.ado: a more reliable insheet, uses replace_dquotes.py.
>>>
>>>    * labmask.ado: an update to the original labmask to be faster.
>>>      Depends on indexesof.
>>>
>>>    * replace_dquotes.py: Replaces double quotes in csv files with another
>>>      character, e.g. pipe ('|'), so that Stata's insheet does not corrupt
>>>      the input. Assumes there are no |'s in the original data. Replace
>>>      all |'s in all string variables back to double quotes to restore the
>>>      original data. The character used is printed to stdout.
>>>
>>>    * unique.ado: edited unique command from ssc to accept a compound if
>>>      stmt.
>>>
>>> You can check out the files and future updates/additions at my bitbucket
>>> repository: https://bitbucket.org/james.sams/statafiles/
>>>
>>> There are no help files, but the commands are well documented within each
>>> source file.
>>>
>>>
>>> A couple examples of what I've changed:
>>>
>>> An example of a performance improvement is labmask.ado, which is derived
>>> from Nick Cox's labmask. On somewhat larger datasets (a couple of million
>>> observations with thousands of unique value/label pairs), this version
>>> runs in a few seconds rather than multiple hours. It also does not require
>>> the creation of any new variables, just a couple of mata vectors; so, it
>>> does not increase memory usage much at all.
>>>
>>> insheet breaks for me, and others I provide support for, constantly.
>>> Between truncating data, misinterpreting column breaks, and not using
>>> double by default, I think insheet should be used more conservatively
>>> than most may expect given the apparent simplicity of the command,
>>> especially since a lot of these errors are silent and are not easy to
>>> catch.
>>>
>>> I wrote insheet2/replace_dquotes.py to try to be a catch-all place to put
>>> all the necessary guards for insheet, to be used without second thought.
>>> I'm not 100% sure that I've caught everything, but it has worked for me on
>>> all the datasets that have failed with insheet, with the exception of
>>> one-observation files that do not have a header, which Stata still
>>> interprets as having 0 observations without the 'nonames' argument.
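
Editorial sketch: the quote-replacement idea described for replace_dquotes.py above can be illustrated roughly as follows. This is not the code from the repository. The function names, the candidate character list, and the choice to fall back to another character when '|' already occurs in the data are all assumptions made for illustration. It only swaps characters; it makes no attempt to parse quoted fields.

```python
# Hypothetical sketch of the double-quote replacement idea, NOT the
# original replace_dquotes.py. All names here are illustrative.

def pick_placeholder(text, candidates="|~^`"):
    """Return the first candidate character that does not occur in text."""
    for ch in candidates:
        if ch not in text:
            return ch
    raise ValueError("no unused placeholder character among candidates")


def replace_dquotes(text, candidates="|~^`"):
    """Swap every double quote for an unused placeholder character.

    Returns (new_text, placeholder) so the caller knows which character
    to turn back into double quotes after the import.
    """
    placeholder = pick_placeholder(text, candidates)
    return text.replace('"', placeholder), placeholder


def restore_dquotes(text, placeholder):
    """Undo replace_dquotes: turn the placeholder back into double quotes."""
    return text.replace(placeholder, '"')


if __name__ == "__main__":
    sample = 'id,item\n1,6" nail\n2,3" screw\n'
    converted, used = replace_dquotes(sample)
    # Report the character used, as the description above says the script does.
    print(used)
    print(converted, end="")
```

After importing the converted file, the same placeholder would be replaced by double quotes in every string variable, which is the restore step the description mentions.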
