



Re: st: Need help for calculation across observations within variable


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Need help for calculation across observations within variable
Date   Tue, 21 May 2013 14:40:14 +0100

What's efficiency here?

If it's machine time, in principle you should not use -egen-. In
practice, it would take a big dataset or many repetitions to notice
the slow-down, likely to be less than the time taken to write
alternative code.
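
One way to check the machine-time point on your own setup is Stata's -timer- command. The following is only a rough sketch: the simulated dataset, its size, and the variable names n_egen and n_by are made up for illustration, and timings will vary with machine and Stata version. It counts distinct years per patient twice, once with -egen- and once with -by:- alone.

```stata
* hypothetical test data: 100,000 patients with 10 observations each
clear
set seed 2013
set obs 1000000
gen long pt_name = ceil(_n / 10)
gen int year = 2009 + floor(3 * runiform())

timer clear

* timer 1: the -egen- route
timer on 1
egen tag = tag(pt_name year)
egen n_egen = total(tag), by(pt_name)
timer off 1

* timer 2: the same count using -by:- and subscripts only
timer on 2
bysort pt_name year : gen byte first = _n == 1
by pt_name : gen long n_by = sum(first)
by pt_name : replace n_by = n_by[_N]
timer off 2

* elapsed seconds for each timer
timer list
```

Raise or lower the -set obs- figure to see at what size any difference starts to matter for you.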

On (1), whether there is a difference:

If it's not machine time, but conciseness or simplicity of code, consider

bysort pt_name (year) : gen different = year[_N] != year[1]

except that a large group of Stata users might not agree on how
transparent that is.
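
To spell out what that line does, here is a small sketch on the example data from the question. I am assuming that both variables are strings, and patient 333 is a made-up extra case added to show a zero result. Once year is sorted within pt_name, year[1] and year[_N] are the lowest and highest values in each group, so they differ exactly when the group contains more than one distinct year.

```stata
clear
input str3 pt_name str4 year
"111" "2009"
"111" "2009"
"111" "2009"
"111" "2011"
"222" "2009"
"222" "2009"
"222" "2010"
"333" "2012"
"333" "2012"
end

* sort on year within patient, then compare the last and first values:
* 1 if the patient has more than one distinct year, 0 otherwise
bysort pt_name (year) : gen byte different = year[_N] != year[1]

list, sepby(pt_name)
```

The parentheses around year matter: they ask for sorting on year within pt_name without making year part of the group definition.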

This particular question is also an FAQ:

http://www.stata.com/support/faqs/data-management/listing-observations-in-group/

On (2), the number of distinct values, there is a detailed discussion in

SJ-8-4  dm0042  . . . . . . . . . . . .  Speaking Stata: Distinct observations
        (help distinct if installed)  . . . . . .  N. J. Cox and G. M. Longton
        Q4/08   SJ 8(4):557--568
        shows how to answer questions about distinct observations
        from first principles; provides a convenience command

Your solution is a good one.

Here is another:

egen tag = tag(pt_name year)
egen max = total(tag), by(pt_name)
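
As a worked illustration of those two lines, here is a sketch on the example data, again assuming string variables and adding a made-up patient 333 with a single year. The tag() function marks exactly one observation in each distinct combination of pt_name and year with 1 and all others with 0, so summing the tags within pt_name counts the distinct years. The name nyears is my own choice here.

```stata
clear
input str3 pt_name str4 year
"111" "2009"
"111" "2009"
"111" "2009"
"111" "2011"
"222" "2009"
"222" "2009"
"222" "2010"
"333" "2012"
"333" "2012"
end

* one observation per distinct pt_name-year pair gets tag == 1
egen tag = tag(pt_name year)

* the sum of the tags within patient is the number of distinct years
egen nyears = total(tag), by(pt_name)

list, sepby(pt_name)
```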

Am I being consistent about -egen-? This is how I resolve it:

1. Interactively, I will often use -egen- if an -egen- solution springs to mind.

2. In a program, I know I should rewrite -egen- calls to the extent
that a program is needed for serious or repeated use.
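
For point 2, a rewrite without -egen- might look something like the sketch below. The program name ndistinct_by and its syntax are hypothetical, invented here for illustration; it is not the -distinct- command mentioned above. It makes no special provision for missing values and it leaves the data sorted on the by() variables and the counted variable.

```stata
capture program drop ndistinct_by
program define ndistinct_by
    version 11
    * one variable to count, the grouping variables, and a new variable name
    syntax varname, BY(varlist) GENerate(name)
    confirm new variable `generate'
    tempvar first
    quietly {
        * flag the first observation of each distinct by-group/value pair
        bysort `by' `varlist' : gen byte `first' = _n == 1
        * running count of flags within group ...
        by `by' : gen long `generate' = sum(`first')
        * ... whose last value is the number of distinct values
        by `by' : replace `generate' = `generate'[_N]
    }
end

* usage on the example data from the question (strings assumed)
clear
input str3 pt_name str4 year
"111" "2009"
"111" "2009"
"111" "2009"
"111" "2011"
"222" "2009"
"222" "2009"
"222" "2010"
end

ndistinct_by year, by(pt_name) generate(nyears)
list, sepby(pt_name)
```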

Nick
njcoxstata@gmail.com


On 21 May 2013 14:17, Michael Stewart <michaelstewartresearch@gmail.com> wrote:
> Hi,
>
> I am looking to see if anyone could suggest more efficient code than
> what I have been using for a particular issue that I am dealing with.
>
> My Need
>
> 1) Create a variable which shows whether "year" is the same or different within pt_name
> 2) Create a variable which shows the number of distinct years per patient
>
> My dataset structure is as follows
>
> pt_name     year (string variable)
> 111         2009
> 111         2009
> 111         2009
> 111         2011
> 222         2009
> 222         2009
> 222         2010
>
> My code is a two-step one:
> Step 1: bysort pt_name year: gen flag = _n == _N
> Step 2: egen max = total(flag), by(pt_name)
>
> Please let me know if there is a more efficient one-step solution.
>
>
> --
> Thank you ,
> Yours Sincerely,
> Mike.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

