Re: st: xt: unit-specific trends

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: xt: unit-specific trends
Date	Thu, 19 Apr 2012 18:35:14 -0400

Thank you, Bill.

Of course it's great to have the correct results from Stata!

I am just a bit surprised that the "if" checks slow down operations
this much, especially in by-loops. And since -by:- requires the data
to be sorted anyway, I thought you could be less permissive later on
(e.g. require that the sort order be maintained). I would have guessed
that the cost of not allowing re-sorting would be justified by a
dramatic speedup of -by-, which is pretty commonly used.

But exactly these are the sorts of trade-offs that you are experts in.
I will not second-guess your judgment.

Perhaps the biggest lesson to me was how costly an if-check is in
large datasets. Pretty frightening.

But most basic operations (sorting or checks) must be hitting some
theoretical limits; that is all we can squeeze out of computers.
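
One way to measure that cost is with Stata's -timer- command. The sketch below is only an illustration: the dataset size, the variable names (x, touse) and the scalar names are made up for the example, and the marked observations are deliberately placed at the top of the data so that the -if- and -in- forms select the same rows.

```stata
clear
set obs 5000000
generate double x = runiform()
generate byte touse = (_n <= 1000)     // marked observations sit at the top

timer clear

timer on 1
quietly summarize x if touse, meanonly   // condition is checked on every observation
scalar mean_if = r(mean)
timer off 1

timer on 2
quietly summarize x in 1/1000, meanonly  // only the requested range is visited
scalar mean_in = r(mean)
timer off 2

timer list                               // elapsed seconds for timers 1 and 2
```

Both forms use the same 1,000 observations here, so the two means agree; only the elapsed times reported by -timer list- differ, and those depend on the machine and the Stata version.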

Thanks!

Laszlo

On Thu, Apr 19, 2012 at 2:36 PM, William Gould, StataCorp LP
<[email protected]> wrote:
>
> Laszlo <[email protected]> wrote,
>
> > I used "if `touse'" because that is the official way to make a program
> > byable (http://www.stata.com/help.cgi?byable). If there is any case
> > where the -if- condition need not be checked for the entire dataset, a
> > -by: - run is that, isn't it?
>
> Laszlo is wrong in assuming that the data are necessarily sorted, and
> thus -if `touse'- is the official way to program this case.
>
> The problem for -by- is that it is turning control over to a
> user-written program, and it is not uncommon for user-written programs
> to re-sort the data and then not put them back into the original
> order.  So -by- was written to accommodate that.
>
> If you as a programmer know that the data will still be sorted
> you can convert the -if `touse'- into an -in- range by coding,
>
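>        tempvar x
>        * record _n for the marked observations; missing elsewhere
>        quietly gen long `x' = _n if `touse'
>        * -summarize- skips missing values, so min/max span the marked rows
>        quietly sum `x', meanonly
>        local first = r(min)
>        local last  = r(max)
>        drop `x'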
>
> In the rest of your code you can then code -in `first'/`last'- instead
> of -if `touse'-.
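
As a rough sketch of how this could look inside a by-able program: the program name inrange_demo and the tempvar name obsno are hypothetical, and the whole approach assumes that the data are sorted by the by-variables on entry and that nothing inside the program re-sorts them.

```stata
capture program drop inrange_demo
program define inrange_demo, byable(recall)
        syntax varname(numeric)
        marksample touse, novarlist      // marks only the current by-group

        // Convert the 0/1 marker into an observation range
        tempvar obsno
        quietly generate long `obsno' = _n if `touse'
        quietly summarize `obsno', meanonly
        local first = r(min)
        local last  = r(max)

        // Assumption: the by-group occupies observations `first' to `last'
        summarize `varlist' in `first'/`last'
end

sysuse auto, clear
sort foreign
by foreign: inrange_demo mpg
```

The -novarlist- option keeps missing values of the variable from punching holes in the marker, so `touse' reflects only the by-group.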
>
> There may be a quicker way to convert an -if `touse'- into an -in- range.
> This is just the first way that occurred to me.
>
> I would still be hesitant to use -in- range instead of -if `touse'-
> because I would need to be certain that every command I used in my
> ado-file did not change the sort order.
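
One possible safeguard, sketched here under assumptions rather than taken from the thread, is to check that the marked observations really form a single block before trusting the range, and to fall back to the -if- form otherwise. The variables touse and obsno below are ordinary stand-ins for the temporary variables a real program would use.

```stata
sysuse auto, clear
sort foreign
generate byte touse = (foreign == 1)   // stand-in for marksample's `touse'

generate long obsno = _n if touse
quietly summarize obsno, meanonly
local first = r(min)
local last  = r(max)
local n     = r(N)

// The marked rows are contiguous only if the span equals their count
if `last' - `first' + 1 == `n' {
        summarize mpg in `first'/`last'
}
else {
        summarize mpg if touse
}
```

The check itself costs a pass over the data, so it pays off only when the range is reused by several later commands.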
>
> Here's a demonstration of a by-able program that re-sorts the data
> and yet still produces the expected results because it is coded using
> -if `touse'-:
>
>        . program tryit, byable(recall)
>          1.         di "hi"
>          2.         syntax
>          3.         marksample touse
>          4.         list rep78 if `touse'
>          5.         sort mpg
>          6. end
>
>        . sysuse auto, clear
>        (1978 Automobile Data)
>
>        . sort rep78
>
>        . by rep78: tryit
>
>        --------------------------------------
>        -> rep78 = 1
>        hi
>
>             +-------+
>             | rep78 |
>             |-------|
>          1. |     1 |
>          2. |     1 |
>             +-------+
>
>        --------------------------------------
>        -> rep78 = 2
>        hi
>
>             +-------+
>             | rep78 |
>             |-------|
>          3. |     2 |
>         14. |     2 |
>         15. |     2 |
>         22. |     2 |
>         24. |     2 |
>             |-------|
>         45. |     2 |
>         52. |     2 |
>         53. |     2 |
>             +-------+
>
>        <remaining output omitted>
>
>        . _
>
> When -tryit- was called the first time to process rep78==1, the data
> were in order, and we see that, as expected, the observations for
> which rep78 is 1 are at the top of the dataset, namely in observations
> 1 and 2.  Now look at the -tryit- code.  -tryit-, just before exiting,
> re-sorts the data!
>
> So, the second time -tryit- is called, when -tryit- is called to
> process the rep78 = 2 data, the observations will not be in order.
> And we can see that in the listing.  The listing was produced by
> coding -list rep78 if `touse'- and, just as one would hope, all the
> observations for which `touse' contains 1 are rep78==2 observations.
> This time, however, the data are no longer in order.  The observations
> for which `touse' is 1 are observations 3, 14, 15, 22, 24, 45, 52, and
> 53.  It didn't matter, however, because we coded -if `touse'-.
>
> -by- plus -tryit- still produced correct results.
>
> Our thinking when we coded -by- and made the recommendation of using
> -if `touse'- was that sometimes it is better to produce correct
> results than to produce incorrect results more quickly.
>
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


