
st: RE: How "sort" and "by" behave

From   "Nick Cox" <>
To   <>
Subject   st: RE: How "sort" and "by" behave
Date   Mon, 18 Aug 2003 19:32:14 +0100
> I have an observation to make, I have a data set that is sorted by site,
> subject and rnum and I am always creating / manipulating variables at the
> subject level (rnum=1,..,20), as follows
> /***	1st run **/
> sort site subject rnum 
> forval X = 1/6 { 
> 	genl prevcheck`X'=moa`X'[_n-1] if tag`X'==1 
> }
> /***	2nd run **/
> sort site subject rnum 
> forval X = 1/6 { 
> 	genl prevcheck`X'=moa`X'[_n-1] if tag`X'==1 , by(site subject)
> }
> The variables prevcheck1-prevcheck6 were identical from the two runs - can I
> trust that "by(site subject)" will always be redundant once Stata registers
> the sorting structure.

This in turn provokes a second question: what is -genl-?
It is a program published by Jeroen Weesie in STB-35 in 1997. 

To answer the question, we need to look inside. Here are 
the relevant lines of code, given an expression `exp'
defining a new variable `varlist', and simplifying a little:

	tempvar x
	rename `varlist' `x'
	sort `by' 
	local By "by `by': "
	quietly `By' replace `x' = `exp' 

In Amani's case invoking, for example, 

 	genl prevcheck1=moa1[_n-1] if tag1==1 , by(site subject)

is thus equivalent to (in Stata 7 or 8 terms) 

	bysort site subject: gen prevcheck1 = moa1[_n-1] if tag1 == 1

and this is, I think, safe given Amani's prior sort. 
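
As a rough check of that equivalence, here is a minimal sketch on
a made-up data set. The data values and the names prev_a and
prev_b are hypothetical, chosen only for illustration; the sketch
assumes, as above, that the data are already sorted by site,
subject and rnum before either form is run.

```stata
* Toy data (made-up values), put into site-subject-rnum order
clear
input site subject rnum moa1
1 1 1 10
1 1 2 20
1 2 1 30
1 2 2 50
end
generate byte tag1 = 1
sort site subject rnum

* Simplified form of what -genl- does internally: sort, then by:
sort site subject
by site subject: generate prev_a = moa1[_n-1] if tag1 == 1

* The one-line -bysort- form (Stata 7 or later)
bysort site subject: generate prev_b = moa1[_n-1] if tag1 == 1

* The two results should agree in every observation
assert prev_a == prev_b
```

If the -assert- passes silently, the two forms gave the same
result on these data. That is a check on one small example, not
a proof for every data set.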

However, time and Stata move on. 

1. Positively, -genl- has a key feature: it automatically
creates a variable label and a characteristic that contains
the defining expression. 

2. Negatively, as a program written for Stata 5, -genl- 
could not include a key feature added in Stata 7: the 
-bysort varlist (varlist2):- syntax, which also controls 
the sort order within groups. 

To be absolutely sure in all circumstances that your 
-sort- order is maintained you can always do this: 

	bysort site subject (rnum): gen prevcheck1 = moa1[_n-1] if tag1 == 1

Doing this explicitly is, in my view, the best way 
to be _sure_ that you get what you want. You might well 
be able to pass 

	site subject (rnum) 

as an argument to -by()-, but I've not tried it. 
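
To see what the explicit -bysort- form with (rnum) buys you, here
is a small sketch on made-up data that are deliberately entered
out of order. The values are hypothetical and tag1 is set to 1
throughout purely to keep the example short.

```stata
* Toy data (made-up values), deliberately not in rnum order
clear
input site subject rnum moa1
1 1 2 20
1 1 1 10
1 2 1 30
1 1 3 40
1 2 2 50
end
generate byte tag1 = 1

* Group by site and subject, sort by rnum within each group,
* then pick up the previous value within the group
bysort site subject (rnum): generate prevcheck1 = moa1[_n-1] if tag1 == 1

list, sepby(site subject)
```

Under -by:-, _n counts observations within each group, so the
first observation of each subject gets a missing prevcheck1
rather than a value carried over from the previous subject.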


P.S. Amani just added a footnote: 

/***	Sorry I needed to modify to tag`X'[_n] **/

Not so; this does no harm, but it changes nothing. Stata
always understands -varname- to mean -varname[_n]- in 
contexts like these. 
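
A quick way to convince yourself, sketched on made-up data (the
values and the names prev_plain and prev_sub are hypothetical):

```stata
* Toy data (made-up values)
clear
input moa1 tag1
10 0
20 1
30 1
40 0
end

* Same condition, written without and with the explicit [_n]
generate prev_plain = moa1[_n-1] if tag1 == 1
generate prev_sub   = moa1[_n-1] if tag1[_n] == 1

* Both versions should be identical in every observation
assert prev_plain == prev_sub
```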
