From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: set obs by level (of multiple variables)?
Date: Wed, 23 Jan 2013 12:15:52 +0000

I am not sure that I understand your set-up, but it is often a good idea to tag the extra observations, something like

local N = _N
expand <whatever>
gen expanded = _n > `N'

This exploits the fact that -expand- puts the additional observations at the end of the dataset. Note that this _cannot_ be reduced to

expand <whatever>
gen expanded = _n > _N

and it is _essential_ to evaluate _N as a number before you -expand-. Once -expand- has run, _N is the new total number of observations, so _n > _N is never true. Once you have that tag, you could work

... if expanded

On Wed, Jan 23, 2013 at 11:29 AM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
> Hi all, further to this, I have another query. Once I have generated my duplicate, I wish to change the values of the variables in only one of the duplicated rows. At present I am doing this:
>
> bysort TCATOG sex: gen first2 = (_n==1)
> expand 2 if first2
>
> Now I want to replace the contents of the variables in the duplicate, but I'm struggling to identify just one of the 'first2' records where first2==1, so that I don't replace the contents of both rows.
>
> For instance, my data (only partially reproduced) look like this:
>
> start  end    n   d   cp_e2   cr_e2  TCATOG2  sex    first2
>     0    1  442  16  0.9563  1.0079  pTa      Males       1
>     1    2  426  19  0.9123  1.0093  pTa      Males       0
>     2    3  407  26  0.8686  0.9924  pTa      Males       0
>     3    4  381  29  0.8259  0.9642  pTa      Males       0
>     4    5  352  19  0.7839  0.9611  pTa      Males       0
>     5    6  333  26  0.7420  0.9361  pTa      Males       0
>     6    7  307  22  0.7015  0.9192  pTa      Males       0
>     7    8  285  23  0.6624  0.8949  pTa      Males       0
>     8    9  262  25  0.6270  0.8552  pTa      Males       0
>     9   10  237  20  0.5938  0.8268  pTa      Males       0
>    10   11  217   8  0.5624  0.8408  pTa      Males       0
>    11   12  209  12  0.5313  0.8390  pTa      Males       0
>    12   13  197  16  0.5005  0.8183  pTa      Males       0
>    13   14  181   9  0.4703  0.8274  pTa      Males       0
>    14   15  172  10  0.4415  0.8302  pTa      Males       0
>    15   16  162   6  0.4143  0.8520  pTa      Males       0
>    16   17  156   4  0.3871  0.8884  pTa      Males       0
>    17   18  152  10  0.3606  0.8883  pTa      Males       0
>    18   19  130   9  0.3375  0.8701  pTa      Males       0
>    19   20   77   6  0.3157  0.7956  pTa      Males       0
>     0    1  442  16  0.9563  1.0079  pTa      Males       1
>
> I wish to change the final row to:
>
> start  end    n   d   cp_e2   cr_e2  TCATOG2  sex    first2
>     0    0  442  16   1.000   1.000  pTa      Males       1
>
> I can then sort the data according to 'end'.
>
> I have four categories of TCATOG2 and both Males and Females.
>
> Best wishes
>
> Tim
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Tim Evans
> Sent: 18 January 2013 10:06
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: set obs by level (of multiple variables)?
>
> Thanks Rebecca for your advice - much appreciated.
>
> Best wishes
>
> Tim
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Rebecca Pope
> Sent: 17 January 2013 16:31
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: set obs by level (of multiple variables)?
>
> bysort TCATOG sex : gen first2 = (_n==1)
>
> That tests whether each observation is the first in its group and returns 1 if true, 0 otherwise.
>
> On Thu, Jan 17, 2013 at 10:24 AM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
>> Nick, thanks for your help. This does what I need. However, rather than duplicating the last record, duplicating the first might be more helpful, because the first record contains much of the baseline information I already hold. I naively thought that this would work!!:
>>
>> bysort TCATOG sex : gen first2 = _n
>>
>> but I get 1-20 rather than 1 followed by 0s. I could then use
>>
>> replace first2 = 0 if first2 != 1
>>
>> but I'm assuming there is a better way?
>>
>> Best wishes
>>
>> Tim
>>
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
>> Sent: 17 January 2013 15:53
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: set obs by level (of multiple variables)?
>>
>> The syntax diagram for -set- does not indicate support for -by:-, and
>>
>> 1. Whatever is not permitted should be considered forbidden.
>>
>> 2. Less gnomically, there is a really good reason for this. In essence, -set- is about global settings, and even if what you are asking for makes sense -- as it does here -- -set- and -by:- don't mix naturally.
>>
>> See help for -expand-, -expandcl-, -expandby- (SSC).
>>
>> bysort stage sex : gen last = _n == _N
>> expand 2 if last
>> sort stage sex
>> ... if last
>>
>> On Thu, Jan 17, 2013 at 3:08 PM, Tim Evans <Tim.Evans@wmciu.nhs.uk> wrote:
>>
>>> I'm trying to insert extra observations in my dataset. I've calculated survival and wish to graph the results, but the data start from less than 100%. I'd like the graph to start from time 0 and therefore from 100%. My dataset is split by gender and stage, so I need something that inserts an observation for, say, males & stage 1, males & stage 2, females & stage 1 and females & stage 2.
>>>
>>> Unfortunately, while this gives me one extra observation
>>>
>>> set obs `=_N+1'
>>>
>>> Stata does not accept this:
>>>
>>> bysort stage sex: set obs `=_N+1'
>>>
>>> Does anyone have an idea how I might do this in Stata 11.2?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
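Nick's point about evaluating _N before -expand- can be checked on a small made-up dataset. This is a minimal sketch, and the variable names id, expanded and wrong are illustrative only:

```stata
* Toy data (made up for illustration): three observations
clear
set obs 3
gen id = _n

* Right: store the pre-expand number of observations first
local N = _N
expand 2 if id == 1
gen byte expanded = _n > `N'

* Wrong: after -expand-, _N is already the new total,
* so _n > _N is false (0) in every observation
gen byte wrong = _n > _N

list, clean
```

Here expanded is 1 only in observation 4, which is the copy that -expand- appended. The variable wrong is 0 in all four observations.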
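For Tim's follow-up question, the tag-then-replace idea at the top of the thread could be combined with Rebecca's first-observation flag roughly as follows. This is a sketch and not code from the thread. It uses only the first three rows of Tim's listing and assumes that TCATOG2 and sex are string variables. It also adds (start) to -bysort-, so that the flagged row is the one with the lowest start rather than whichever row -bysort- happens to place first:

```stata
* Three rows copied from Tim's listing; TCATOG2 and sex assumed to be strings
clear
input start end n d cp_e2 cr_e2 str3 TCATOG2 str7 sex
0 1 442 16 0.9563 1.0079 pTa Males
1 2 426 19 0.9123 1.0093 pTa Males
2 3 407 26 0.8686 0.9924 pTa Males
end

* Flag the first observation (lowest start) within each TCATOG2-sex group
bysort TCATOG2 sex (start): gen byte first2 = (_n == 1)

* Store the pre-expand count, duplicate the flagged rows,
* and tag the copies, which -expand- appends at the end
local N = _N
expand 2 if first2
gen byte expanded = _n > `N'

* Change only the copies: time 0 with 100% survival
replace end = 0 if expanded
replace cp_e2 = 1 if expanded
replace cr_e2 = 1 if expanded

* Put each group's new time-0 row first
sort TCATOG2 sex end
list, sepby(TCATOG2 sex)
```

Only the copies have expanded == 1, so the original first row keeps its values. Sorting on end then moves each new time-0 row to the top of its group.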

