
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


RE: st: RE: keep only one observation

From   "Sarah Edgington" <>
To   <>
Subject   RE: st: RE: keep only one observation
Date   Tue, 5 Jun 2012 17:47:00 -0700

Sorry that wasn't entirely clear.
My intent was to create a flag for the observations that will be kept, so
that the data can be inspected visually to check that the right
observations have been selected.  -mark- doesn't actually drop the
unwanted observations, though.  For that you'd need -keep if keep==1-.

As an aside, has -mark- always been a programming command?  I've long used
it for creating dichotomous variables even when not defining a program.  I
imagine I picked up the habit from someone's example code back when I was
learning Stata (I think I started with version 7), so I was a little
surprised when I looked at the documentation recently and noticed that it's
in the programming manual rather than the base reference.  I guess the more
standard way to make that sort of indicator would be -gen keep=(seq==max)-.
Using -mark- is going to be a hard habit to break, though.  (There's
probably also a stylistic argument to be made against using names like keep
or drop for variables that flag observations to keep or drop, to avoid
confusion.)
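A minimal sketch comparing the two ways of building the indicator discussed above, on the same four example rows from Lars's question (string dates are an assumption for simplicity, and the names flag_mark and flag_gen are illustrative, chosen to sidestep the keep/drop naming issue). -mark- sets its new variable to 1 for observations that satisfy the if condition and 0 otherwise, and a logical expression inside -generate- evaluates to 1 when true and 0 when false, so the two indicators should agree.

```stata
* Example rows from Lars's question; dates stored as strings for simplicity
clear
input id str8 date_var
1 "01.01.01"
1 "01.01.01"
1 "01.02.02"
1 "01.04.05"
end

* Position within, and size of, each id-date group
bysort id date_var: generate seq = _n
by id date_var: generate max = _N

* Programmer's command: 1 where the condition holds, 0 elsewhere
mark flag_mark if seq == max

* More common idiom: a true/false expression stored as 1/0
generate byte flag_gen = (seq == max)

* The two indicators should be identical
assert flag_mark == flag_gen
list, sepby(id date_var) noobs
```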


-----Original Message-----
[] On Behalf Of Nick Cox
Sent: Tuesday, June 05, 2012 4:45 PM
Subject: Re: st: RE: keep only one observation


mark keep if seq==max

I imagine that Sarah meant

keep if seq==max

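Unlike -mark-, -keep if- actually removes rows.  A minimal sketch on the four example rows from Lars's question (string dates assumed for simplicity; the local macro name n_route1 is illustrative) checking that -keep if seq==max- and the one-step -bysort id date_var: keep if _n==_N- from the quoted reply both leave one observation per id-date pair.  -isid- confirms that id and date_var then uniquely identify the remaining rows.  Which of two tied rows survives is not guaranteed unless the order within groups is pinned down, for example with an original-order variable in parentheses.

```stata
* Example rows from Lars's question; dates stored as strings for simplicity
clear
input id str8 date_var
1 "01.01.01"
1 "01.01.01"
1 "01.02.02"
1 "01.04.05"
end

* Route 1: build seq and max, then keep the last row of each group
bysort id date_var: generate seq = _n
by id date_var: generate max = _N
preserve
keep if seq == max
local n_route1 = _N
isid id date_var
restore

* Route 2: the one-line version, with no helper variables needed
bysort id date_var: keep if _n == _N
isid id date_var

* Both routes should leave one observation per id-date pair
assert _N == `n_route1'
display "observations kept: " _N
```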

On Tue, Jun 5, 2012 at 10:45 PM, Sarah Edgington <> wrote:
> Lars,
> To get a sequence number within id and date you'd want
>   bysort id date_var : gen seq=_n
> The syntax you used makes sure that date_var is sorted within id but
> doesn't group by date when assigning seq.  If I understand you
> correctly, you actually want to define groups in terms of dates within
> id.
> To keep just the last observation for a date you could do
>   bysort id date_var: keep if _n==_N
> That saves you the step of creating the seq number separately.  Of
> course, if you want to make sure you understand exactly what's
> happening, you probably don't want to jump straight to that.  To see
> what's actually happening, try this:
>   bysort id date_var : gen seq=_n
>   bysort id date_var : gen max=_N
>   mark keep if seq==max
> I think that gets you where you're trying to go.

Lars Folkestad

> I have a dataset of continuous measurements.  The observations are
> from different participants, and my database has 10,000+
> observations.  I have a unique identifier per participant.  Some
> participants have been measured more than once in the observational
> period (on different dates), and for most participants I have more
> than one measurement per date (for simplicity, let's say the extra
> ones did not live up to strict quality measures and are thus useless).
> I want to delete all but the last observation per participant on
> each date.
> I've tried to generate a date sequence variable (seq) with
>   bysort id (date_var) : gen seq = _n
> but this does not give me what I want.  I get a sequence number per
> id, but it does not seem to take date_var into account.
> For example:
> What I get:
>   id  date      seq
>   1   01.01.01  1
>   1   01.01.01  2
>   1   01.02.02  3
>   1   01.04.05  4
> What I want:
>   id  date      seq
>   1   01.01.01  1
>   1   01.01.01  2
>   1   01.02.02  1
>   1   01.04.05  1
> So, a two-part question:
> How do I get Stata to create a sequence variable that takes both id
> and date into account?
> How do I then drop all but the last observation (the largest seq
> number) for each id and date?
> Hope this is somewhat clear.
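A minimal sketch reproducing the two tables in the question (string dates are an assumption for simplicity; seq_id is an illustrative name for the original attempt).  In -bysort id (date_var)-, the variable in parentheses only controls the sort order within each id, so the count runs across all of an id's dates.  Listing date_var outside the parentheses makes each id-date pair its own group, so the count restarts at every new date, and _N gives the size of that group.

```stata
* Example rows from the question; dates stored as strings for simplicity
clear
input id str8 date_var
1 "01.01.01"
1 "01.01.01"
1 "01.02.02"
1 "01.04.05"
end

* "What I get": groups are defined by id alone; date_var only sets the order
bysort id (date_var): generate seq_id = _n

* "What I want": groups are defined by id and date_var together
bysort id date_var: generate seq = _n

* The largest seq in each group (its size) marks the last observation
by id date_var: generate max = _N

list id date_var seq_id seq max, sepby(id date_var) noobs
```

The seq column should run 1, 2, 1, 1 and seq_id 1, 2, 3, 4, matching the "What I want" and "What I get" tables; rows where seq equals max are the ones to keep.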

*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index