



st: RE: keep only one observation


From   "Sarah Edgington" <sedging@ucla.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: keep only one observation
Date   Tue, 5 Jun 2012 14:45:15 -0700

Lars,
To get a sequence number within ID and date you'd want 
bysort id date_var : gen seq=_n

The syntax you used makes sure that date_var is sorted within id but doesn't
group by date when assigning seq.  If I understand you correctly, you
actually want to define groups in terms of dates within IDs.
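To make the difference concrete, here is a small sketch on toy data modeled on the example in the question. The day/month/year columns and the names seq_by_id and seq_by_id_date are made up for illustration, and I'm assuming the dates read as day.month.year.

```stata
* Toy data mirroring the question's example (values are illustrative only)
clear
input id day month year
1 1 1 2001
1 1 1 2001
1 1 2 2002
1 1 4 2005
end
gen date_var = mdy(month, day, year)
format date_var %td

* Groups are defined by id alone; (date_var) only controls the sort order within id
bysort id (date_var): gen seq_by_id = _n

* Groups are defined by id AND date_var, so the counter restarts at each new date
bysort id date_var: gen seq_by_id_date = _n

list id date_var seq_by_id seq_by_id_date, sepby(id date_var)
```

The first counter runs straight through all of a participant's rows, while the second restarts whenever the date changes.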

To keep just the last observation for a date you could do
bysort id date_var: keep if _n==_N

That saves you the step of creating the seq number separately.  Of course,
if you want to make sure you understand exactly what's happening you
probably don't want to jump straight to that.  To see what's actually
happening, try this:
bysort id date_var : gen seq=_n
bysort id date_var : gen max=_N
mark keep if seq==max

I think that gets you where you're trying to go.
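One caveat: Stata's sort does not promise to preserve the existing order of observations that tie on the sort keys, so "last" within an id and date is only well defined if something fixes the order inside each group. A minimal sketch, assuming "last" means last in the current file order (obs_order and value are hypothetical names; if you have a measurement time, use that instead):

```stata
* Toy data: two measurements on the first date, one on each of the others
clear
input id day month year value
1 1 1 2001 10.2
1 1 1 2001 11.5
1 1 2 2002 9.8
1 1 4 2005 12.1
end
gen date_var = mdy(month, day, year)
format date_var %td

* Record the current file order before any sorting
gen long obs_order = _n

* Within each id-date group, order by obs_order and keep only the final row
bysort id date_var (obs_order): keep if _n == _N

* Check: each id-date combination should now appear exactly once
isid id date_var
list id date_var value
```

Putting obs_order in parentheses makes it a tie-breaker rather than part of the group definition, so the row that survives is the same on every run.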

-Sarah


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lars Folkestad
Sent: Tuesday, June 05, 2012 1:09 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: keep only one observation

Dear list

I have a dataset of measured continuous data. The observations are from
different participants. In my database I have 10,000+ observations.  I have
a unique identifier per participant. Some participants have been measured
more than once in the observational period (on different dates). And for
most participants I have more than one measurement per date (for simplicity,
let's say they did not live up to strict quality measures and thus are
rendered useless).

I want to delete all but the last observation per participant on each
date.

I've tried to generate a date sequence variable (seq) by

bysort id (date_var) : gen seq = _n

But this does not give me what I want. I get a sequence number per id, but it
does not seem to take date_var into account.

For example:

What I get
Id date seq
1 01.01.01 1
1 01.01.01 2
1 01.02.02 3
1 01.04.05 4


What I want
Id date seq
1 01.01.01 1
1 01.01.01 2
1 01.02.02 1
1 01.04.05 1


So a two-part question:
How do I get Stata to give a sequence variable that takes date and id into
account?
How do I then drop all but the last (the largest seq number) from the list,
taking id into account?

Hope this is somewhat clear.

lars

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


