Statalist



Re: st: Splitting Observations if Ties are Present


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Splitting Observations if Ties are Present
Date   Tue, 30 Oct 2007 20:39:42 +0000

-sxpose- is a user-written command on SSC. (You are
asked to say where non-official programs you use come from.)

I guess wildly that this segment is part of some larger
dataset with some identifier, say -id-, labelling distinct
groups (panels).

Under that assumption, you can proceed in various ways.
Here is one I like a lot:

bysort id (firsttag) : gen first = value[1]
bysort id (maxtag) : gen max = value[1]
bysort id (mintag) : gen min = value[1]
bysort id (lasttag) : gen last = value[1]
by id: keep if _n == 1

Consider the first statement. Within blocks
defined by -id-, sort so that the observation with the
smallest value of -firsttag- comes first. As missing
values sort after all nonmissing values, that is the
observation tagged 1. Then use its -value-.

The rest is just the same trick again and again,
plus clean-up: after that we can keep just one observation
from each group.
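A minimal, self-contained sketch of the statements above, run on the three observations from the question. It assumes a single panel with a hypothetical identifier -id- equal to 1. Storing -value- and the results as -double- is my choice, not part of the original statements, so that the checks can compare against the listed decimals exactly; the plain -gen- above gives -float- variables, which is fine for display.

```stata
* Enter the example data; id is a hypothetical panel identifier
clear
input id str8 date double value maxtag mintag firsttag lasttag
1 "Jan 2001" 13.616 1 . 1 .
1 "Oct 2001"  2.632 . 1 . .
1 "Jul 2007"  6.474 . . . 1
end

* Missing sorts after 1, so within each id the tagged observation is value[1]
bysort id (firsttag) : gen double first = value[1]
bysort id (maxtag)   : gen double max   = value[1]
bysort id (mintag)   : gen double min   = value[1]
bysort id (lasttag)  : gen double last  = value[1]

* Each new variable is constant within id, so one observation per id suffices
by id: keep if _n == 1
keep id first max min last
list
```

Note that the tie needs no special handling: observation 1 carries both -maxtag- and -firsttag-, and it is simply sorted to the front twice.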

Here is another way to do the same thing:

foreach x in first max min last {
	bysort id (`x'tag) : gen `x' = value[1]
}
by id: keep if _n == 1

And another rather different way:

foreach x in first max min last {
	egen `x' = mean(value * `x'tag), by(id)
}
bysort id : keep if _n == 1
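Why the -egen- version works: wherever a tag is missing, `value * tag` is missing too, and -egen, mean()- ignores missing values, so each mean is taken over the tagged observations only. If two observations were both tagged as, say, the minimum, they would share the same value, so their mean would still be that value. A self-contained sketch on the question's data, again assuming a hypothetical -id- and using -double- storage (my choice) so the checks are exact:

```stata
clear
input id str8 date double value maxtag mintag firsttag lasttag
1 "Jan 2001" 13.616 1 . 1 .
1 "Oct 2001"  2.632 . 1 . .
1 "Jul 2007"  6.474 . . . 1
end

* value * tag is missing where the tag is missing; mean() skips missings
foreach x in first max min last {
	egen double `x' = mean(value * `x'tag), by(id)
}

* Results are constant within id, so keep one observation per id
bysort id : keep if _n == 1
keep id first max min last
list
```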

Nick
n.j.cox@durham.ac.uk

Thomas Speidel wrote:

I have a dataset that looks like this:

        date      value   maxtag   mintag   firsttag   lasttag
    Jan 2001     13.616        1        .          1         .
    Oct 2001      2.632        .        1          .         .
    Jul 2007      6.474        .        .          .         1

Each *tag variable indicates whether a particular value was the first,
last, min or max. Obviously, ties are possible. In this case 13.616 is
both the max and the first value.
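To see which observations carry more than one tag (the ties described here), a minimal sketch using -egen, rownonmiss()-, assuming the variable names shown in the listing above:

```stata
clear
input str8 date value maxtag mintag firsttag lasttag
"Jan 2001" 13.616 1 . 1 .
"Oct 2001"  2.632 . 1 . .
"Jul 2007"  6.474 . . . 1
end

* Count the nonmissing tags on each observation
egen ntags = rownonmiss(maxtag mintag firsttag lasttag)

* Observations with more than one tag are the ties
list date value ntags if ntags > 1
```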

What I am trying to get at is a single-observation dataset that looks
like this:

  firsttag    mintag    maxtag   lasttag
    13.616     2.632    13.616     6.474

I have been able to do so when no ties are present (using sxpose after
some data management). However, when ties are present I may need to
add one or two more observations to the initial dataset to reflect them
(in this case I would need to add another row for Jan 2001 and change
maxtag and firsttag accordingly so that each row carries only one tag).
Perhaps someone has a better suggestion.



