
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Selecting specific observations based on two variables


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: Selecting specific observations based on two variables
Date   Wed, 19 Jun 2013 15:24:53 +0200

On Wed, Jun 19, 2013 at 3:15 PM, henrik andersson wrote:
> I want to:
>
> (a) Use only one observation per Id
> (b) Use the observation whose value of Year is closest in absolute terms to 2009 (if Year==2009, then that observation(s) should be chosen).
> (c) If there is a tie in (b), use the observation with the lowest value of Year.
> In addition, if the above criteria are not able to single out one observation per Id, e.g. if there are two observations in the year 2009, it would be great if Stata could then randomly decide which one to pick.

The following example satisfies all criteria:

*------------------ begin example ------------------
clear
input id  year
1       2001
1       2002
1       2005
1       2009
1       2011
2       2001
2       2002
2       2003
2       2004
2       2006
2       2007
2       2011
end

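* distance (in years) from 2009
gen dist = abs(2009-year)
* within id, sort by distance, then by year, so the earlier year wins a tie
* in distance; observations tied on both are ordered randomly by -sort-,
* so _n == 1 then picks one of them at random
bys id (dist year): gen touse = _n == 1
* restore chronological order for display
sort id year
list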
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* http://www.maartenbuis.nl/example_faq )
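The example data above contain no exact ties (same id and same year), so the random part of the request is never exercised. A minimal sketch of one way to make that tie-break explicit and reproducible is to sort on an extra uniform random draw after the year. The variable x, the data, and the seed value are hypothetical, chosen only to make the two 2009 observations distinguishable:

*------------------ begin example ------------------
clear
input id year x
1 2001 10
1 2009 20
1 2009 30
1 2011 40
2 2007 50
2 2011 60
end

* arbitrary seed (assumption) so the random tie-break is reproducible
set seed 20130619
* one uniform draw per observation, used only to break remaining ties
gen double u = runiform()
gen dist = abs(2009 - year)
* closest to 2009 first, then earliest year, then random order
bysort id (dist year u): gen touse = _n == 1
* keep one observation per id
keep if touse
list id year x
*------------------- end example -------------------

Here id 1 keeps one of its two 2009 observations (x is 20 or 30, depending on the draw), and id 2 keeps 2007, because 2007 and 2011 are equally far from 2009 and the earlier year wins.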

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

