



st: RE: RE: Easy Question? Counting cases based on a "target" case


From   "Ben Hoen" <bhoen@lbl.gov>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: Easy Question? Counting cases based on a "target" case
Date   Wed, 26 Dec 2012 14:57:41 -0500

Thanks David, but that does not answer the question I had intended to ask
(though your answer gives me additional insights).

Let me try to be clearer.  Suppose the example is simplified to the
following, using your suggested code:

*==========================begin
sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //simplify the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
levelsof pricek, local(pricesk)
foreach p of local pricesk {
	gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
	}
egen countneark = rowtotal(near*)
drop near*
tab pricek
*==========================end

In the above example the correct totals can be calculated from the
tabulate output.  For the various levels of pricek I should have the
following counts of "near" cases, that is, cases with pricek within +/- 2
of the target (assuming the target case itself is counted):

pricek		count of near cases
3		15
4		15
5		16
7		5
10		2
11		2
14		2
15		2

This is not what is generated by the countneark variable.  
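
One possible reading of the mismatch, offered as a sketch rather than a confirmed diagnosis: each near`p' variable flags whether an observation's pricek lies within 2 of level p, so rowtotal() adds up the number of distinct nearby levels (3 for pricek = 3) rather than the number of nearby cases (15).  Counting cases per level instead might look like the following, where countcases is a hypothetical variable name not used elsewhere in this thread:

```stata
sysuse auto, clear
keep in 1/20
gen pricek = int(price/1000)
levelsof pricek, local(pricesk)
gen countcases = .
foreach p of local pricesk {
	// number of observations whose pricek is within 2 of level p
	quietly count if inrange(pricek, `p' - 2, `p' + 2)
	// store that count on every observation at level p
	quietly replace countcases = r(N) if pricek == `p'
}
tabulate pricek countcases
```

This still loops over levels, so it does not address the concern about a criterion variable with very many values; it only shows one way to obtain the counts listed in the table above.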

Further, in my real application the variable used as the criterion has
over 170,000 values, so generating one indicator variable per level seems
inefficient.  Finally, I should add that I had envisioned using more than
one criterion, based on more than one variable, all relative to the
respective case, to determine which cases are counted.  So, for example,
I would use price +/- 2000 and mpg +/- 3.
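
To make the two-criteria idea concrete, here is a minimal sketch that loops over observations and counts the cases within both windows of each target case.  It assumes the auto dataset (where price and mpg have no missing values), is written for a do-file (the /// continuation does not work interactively), and uses nearboth as a hypothetical variable name:

```stata
sysuse auto, clear
keep in 1/20
gen nearboth = .
forvalues i = 1/`=_N' {
	// bounds are taken from observation i, the "target" case
	quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000) ///
		& inrange(mpg, mpg[`i'] - 3, mpg[`i'] + 3)
	// r(N) includes the target itself; use r(N) - 1 to exclude it
	quietly replace nearboth = r(N) in `i'
}
list price mpg nearboth
```

This avoids creating one variable per level, but it runs one count per observation, so the work grows roughly with the square of the number of cases and may be slow on a very large file.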

Any additional insight would be much appreciated.

Ben

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
Sent: Wednesday, December 26, 2012 1:25 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Easy Question? Counting cases based on a "target" case

Ben,

I don't think you need to loop over observations, but you can loop over
values, which is fairly efficient. Something like this:


levelsof price, local(prices)
foreach p of local prices {
	gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
	}
egen countnear = rowtotal(near*)


In the example above I use all prices, but you could substitute the
following line for the first two lines above:

	foreach p of numlist 1900 2500 4000 6500 10000 {

David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com


> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-
> statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 10:06 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Easy Question? Counting cases based on a "target" case
> 
> I want to perform a function that I think would be easy, but I can't wrap
> my head around how to perform it without looping through each case.
> 
> I want to create a count of the number of records in the file that meet
> certain criteria based on a respective case's value.  So, for example,
> using the auto dataset:
> 
> *====================begin
> sysuse auto, clear
> g id=_n
> egen nearprice2000=count(id) if... //count the number of other cases in
> //the dataset whose price is within $2000 of this case's (i.e., the
> //target car's) price
> 
> *====================end
> 
> The egen command is how I thought I would resolve this, but I can't
> figure it out exactly.  The nearprice2000 would equal the count for each
> case of the number of other cases in the dataset that have a price
> within +/- $2000 of the particular case's price.  So if the full dataset
> had only 5 prices: 1900, 2500, 4000, 6500, and 10000, their respective
> nearprice2000 values would be: 2, 3, 2, 1, and 1 (if itself would be
> included in the count) or 1, 2, 1, 0, and 0 (if itself would NOT be
> included in the count).
> 
> I might be able to do this by looping through the cases, but I know that
> is not encouraged by other more experienced users.
> 
> Any advice would be greatly appreciated.
> 
> Ben

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index