Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: RE: RE: RE: RE: Easy Question? Counting cases based on a "target" case

From: "Ben Hoen"; Subject: st: RE: RE: RE: RE: Easy Question? Counting cases based on a "target" case; Date: Mon, 31 Dec 2012 10:50:04 -0500

```
Thanks David.

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
Sent: Wednesday, December 26, 2012 7:53 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: Easy Question? Counting cases based on a "target" case

sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
gen countnear=.
levelsof pricek, local(pricesk)
foreach p of local pricesk {
	quietly count if inrange(pricek, `=`p'-2', `=`p'+2')
	replace countnear = `r(N)' if pricek == `p'
	display as result _newline "There are `r(N)' obs with value near `p'."
}

The result is:

. table pricek countnear

----------------------------------
          |       countnear
   pricek |    2     5    15    16
----------+-----------------------
        3 |                5
        4 |                6
        5 |                      4
        7 |          1
       10 |    1
       11 |    1
       14 |    1
       15 |    1
----------------------------------

If you want a dataset where each observation is a different value of
pricek and the count of observations near to that value of pricek, you
could -collapse- the data afterward like this:

. collapse (first) countnear, by(pricek)

. list

     +-------------------+
     | pricek   countn~r |
     |-------------------|
  1. |      3         15 |
  2. |      4         15 |
  3. |      5         16 |
  4. |      7          5 |
  5. |     10          2 |
     |-------------------|
  6. |     11          2 |
  7. |     14          2 |
  8. |     15          2 |
     +-------------------+

I admit it will be harder if you want to use more than one criterion.

David
--
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 11:58 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: Easy Question? Counting cases based on a "target" case
>
> Thanks David, but that does not answer the question I had intended to ask.
>
> Let me try to be clearer.  Suppose the example is simplified to the
> following, using your suggested code:
>
> sysuse auto, clear
> keep in 1/20
> g id=_n
> g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
> keep id price pricek //to clean out unwanted variables
> levelsof pricek, local(pricesk)
> foreach p of local pricesk {
> 	gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
> 	}
> egen countneark = rowtotal(near*)
> drop near*
> tab pricek
> *==========================end
>
> In the above example, the correct totals can be calculated from the
> -tabulate- output.  For the various levels of pricek I should have the
> following counts of "near" cases (assuming the individual case itself is
> counted):
>
> pricek   count of near cases
> 3        15
> 4        15
> 5        16
> 7        5
> 10       2
> 11       2
> 14       2
> 15       2
>
> This is not what is generated by the countneark variable.
>
> Further, in my real application I have over 170,000 values of the variable
> that is being used as the criterion, so it seems it would be inefficient
> to generate a variable for every one of those levels.  Finally, I should
> add that I had envisioned using more than one criterion, based on more than
> one variable, each relative to the respective case, to evaluate which
> cases are counted.  For example, I would use price +/- 2000 and mpg +/- 3.
>
> Any additional insight would be much appreciated.
>
> Ben
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
> Sent: Wednesday, December 26, 2012 1:25 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Easy Question? Counting cases based on a "target" case
>
> Ben,
>
> I don't think you need to loop over observations, but you can loop over
> values, which is fairly efficient. Something like this:
>
>
> levelsof price, local(prices)
> foreach p of local prices {
> 	gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
> 	}
> egen countnear = rowtotal(near*)
>
>
> In the example above I use all prices, but you could substitute the
> following line for the first two lines above:
>
> 	foreach p of numlist 1900 2500 4000 6500 10000 {
>
> David
>
> > -----Original Message-----
> > From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> > Sent: Wednesday, December 26, 2012 10:06 AM
> > To: statalist@hsphsun2.harvard.edu
> > Subject: st: Easy Question? Counting cases based on a "target" case
> >
> > I want to perform a function that I think would be easy, but I can't wrap
> > my head around how to do it without looping through each case.
> >
> > I want to create a count of the number of records in the file that meet
> > a certain criterion based on the respective case's value.  So, for
> > example, using the auto dataset:
> >
> > *====================begin
> > sysuse auto, clear
> > g id=_n
> > egen nearprice2000=count(id) if... //count the number of other cases in
> > //the dataset whose price is within $2000 of this case's (i.e., the
> > //target car's) price
> >
> > *====================end
> >
> > The egen command is how I thought I would resolve this, but I can't
> > figure it out exactly.  For each case, nearprice2000 would equal the
> > number of other cases in the dataset whose price is within +/- $2000 of
> > that case's price.  So if the full dataset had only 5 prices, 1900, 2500,
> > 4000, 6500, and 10000, their respective nearprice2000 values would be 2,
> > 3, 2, 1, and 1 (if the case itself were included in the count) or 1, 2,
> > 1, 0, and 0 (if it were NOT included).
> >
> > I might be able to do this by looping through the cases, but I know that
> > is discouraged by more experienced users.
> >
> > Any advice would be greatly appreciated.
> >
> > Ben
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
```
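For comparison with the level-based loops in the thread, here is a minimal sketch of the observation-by-observation loop Ben mentions. It is applied to his hypothetical five prices, which are typed in with -input- rather than taken from the auto data. For each target case it counts the cases whose price lies within 2000 of that case's price, with both endpoints included. It then subtracts 1 to drop the target case from its own count. The variable names nearprice2000 and nearprice2000_other are illustrative only.

Each of the N passes scans all N observations, so the work grows with N squared. On a file of 170,000 observations this may be slow. Looping over distinct values, as David does, is cheaper only when the criterion variable has far fewer distinct values than observations.

```stata
* Hypothetical five-price example from Ben's first message
clear
input price
1900
2500
4000
6500
10000
end

* For each target case i, count the cases whose price lies in
* [price[i] - 2000, price[i] + 2000]; the target case counts itself
generate long nearprice2000 = .
forvalues i = 1/`=_N' {
    quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000)
    quietly replace nearprice2000 = r(N) in `i'
}

* Drop the target case from its own count
generate long nearprice2000_other = nearprice2000 - 1
list
```

With an inclusive window this gives 2, 3, 2, 1, and 1, or 1, 2, 1, 0, and 0 when the target case is excluded. The price 6500 has no other price within 2000 of it, because the nearest one, 4000, is 2500 away.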
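Ben's second message also asks for more than one criterion, each relative to the target case: price +/- 2000 and mpg +/- 3. One way to sketch this, assuming inclusive endpoints on both criteria, is to add a second -inrange()- condition to the -count- inside the same observation loop. The target car is excluded from its own count. The names nearboth, dprice, and dmpg are hypothetical.

This version has the same N-squared cost as the single-criterion loop. The /// line continuation means it should be run from a do-file.

```stata
* Two criteria relative to the target car (tolerances from Ben's message)
sysuse auto, clear
local dprice = 2000
local dmpg = 3

generate long nearboth = .
forvalues i = 1/`=_N' {
    * Cars within both windows of car i, car i itself included
    quietly count if inrange(price, price[`i'] - `dprice', price[`i'] + `dprice') ///
        & inrange(mpg, mpg[`i'] - `dmpg', mpg[`i'] + `dmpg')
    * Subtract 1 so only the other cars are counted
    quietly replace nearboth = r(N) - 1 in `i'
}
summarize nearboth
```

Further criteria can be added the same way, as one more -inrange()- term per variable.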