Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: "Ben Hoen" <bhoen@lbl.gov>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: Easy Question? Counting cases based on a "target" case
Date: Wed, 26 Dec 2012 14:57:41 -0500

Thanks David, but that does not answer the question I had intended to ask (though your answer gives me additional insights). Let me try to be clearer.

Assume you simplified the example to the following, using your suggestion for coding:

*==========================begin
sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
levelsof pricek, local(pricesk)
foreach p of local pricesk {
    gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
}
egen countneark = rowtotal(near*)
drop near*
tab pricek
*==========================end

In the above example the correct totals can be calculated from the tabulate output. For the various levels of pricek I should have the following counts of "near" cases (assuming the individual case itself is counted):

pricek   count of near cases
3        15
4        15
5        16
7        5
10       2
11       2
14       2
15       2

This is not what the countneark variable contains. Further, in my real application I have over 170,000 values of the variable that is being used as the criterion, so it seems like it will be inefficient to generate a variable for each of those levels.

Finally, I should add that I had envisioned using more than one criterion, based on more than one variable, all relative to the respective case, with which to evaluate the cases to be counted. So, for example, I would use price +/- 2000 and mpg +/- 3.

Any additional insight would be much appreciated.

Ben

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
Sent: Wednesday, December 26, 2012 1:25 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Easy Question? Counting cases based on a "target" case

Ben,

I don't think you need to loop over observations, but you can loop over values, which is fairly efficient.
Something like this:

levelsof price, local(prices)
foreach p of local prices {
    gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
}
egen countnear = rowtotal(near*)

In the example above I use all prices, but you could substitute the following line for the first and second lines above:

foreach p of numlist 1900 2500 4000 6500 10000 {

David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794
www.mprinc.com

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 10:06 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Easy Question? Counting cases based on a "target" case
>
> I want to perform a function that I think would be easy, but I can't wrap
> my head around how to perform it without looping through each case.
>
> I want to create a count of the number of records in the file that meet a
> certain criterion based on a respective case's value. So for example, using
> the auto dataset:
>
> *====================begin
> sysuse auto, clear
> g id=_n
> egen nearprice2000=count(id) if...
> //count the number of other cases in the dataset if the price of the car
> //is within $2000 of the price of this case's (i.e., target) car
> *====================end
>
> The egen command is how I thought I would resolve this, but I can't figure
> it out exactly. The nearprice2000 would equal the count for each case of
> the number of other cases in the dataset that have a price that is within
> +/- $2000 of the particular case's price.
> So if the full dataset had only 5 prices: 1900, 2500, 4000, 6500, and
> 10000, their respective nearprice2000 values would be: 2, 3, 2, 2, and 1
> (if itself would be included in the count) or 1, 2, 1, 1, and 0 (if itself
> would NOT be included in the count).
>
> I might be able to do this by looping through the cases, but I know that
> is not encouraged by other more experienced users.
>
> Any advice would be greatly appreciated.
>
> Ben

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
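
A possible way to get the counts listed in the message above, offered as a minimal sketch and not as the answer given in the thread. In the posted code each near`p' variable flags whether an observation lies within 2 of level p, so the rowtotal counts the number of distinct levels near each case and not the number of cases. The sketch below counts observations instead. The variable names countnear and countnear2 are illustrative assumptions, and the code assumes price and mpg have no missing values (true for the first 20 cars in auto).

```stata
* Sketch only: countnear and countnear2 are illustrative names, not from the thread
sysuse auto, clear
keep in 1/20
gen pricek = int(price/1000)

* (1) One criterion: loop over the distinct values, count the observations
*     within +/- 2 of each value, and store that count on every observation
*     that has the value
levelsof pricek, local(levels)
gen countnear = .
foreach p of local levels {
    quietly count if inrange(pricek, `p' - 2, `p' + 2)
    quietly replace countnear = r(N) if pricek == `p'
}
tabulate pricek countnear

* (2) Two criteria, each relative to the case itself: loop over observations
gen countnear2 = .
forvalues i = 1/`=_N' {
    quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000) ///
        & inrange(mpg, mpg[`i'] - 3, mpg[`i'] + 3)
    quietly replace countnear2 = r(N) in `i'
}
* Both counts include the case itself; subtract 1 to exclude it
```

For pricek values 3, 4, 5, 7, 10, 11, 14, and 15, the first loop gives 15, 15, 16, 5, 2, 2, 2, and 2, matching the counts worked out by hand from the tabulate output. The second loop makes one pass over the data per observation, so with something like 170,000 cases it may be slow; it is shown because it extends directly to any number of criteria.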

**Follow-Ups**:
- **st: RE: RE: RE: Easy Question? Counting cases based on a "target" case** *From:* "David Radwin" <dradwin@mprinc.com>

**References**:
- **st: Easy Question? Counting cases based on a "target" case** *From:* "Ben Hoen" <bhoen@lbl.gov>
- **st: RE: Easy Question? Counting cases based on a "target" case** *From:* "David Radwin" <dradwin@mprinc.com>
