Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


**From:** "Ben Hoen" <bhoen@lbl.gov>

**To:** <statalist@hsphsun2.harvard.edu>

**Subject:** st: RE: RE: RE: RE: Easy Question? Counting cases based on a "target" case

**Date:** Mon, 31 Dec 2012 10:50:04 -0500

Thanks David.

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
Sent: Wednesday, December 26, 2012 7:53 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: Easy Question? Counting cases based on a "target" case

OK, I'm not sure I understand, but how about this?

sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
gen countnear=.
levelsof pricek, local(pricesk)
foreach p of local pricesk {
    quietly count if inrange(pricek, `=`p'-2', `=`p'+2')
    replace countnear = `r(N)' if pricek == `p'
    display as result _newline "There are `r(N)' obs with value near `p'."
}

The result is:

. table pricek countnear

----------------------------------
          |        countnear
   pricek |    2     5    15    16
----------+-----------------------
        3 |                5
        4 |                6
        5 |                      4
        7 |          1
       10 |    1
       11 |    1
       14 |    1
       15 |    1
----------------------------------

If you want a dataset where each observation is a different value of pricek, together with the count of observations near that value of pricek, you could -collapse- the data afterward like this:

. collapse (first) countnear, by(pricek)

. list

     +-------------------+
     | pricek   countn~r |
     |-------------------|
  1. |      3         15 |
  2. |      4         15 |
  3. |      5         16 |
  4. |      7          5 |
  5. |     10          2 |
     |-------------------|
  6. |     11          2 |
  7. |     14          2 |
  8. |     15          2 |
     +-------------------+

I admit it will be harder if you want to use more than one criterion.

David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 11:58 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: Easy Question? Counting cases based on a "target" case
>
> Thanks David, but that does not answer the question I had intended to ask
> (though your answer gives me additional insights).
>
> Let me try to be clearer. Suppose the example is simplified to the
> following, using your suggested code:
>
> sysuse auto, clear
> keep in 1/20
> g id=_n
> g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
> keep id price pricek //to clean out unwanted variables
> levelsof pricek, local(pricesk)
> foreach p of local pricesk {
>     gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
> }
> egen countneark = rowtotal(near*)
> drop near*
> tab pricek
> *==========================end
>
> In the above example, the correct totals can be calculated from the
> tabulate output. For the various levels of pricek I should have the
> following counts of "near" cases (assuming the individual case itself is counted):
>
> pricek   count of near cases
>      3   15
>      4   15
>      5   16
>      7    5
>     10    2
>     11    2
>     14    2
>     15    2
>
> This is not what the countneark variable contains.
>
> Further, in my real application the variable used as the criterion has
> over 170,000 values, so generating a near* variable for every level seems
> inefficient. Finally, I should add that I had envisioned using more than
> one criterion, based on more than one variable and each relative to the
> target case, to decide which cases are counted. For example, I would use
> price +/- 2000 and mpg +/- 3.
>
> Any additional insight would be much appreciated.
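The rowtotal(near*) version falls short because, for each observation, it adds up how many distinct levels of pricek lie within 2 of that observation's own level, not how many observations do. An observation with pricek 3 gets 3 (for the levels 3, 4 and 5) rather than 15. Below is a minimal sketch of a per-observation count with the two criteria mentioned above, assuming both windows are inclusive at each end, as with inrange(). The variable names nearcount and nearcount_others are hypothetical. The sketch loops over observations, which the original question hoped to avoid. Its work grows with the square of the number of observations, roughly 170,000 × 170,000 ≈ 2.9 × 10^10 comparisons for the real application, so it may be slow at that scale. Because of the `///` line continuation, run it as a do-file.

```stata
* Sketch (assumption: both windows are inclusive, as with inrange()).
* For each observation, count the observations, itself included, whose
* price is within +/- 2000 AND whose mpg is within +/- 3 of its own.
sysuse auto, clear
gen long nearcount = .
forvalues i = 1/`=_N' {
    * price[i] and mpg[i] are the target observation's own values
    quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000) ///
        & inrange(mpg, mpg[`i'] - 3, mpg[`i'] + 3)
    quietly replace nearcount = r(N) in `i'
}
* Same count, with the target observation itself excluded
gen long nearcount_others = nearcount - 1
summarize nearcount nearcount_others
```

Adding a third criterion only requires another `& inrange(...)` term. The cost per observation stays one pass over the data.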
>
> Ben
>
> Ben Hoen
> LBNL
> Office: 845-758-1896
> Cell: 718-812-7589
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Radwin
> Sent: Wednesday, December 26, 2012 1:25 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: Easy Question? Counting cases based on a "target" case
>
> Ben,
>
> I don't think you need to loop over observations; you can loop over
> values instead, which is fairly efficient. Something like this:
>
> levelsof price, local(prices)
> foreach p of local prices {
>     gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
> }
> egen countnear = rowtotal(near*)
>
> In the example above I use all prices, but you could substitute the
> following line for the first two lines above:
>
> foreach p of numlist 1900 2500 4000 6500 10000 {
>
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
>
> www.mprinc.com
>
> > -----Original Message-----
> > From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ben Hoen
> > Sent: Wednesday, December 26, 2012 10:06 AM
> > To: statalist@hsphsun2.harvard.edu
> > Subject: st: Easy Question? Counting cases based on a "target" case
> >
> > I want to perform a function that I think would be easy, but I can't wrap my
> > head around how to perform it without looping through each case.
> >
> > I want to create a count of the number of records in the file that meet a
> > certain criterion based on each case's own value. For example, using
> > the auto dataset:
> >
> > *====================begin
> > sysuse auto, clear
> > g id=_n
> > egen nearprice2000=count(id) if... //count the number of other cases in the dataset whose price is within $2000 of this case's (i.e., the target car's) price
> >
> > *====================end
> >
> > The egen command is how I thought I would resolve this, but I can't figure
> > it out exactly. For each case, nearprice2000 would equal the number of
> > other cases in the dataset whose price is within +/- $2000 of that case's
> > price. So if the full dataset had only 5 prices (1900, 2500, 4000, 6500,
> > and 10000), their respective nearprice2000 values would be 2, 3, 2, 1, and 1
> > (if the case itself were included in the count) or 1, 2, 1, 0, and 0 (if
> > the case itself were NOT included in the count).
> >
> > I might be able to do this by looping through the cases, but I know that
> > is not encouraged by other more experienced users.
> >
> > Any advice would be greatly appreciated.
> >
> > Ben

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
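For a single criterion, the loop over observations can be avoided. Below is a minimal sketch, assuming an inclusive window of +/- 2000 on price. The technique is swapped in here and was not proposed in the thread. Every observation gets three records: a lower bound, the data point itself, and an upper bound. All records are sorted by value, and a running count of data records is kept. An observation's count is then the running total at its upper bound minus the running total at its lower bound. The helper names id, type, value, running and h are illustrative. The sketch uses the five toy prices from the original question. With those data, 6500 has no other price within $2000 (4000 is $2500 away), so the counts including the car itself should come out as 2, 3, 2, 1 and 1.

```stata
* Sketch: for each observation, count the observations (itself included)
* whose price lies within +/- h of its own price, without looping over
* observations. Toy data: the five prices from the original question.
clear
input price
 1900
 2500
 4000
 6500
10000
end

gen long id = _n
local h = 2000                      // half-width of the price window

* Three records per observation: 1 = lower bound, 2 = data point, 3 = upper bound
expand 3
bysort id: gen byte type = _n
gen double value = cond(type == 1, price - `h', cond(type == 3, price + `h', price))

* At tied values, lower bounds sort before data points and data points
* before upper bounds, so both ends of the window are inclusive
sort value type
gen long running = sum(type == 2)   // data points seen so far

* Count = data points up to the upper bound minus those below the lower bound
bysort id (type): gen long nearprice2000 = running[3] - running[1]
keep if type == 2
drop type value running
list price nearprice2000
```

Subtract 1 from nearprice2000 to exclude the car itself. The method needs only an expand and a few sorts, so it should scale to something like 170,000 observations far better than a per-observation loop. However, it handles one criterion only. With two windows (price and mpg), a neighbour must satisfy both at once, and a single running sum over one sorted variable cannot capture that.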

**References**:

- **st: Easy Question? Counting cases based on a "target" case**, *From:* "Ben Hoen" <bhoen@lbl.gov>
- **st: RE: Easy Question? Counting cases based on a "target" case**, *From:* "David Radwin" <dradwin@mprinc.com>
- **st: RE: RE: Easy Question? Counting cases based on a "target" case**, *From:* "Ben Hoen" <bhoen@lbl.gov>
- **st: RE: RE: RE: Easy Question? Counting cases based on a "target" case**, *From:* "David Radwin" <dradwin@mprinc.com>
