Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: RE: RE: RE: Easy Question? Counting cases based on a "target" case

From	"Ben Hoen" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: RE: RE: Easy Question? Counting cases based on a "target" case
Date	Mon, 31 Dec 2012 10:50:04 -0500

Thanks David.  

Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David Radwin
Sent: Wednesday, December 26, 2012 7:53 PM
To: [email protected]
Subject: st: RE: RE: RE: Easy Question? Counting cases based on a "target" case

OK, I'm not sure I understand, but how about this? 

sysuse auto, clear
keep in 1/20
g id=_n
g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
keep id price pricek //to clean out unwanted variables
gen countnear=.
levelsof pricek, local(pricesk)
foreach p of local pricesk {
	quietly count if inrange(pricek, `=`p'-2', `=`p'+2')
	replace countnear = `r(N)' if pricek == `p'
	display as result _newline "There are `r(N)' obs with value near `p'."
	}

The result is:

. table pricek countnear

----------------------------------
          |       countnear    
   pricek |    2     5    15    16
----------+-----------------------
        3 |                5      
        4 |                6      
        5 |                      4
        7 |          1            
       10 |    1                  
       11 |    1                  
       14 |    1                  
       15 |    1                  
----------------------------------


If you want a dataset where each observation is a different value of
pricek and the count of observations near to that value of pricek, you
could -collapse- the data afterward like this:

. collapse (first) countnear, by(pricek)

. list

     +-------------------+
     | pricek   countn~r |
     |-------------------|
  1. |      3         15 |
  2. |      4         15 |
  3. |      5         16 |
  4. |      7          5 |
  5. |     10          2 |
     |-------------------|
  6. |     11          2 |
  7. |     14          2 |
  8. |     15          2 |
     +-------------------+

I admit it will be harder if you want to use more than one criterion.
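For more than one criterion, such as the price +/- 2000 and mpg +/- 3 example raised in this thread, one straightforward option is to loop over observations and use explicit subscripts (price[i], mpg[i]) to refer to the target case. This is an illustrative sketch on the full auto data, not code posted in the thread. It makes one pass over the data per observation, so it is likely to be slow with hundreds of thousands of observations.

```stata
* Count, for each car, how many cars (itself included) have a price within
* 2000 of its price AND an mpg within 3 of its mpg.
sysuse auto, clear
gen long countnear = .
forvalues i = 1/`=_N' {
    * The subscripted values are those of the target case i
    quietly count if inrange(price, price[`i'] - 2000, price[`i'] + 2000) & inrange(mpg, mpg[`i'] - 3, mpg[`i'] + 3)
    quietly replace countnear = r(N) in `i'
}
* Use countnear - 1 if the target case itself should not be counted.
summarize countnear
```

Dropping the mpg condition gives the single-criterion count; further criteria are added as extra inrange() terms.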

David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com


> -----Original Message-----
> From: [email protected] [mailto:owner-[email protected]] On Behalf Of Ben Hoen
> Sent: Wednesday, December 26, 2012 11:58 AM
> To: [email protected]
> Subject: st: RE: RE: Easy Question? Counting cases based on a "target" case
> 
> Thanks David, but that does not answer the question I had intended to ask
> (though your answer gives me additional insights).
> 
> Let me try to be clearer. Assume you simplified the example to the
> following, using your suggestion for the coding:
> 
> sysuse auto, clear
> keep in 1/20
> g id=_n
> g pricek=int(price/1000) //I am simplifying the levels of price to the 1000's
> keep id price pricek //to clean out unwanted variables
> levelsof pricek, local(pricesk)
> foreach p of local pricesk {
> 	gen near`p' = inrange(pricek, `=`p'-2', `=`p'+2')
> 	}
> egen countneark = rowtotal(near*)
> drop near*
> tab pricek
> *==========================end
> 
> In the above example the correct totals can be calculated from the
> tabulate output. For the various levels of pricek I should have the
> following counts of "near" cases (assuming the individual case is counted):
> 
> pricek		count of near cases
> 3		15
> 4		15
> 5		16
> 7		5
> 10		2
> 11		2
> 14		2
> 15		2
> 
> This is not what is generated by the countneark variable.
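A likely reason for the mismatch: each near`p' variable in the code above is 0/1, so -egen, rowtotal()- counts how many distinct levels of pricek lie within +/- 2 of an observation's own level (3 levels for pricek == 3, for example), not how many observations do. One possible fix, sketched here as an illustration, is to weight each level's indicator by the number of observations at that level; the local n_at_p is a hypothetical name. On these 20 observations it should reproduce the counts in the table above.

```stata
sysuse auto, clear
keep in 1/20
gen pricek = int(price/1000)    // price in whole thousands, as in the thread
levelsof pricek, local(pricesk)
foreach p of local pricesk {
    * Number of observations at this level of pricek
    quietly count if pricek == `p'
    local n_at_p = r(N)
    * An observation near level p picks up all observations at that level
    gen near`p' = `n_at_p' * inrange(pricek, `p' - 2, `p' + 2)
}
egen countneark = rowtotal(near*)
drop near*
tab pricek countneark
```

This still creates one variable per distinct level, so it shares the scaling concern raised below.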
> 
> Further, in my real application the variable used as the criterion has
> over 170,000 values, so it seems it would be inefficient to create a
> variable for every one of its levels. Finally, I should add that I had
> envisioned using more than one criterion, based on more than one variable,
> all relative to the respective case, to decide which cases are counted.
> For example, I would use price +/- 2000 and mpg +/- 3.
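With a single criterion, one way to avoid looping altogether is to sort. The number of observations within +/- w of a given price equals the number of prices <= price + w minus the number of prices < price - w, and both are running counts once the window endpoints are sorted together with the data. The sketch below illustrates that idea with w = 2000; the variables id, kind, key, tieorder, nseen, nbelow and nupto are hypothetical names. Its cost is dominated by one sort of 3N rows, so it should scale to large files, but it does not extend directly to two simultaneous criteria such as price and mpg.

```stata
* Count, for every observation, how many observations (itself included)
* have a price within +/- 2000 of its own price, without a loop.
sysuse auto, clear
keep price
gen long id = _n
local w = 2000

* Three rows per observation: the observation itself plus markers at the
* lower and upper ends of its window.
expand 3
bysort id: gen byte kind = _n           // 1 = observation, 2 = lower, 3 = upper
gen double key = price
replace key = price - `w' if kind == 2
replace key = price + `w' if kind == 3

* At tied keys, lower markers sort first and upper markers last, so that
* observations exactly on a window boundary count as inside the window.
gen byte tieorder = cond(kind == 2, 0, cond(kind == 1, 1, 2))
sort key tieorder

* Running count of real observations seen so far in key order.
gen long nseen = sum(kind == 1)
gen long nbelow = nseen if kind == 2    // prices < own price - w
gen long nupto = nseen if kind == 3     // prices <= own price + w

* Back to one row per original observation.
collapse (max) price nbelow nupto, by(id)
gen long countnear = nupto - nbelow
```

Subtract 1 from countnear to exclude the case itself.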
> 
> Any additional insight would be much appreciated.
> 
> Ben
> 
> Ben Hoen
> LBNL
> Office: 845-758-1896
> Cell: 718-812-7589
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of David Radwin
> Sent: Wednesday, December 26, 2012 1:25 PM
> To: [email protected]
> Subject: st: RE: Easy Question? Counting cases based on a "target" case
> 
> Ben,
> 
> I don't think you need to loop over observations, but you can loop over
> values, which is fairly efficient. Something like this:
> 
> 
> levelsof price, local(prices)
> foreach p of local prices {
> 	gen near`p' = inrange(price, `=`p'-2000', `=`p'+2000')
> 	}
> egen countnear = rowtotal(near*)
> 
> 
> In the example above I use all prices, but you could substitute the
> following line for the first two lines above:
> 
> 	foreach p of numlist 1900 2500 4000 6500 10000 {
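As an illustration (not part of the original reply), applying this value loop to the five prices from the original question, typed in with -input-, gives 2, 3, 2, 1, and 1. These equal per-observation counts only because each price occurs once: each near`p' variable flags a single value, so the row total counts distinct values within the window rather than observations.

```stata
clear
input price
1900
2500
4000
6500
10000
end
* One 0/1 indicator per listed value: is this observation's price near it?
foreach p of numlist 1900 2500 4000 6500 10000 {
    gen near`p' = inrange(price, `p' - 2000, `p' + 2000)
}
egen countnear = rowtotal(near*)
drop near*
list price countnear
```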
> 
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
> 
> www.mprinc.com
> 
> 
> > -----Original Message-----
> > From: [email protected] [mailto:owner-[email protected]] On Behalf Of Ben Hoen
> > Sent: Wednesday, December 26, 2012 10:06 AM
> > To: [email protected]
> > Subject: st: Easy Question? Counting cases based on a "target" case
> >
> > I want to perform a function that I think would be easy, but I can't wrap
> > my head around how to perform it without looping through each case.
> >
> > I want to create a count of the number of records in the file that meet a
> > certain criterion based on each case's own value. So, for example, using
> > the auto dataset:
> >
> > *====================begin
> > sysuse auto, clear
> > g id=_n
> > egen nearprice2000=count(id) if... //count the number of other cases in the dataset whose price is within $2000 of the price of this case's (i.e., the target) car
> >
> > *====================end
> >
> > The egen command is how I thought I would resolve this, but I can't figure
> > it out exactly. For each case, nearprice2000 would equal the number of
> > other cases in the dataset whose price is within +/- $2000 of that case's
> > price. So if the full dataset had only 5 prices: 1900, 2500, 4000, 6500,
> > and 10000, their respective nearprice2000 values would be 2, 3, 2, 1, and
> > 1 (if the case itself is included in the count) or 1, 2, 1, 0, and 0 (if
> > it is NOT included in the count).
> >
> > I might be able to do this by looping through the cases, but I know that
> > is not encouraged by other, more experienced users.
> >
> > Any advice would be greatly appreciated.
> >
> > Ben
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

References:
- st: Easy Question? Counting cases based on a "target" case
  - From: "Ben Hoen" <[email protected]>
- st: RE: Easy Question? Counting cases based on a "target" case
  - From: "David Radwin" <[email protected]>
- st: RE: RE: Easy Question? Counting cases based on a "target" case
  - From: "Ben Hoen" <[email protected]>
- st: RE: RE: RE: Easy Question? Counting cases based on a "target" case
  - From: "David Radwin" <[email protected]>
