Statalist



Re: st: Counting duplicates and assigning unique values


From   "Sebastian F. Büchte" <sfbuechte@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Counting duplicates and assigning unique values
Date   Sat, 22 Sep 2007 00:16:17 +0200

First, count the patients admitted to each hospital. This assumes that
the combination of "hospid" and "key" uniquely identifies every
observation in your dataset, which would be the case if each entry in
"key" appears only once within a "hospid" group:

bys hospid: gen npat = _N
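
A small sketch of how the assumption above could be checked before
counting. The toy data are made up for illustration; only the variable
names "hospid" and "key" come from your post.

```stata
clear
input hospid key
2001 20001
2001 20002
2001 20003
2002 20004
2003 20005
2003 20006
end

* stops with an error if hospid and key together do not
* uniquely identify the observations
isid hospid key

* number of patients (observations) per hospital
bys hospid: gen npat = _N
list, sepby(hospid)
```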

Second, create a new hospital identifier that numbers the hospitals
1, 2, ... in ascending order of "npat":

bys hospid: gen hvolid = 1 if _n==1
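* sort on hospid and hvolid too, so that each hospital's observations stay
* together with the flagged one first; sorting on npat alone can give two
* hospitals with equal npat the same hvolid
sort npat hospid hvolid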
replace hvolid = sum(hvolid)
bys hospid (hvolid): replace hvolid = hvolid[1]
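
If you prefer a one-line alternative, egen's group() function should
give an equivalent numbering. This is only a sketch on made-up toy
data; ties in "npat" are broken by "hospid" here, which is my
assumption, not something your post specifies.

```stata
clear
input hospid key
2001 20001
2001 20002
2001 20003
2002 20004
2003 20005
2003 20006
end

bys hospid: gen npat = _N

* number the distinct (npat, hospid) combinations 1, 2, ... in sorted
* order; npat is constant within hospid, so this is one id per hospital
egen hvolid = group(npat hospid)
tabulate hospid hvolid
```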

Now you can plot npat over hvolid. However, it would be wise and more
efficient to reduce the number of observations used for the plot, since
you do not need all 30,000+ observations. You only need one observation
per entry in "hospid" (or, equivalently, "hvolid"), so flag the first
observation in each group and plot only those:

bys hvolid: gen fobs = _n==1
scatter npat hvolid if fobs
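
The same flag can also be built with egen's tag() function. Again a
sketch on made-up toy data; the axis titles are only suggestions.

```stata
clear
input hospid key
2001 20001
2001 20002
2001 20003
2002 20004
2003 20005
2003 20006
end

bys hospid: gen npat = _N
egen hvolid = group(npat hospid)

* tag() is 1 for exactly one observation per hospital and 0 otherwise
egen fobs = tag(hospid)
scatter npat hvolid if fobs, xtitle("Hospital rank by volume") ytitle("Number of patients")
```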

One last comment: your request is a little bit underspecified :) I may
not be contributing anything to your problem because I may have
misinterpreted what you wrote. It can be very helpful to post not only
the question but also the Stata code which - naturally in this case -
did not get you where you wanted to go.

Regards
Sebastian

2007/9/21, kdpapay@ucalgary.ca <kdpapay@ucalgary.ca>:
> I would like to count and plot the number of patients admitted to each
> hospital, sorted by volume group.  Patients (variable name key, n=31818
> observations) are assigned a numbered ID ranging from 20000-40000.
> Hospitals (variable name hospid, n=2000) are also assigned a numbered ID
> ranging from 2000-5000.  Each hospid likely has many key observations
> associated with it.  But when I plot it, the program plots it based on the
> numbered ID instead of unique values, i.e. 1-2000.  Any thoughts on what
> commands to use and how to plot this?
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>