Re: st: ranking with weights
Steven Samuels <firstname.lastname@example.org>
Tue, 2 Dec 2008 16:16:59 -0500
Cindy-- The weights are not likely to be frequency weights (fweights)
--they are probability weights (pweights), possibly post-stratified.
If they are whole numbers, then someone has rounded them. You still
haven't answered the question: why do you want to rank the
households? Quantities calculated in samples are estimates of
population quantities. What population quantities are you trying to
estimate with the ranks? If you are trying to estimate percentiles,
the -pctile- command will take pweights.
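A minimal sketch of that suggestion, assuming the variables are named expenditure and weight (hypothetical names; the thread does not give them). The -xtile- line is an addition not mentioned above: it is the companion command that assigns each observation to a quantile group, and tied expenditures fall in the same group.

```stata
* Assumed variable names: expenditure, weight (not confirmed in the thread)

* Weighted deciles: p10 holds the 9 estimated cutpoints
pctile p10 = expenditure [pweight = weight], nquantiles(10)

* Assign each household to its weighted decile group (1-10)
xtile decile = expenditure [pweight = weight], nquantiles(10)
```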
On Dec 2, 2008, at 2:16 PM, Cindy Gao wrote:
Thanks for your reply.
The observations (analytic units) are households. Expenditure is
the monthly expenditure of the household. These are household
survey data. The weights are frequency weights, used to weight the
sample up to the whole country. The weights are likely to vary
across, for example, regions, to compensate for oversampling or
undersampling.
---- Original Message ----
From: Steven Samuels <email@example.com>
Cindy, What are the analytic units (people? regions?). What are
the "weights"? What is "expenditure"? How is it measured? What do
you mean when you say that some regions are "less sampled" than
others? It's not clear, for example, whether this is a sample and,
if so, of what.
So, please describe the study design in detail. Last question:
what is the purpose of the ranking?
On Dec 2, 2008, at 12:54 PM, Cindy Gao wrote:
I am trying to find a way to rank weighted data (since the egen
function -rank- does not work with weights). A simple way would be
to order the data by the variable I am interested in (monthly
expenditure) and then create a new variable like -g
rank1=sum(weight)-. But there is a problem. Some of my observations
are "tied", as they have the same level of expenditure. With the
simple method, some observations are ranked above others even
though they have the same level of expenditure. This is a problem
because the weights are large, so two observations can be ranked
with a big gap between them even though they have the same
expenditure. It is an even bigger problem because the weights might
be correlated with other variables I am interested in (like region,
since some regions are less sampled than others). I also tried
multiplying the expenditure ranking by the weight, but this gives
wrong results (for example, they do not add up to the weighted
total). Can anyone help? In other words, I would like all
observations with the same expenditure to have the same rank, which
I assume would be some average over all the weighted observations
having that same expenditure. I include a sample dataset below:
expenditure  weight  rank  sum(weight)  desired rank
12           1065    2.5   1406         ???
12           98      2.5   1504
15           254     4     1758
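One possible way to give tied households the same weighted rank is a weighted midrank: the total weight ranked strictly below the tie group, plus the midpoint of the group's own weight. The sketch below is only an illustration under stated assumptions. The variable names expenditure and weight are hypothetical, and the weights are treated as if they were counts, which is a ranking convention and not necessarily the right estimator for every purpose.

```stata
* Assumed variable names: expenditure, weight (hypothetical)

* wgroup: total weight of all households sharing the same expenditure
bysort expenditure: egen double wgroup = total(weight)

* cumw: running sum of weights in ascending order of expenditure
sort expenditure
gen double cumw = sum(weight)

* Weight ranked strictly below the tie group, plus the midpoint of the
* group's own weight; every tied household gets the same value
by expenditure: gen double wrank = cumw[_N] - wgroup + (wgroup + 1)/2
```

For the two households with expenditure 12 in the sample rows, the running sum implies 341 of weight ranked below them (1406 - 1065), and the group weight is 1065 + 98 = 1163, so both would get 341 + (1163 + 1)/2 = 923, regardless of the order in which the tied rows happen to be sorted.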