
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: ranking with weights
Date: Tue, 2 Dec 2008 16:16:59 -0500
--

-Steve

On Dec 2, 2008, at 2:16 PM, Cindy Gao wrote:

Thanks for your reply. The observations (analytic units) are households. Expenditure is the monthly expenditure of the household. This is household survey data. The weights are frequency weights, used to weight the sample up to the whole country. The weights are likely to vary across, for example, regions, to compensate for oversampling or undersampling.

Basically I need to rank all households according to their expenditure, from lowest to highest, but I must take account of the weights. If, for example, there are 2 households with the same expenditure, they must be ranked the same, and this rank must take account of the weights. If there were no ties (households with the same expenditure), I could achieve this by generating a variable "rank", like -g rank=sum(weight)-. The problem comes because of ties. If I could -expand- my dataset using the weights, then I could simply say -egen rank=rank(expenditure)-; the problem is that the dataset is too large for this.
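One way to get the tied ranks described above without -expand- is sketched here. It is a minimal sketch, not something taken from the thread: it assumes the variables are named expenditure and weight, that the weights are positive integers, and that the desired tied rank is the mean rank that -egen, rank()- would give on the expanded data. The names cumw and wrank are made up for illustration, and the four rows are the sample values from the first message in the thread.

```stata
* Sample data from the thread; variable names are assumptions.
clear
input expenditure weight
10  341
12 1065
12   98
15  254
end

sort expenditure
* Running total of the weights in expenditure order.
gen double cumw = sum(weight)

* Within a tied group, the expanded ranks would run from
* (total weight below the group + 1) to (total weight through the group).
* The tied rank is the mean of those two end points.
by expenditure: gen double wrank = (cumw[1] - weight[1] + 1 + cumw[_N]) / 2

list expenditure weight cumw wrank
```

For these four rows wrank comes out as 171, 923, 923 and 1631.5. The ranks multiplied by their weights sum to 1,546,161, which equals N(N+1)/2 for N = 1758, the total of the weights. Observations with missing expenditure sort last and would also receive a rank, so they may need to be excluded first.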

---- Original Message ----
From: Steven Samuels <sjhsamuels@earthlink.net>

Cindy,

What are the analytic units (people? regions?)? What are the "weights"? What is "expenditure"? How is it measured? What do you mean that some regions are "less sampled" than others? It's not clear, for example, if this is a sample, and if so, of what. So, please describe the study design in detail. Last question: what is the purpose of the ranking?

On Dec 2, 2008, at 12:54 PM, Cindy Gao wrote:

I am trying to find a way to rank weighted data (since the egen function -rank- does not work with weights). A simple way would be to order the data by the variable I am interested in (monthly expenditure) and then create a new variable like -g rank1=sum(weight)-. But there is a problem. Some of my observations are "tied", as they have the same level of expenditure. Using the simple method I mention means that some observations are ranked above others even though they have the same level of expenditure. This is a problem because the weights are large, so you find that 2 observations are ranked with a big gap in between even though they have the same level of expenditure. It is an even bigger problem because the weights might be correlated with some other variables I am interested in (like region, since some regions are less sampled than others). I also tried multiplying the expenditure ranking by the weight, but this gives wrong results (for example, they do not add up to the weighted total). Can anyone help?

In other words, I would like all observations with the same expenditure to have the same rank, which I assume would be some average of all the weighted observations having that same expenditure. I include a sample dataset below:

expenditure  weighting  rank  rank1  weighted_rank
10           341        1     341    341
12           1065       2.5   1406   ???
12           98         2.5   1504
15           254        4     1758   .
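Because the sample dataset is tiny, the target values for the weighted_rank column can be checked by brute force, which is the -expand- route mentioned in the thread. The sketch below assumes the four rows read as expenditure 10, 12, 12, 15 with weightings 341, 1065, 98, 254, and that ties should get the mean rank, which is the default for -egen, rank()-. The variable name wr_check is made up for illustration.

```stata
* Brute-force check on the sample data only; not feasible for the full survey.
clear
input expenditure weighting
10  341
12 1065
12   98
15  254
end

* One row per represented household (1758 rows in total).
expand weighting

* Default ties rule of rank() gives tied observations their mean rank.
egen double wr_check = rank(expenditure)

* Collapse back to one row per original observation for display.
duplicates drop
sort expenditure weighting
list
```

Under these assumptions wr_check is 171 for expenditure 10, 923 for both observations with expenditure 12, and 1631.5 for expenditure 15.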

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

