
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Confidence interval for a median with weighted, clustered data
Date: Tue, 23 Dec 2008 18:47:01 -0500

-Steve

On Dec 23, 2008, at 12:49 PM, Steven Samuels wrote:

In correspondence, Roger Newson (r.newson@imperial.ac.uk) described to me a method for computing a CI for a median with weighted, clustered data. With his permission, I am posting his description and the do-file which accompanied it. (I have slightly edited both to delete some extraneous material.) The do-file uses Roger's program -cendif- (part of -somersd-, available from SSC) and his commands -keyby-, -xcontract-, and -expgen-.

-Steve

From Roger Newson:

I would use the following method to make cendif calculate confidence intervals for a median (possibly with clusters and/or weights). This method involves the concept of between-scenario rank statistics, exemplified by the Gini coefficient and the population attributable risk. (See Subsection 5.3 of Newson (2006).)

I would use my own expgen package, downloadable from SSC, to duplicate every observation in the sample, replacing each observation with a pair of identical observations, and generating a copy identifier variable named scenario, equal to 1 for the first member of a pair and equal to 2 for the second member of a pair. I would then set the value of the variable whose median I wanted to know to zero in all the second members of all pairs (where scenario==2). I would then use cendif, with the options

by(scenario) funtype(vonmises)

and with a cluster() option putting both members of each pair in the same cluster, to calculate a median difference between the subsamples with scenario==1 and scenario==2. This should definitely produce a valid confidence interval, with or without weights, for the median difference between individuals sampled from the world as we know it (Scenario 1) and a hypothetical fantasy world in which the variable whose median we are estimating is always zero (Scenario 2). This median difference is (of course) simply the median of the variable whose median we wanted to know.

To illustrate my method I have run an example do-file, appended.
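For readers who do not have -expgen- installed, the duplication step described above can probably be reproduced in base Stata with -expand-. This is a minimal sketch, not part of Roger's do-file; the variable obsid is a hypothetical helper introduced here only to tell the two members of each pair apart.

```stata
* Sketch (assumption: base-Stata substitute for -expgen-, not Roger's code)
sysuse auto, clear
generate long obsid = _n                     // hypothetical helper: one id per original car model
expand 2                                     // every observation now appears twice
bysort obsid: generate byte scenario = _n    // 1 = first member, 2 = second member of each pair
replace mpg = 0 if scenario == 2             // Scenario 2: the world where mpg is always zero
tabulate scenario
```

Because both members of a pair share the same make (and the same obsid), a cluster(make) option, as in the do-file, should still put each pair in one cluster; the -cendif- calls themselves would be unchanged.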
I have set the options

transf(iden) tdist

which, based on my own simulations so far, are usually a good idea when estimating median differences. I also set the options

by(scenario) funtype(vonmises)

and the appropriate cluster options. To produce an "unclustered" estimate, I have used the option cluster(make), implying 1 cluster for each car model. (Therefore, each observation in the old auto data corresponds to a cluster of 2 observations in the duplicated dataset.) To produce a "clustered" estimate, I have used the option cluster(firm), where firm is a derived variable containing the first word of make. This cluster variable implies that we are sampling firms from a population of firms, instead of sampling car models from a population of car models. For both the clustered and the unclustered case, I have produced an unweighted estimate and also a weighted estimate, using pweights to weight non-American cars higher, and therefore producing a higher median mpg.

References

Newson R. Confidence intervals for rank statistics: Somers' D and extensions. The Stata Journal 2006; 6(3): 309-334.
Download pre-publication draft from http://www.imperial.ac.uk/nhli/r.newson/papers.htm

**************************CODE BEGINS**************************
#delim ;
version 10.1;
* Demo of confidence interval for median using cendif.
  This program uses the SSC packages somersd, keyby, xcontract, and expgen.
*;
clear;
set memory 1m;
sysuse auto, clear;
keyby foreign make;
* firm is the first word of make, used as the cluster in the clustered analyses *;
gene firm=word(make,1);
lab var firm "Firm";
* pwt weights non-American cars 4 times as heavily as American cars *;
gene pwt=foreign + 0.25*!foreign;
lab var pwt "Probability weight";
desc;
xcontract foreign pwt, list(,);
xcontract firm, list(,);
preserve;
* Duplicate each observation, with scenario = 1 or 2 within each pair *;
expgen =2, copy(scenario);
* Scenario 2 is the hypothetical world where mpg is always zero *;
replace mpg=0 if scenario==2;
* Unclustered analyses *;
cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises)
  cluster(make);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype(vonmises)
  cluster(make);
* Analyses clustered by firm *;
cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises)
  cluster(firm);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype(vonmises)
  cluster(firm);
restore;
exit;
***************************CODE ENDS***************************;

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
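As a rough check on the point estimates (not part of Roger's do-file), the median differences reported by -cendif- can be compared with ordinary medians of mpg from official Stata's -_pctile-. Every scenario-2 value of mpg is zero, so each scenario-1 minus scenario-2 difference is just a scenario-1 value of mpg. Assuming pairs are weighted by the product of the two observation weights, the weighted distribution of differences is then proportional to the weighted distribution of mpg, so the two sets of medians should agree, possibly up to how an exact 50% boundary is resolved. The scalar names med_unwt and med_wt are illustrative.

```stata
* Sketch (assumption: a cross-check only, not Roger's code)
sysuse auto, clear
generate pwt = foreign + 0.25*!foreign     // same probability weights as in the do-file
_pctile mpg, p(50)                         // unweighted median
scalar med_unwt = r(r1)
_pctile mpg [pweight=pwt], p(50)           // median with non-American cars weighted higher
scalar med_wt = r(r1)
display "Unweighted median mpg: " med_unwt
display "Weighted median mpg:   " med_wt
```

The weighted median should come out higher than the unweighted one, consistent with the remark above that up-weighting non-American cars raises the median mpg.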

