
st: Confidence interval for a median with weighted, clustered data

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	st: Confidence interval for a median with weighted, clustered data
Date	Tue, 23 Dec 2008 12:49:28 -0500

In correspondence, Roger Newson ([email protected]) described to me a method for computing a CI for a median with weighted, clustered data. With his permission, I am posting his description and the do-file that accompanied it. (I have slightly edited both to delete some extraneous material.) The do-file uses Roger's program -cendif- (part of -somersd-, available from SSC) and his commands -keyby-, -xcontract-, and -expgen-.


-Steve


From Roger Newson:

I would use the following method to make cendif calculate confidence intervals for a median (possibly with clusters and/or weights). This method involves the concept of between-scenario rank statistics, exemplified by the Gini coefficient and the population attributable risk. (See Subsection 5.3 of Newson (2006).)

I would use my own expgen package, downloadable from SSC, to duplicate every observation in the sample, replacing each observation with a pair of identical observations, and generating a copy identifier variable named scenario, equal to 1 for the first member of a pair and equal to 2 for the second member of a pair. I would then set the value of the variable whose median I wanted to know to zero in all the second members of all pairs (where scenario==2). I would then use cendif, with the options


by(scenario) funtype(vonmises)

and with a cluster() option putting both members of each pair in the same cluster, to calculate a median difference between the subsamples with scenario==1 and scenario==2. This should definitely produce a valid confidence interval, with or without weights, for the median difference between individuals sampled from the world as we know it (Scenario 1) and a hypothetical fantasy world in which the variable whose median we are estimating is always zero (Scenario 2). This median difference is (of course) simply the median of the variable whose median we wanted to know.
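
The data construction described above can be sketched with built-in commands only. This is an illustration, not the method's own code: -expand- stands in here for expgen (a substitution made for this sketch; the appended do-file uses expgen itself), and -_pctile- is used only to display the sample median that the Scenario 1 versus Scenario 2 median difference targets.

```stata
* Sketch only: built-in -expand- stands in for -expgen- (a substitution).
sysuse auto, clear

* Replace each observation with a pair of identical observations.
expand 2

* make identifies each car model uniquely in auto.dta, so each make
* now has exactly two rows; number them 1 and 2 within the pair.
bysort make: generate byte scenario = _n

* Scenario 2 is the hypothetical world in which mpg is always zero.
replace mpg = 0 if scenario == 2

* Any Scenario 1 value minus any Scenario 2 value (always 0) is the
* Scenario 1 value itself, so the median difference is the median of mpg.
_pctile mpg if scenario == 1, p(50)
display "sample median of mpg in Scenario 1: " r(r1)
```

Both rows of a pair share the same value of make, so a cluster() variable that is constant within make (make itself, or anything coarser) keeps the two members of each pair in the same cluster.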


To illustrate my method I have run an example do-file, appended.

I have set options

transf(iden) tdist

which, based on my own simulations so far, are usually a good idea when estimating median differences. I also set options


by(scenario) funtype(vonmises)

and the appropriate cluster options. To produce an "unclustered" estimate, I have used the option cluster(make), implying 1 cluster for each car model. (Therefore, each observation in the old auto data corresponds to a cluster of 2 observations in the duplicated dataset.) To produce a "clustered" estimate, I have used the option cluster(firm), where firm is a derived variable, containing the first word of make. This cluster variable implies that we are sampling firms from a population of firms, instead of sampling car models from a population of car models. For both the clustered and the unclustered case, I have produced an unweighted estimate, and also a weighted estimate, using pweights to weight non-American cars higher, and therefore producing a higher median mpg.
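
As a rough cross-check on the point estimates only (not the confidence intervals), the derived variables and the two medians can be inspected with built-in commands. The variable definitions copy those in the do-file; the use of -egen, tag()- and -_pctile- is an addition for this sketch, and -_pctile- may use a slightly different percentile convention from cendif, so small discrepancies would not necessarily indicate a problem.

```stata
* Sketch only: point estimates and cluster counts, no confidence intervals.
sysuse auto, clear

* Same derived variables as in the do-file.
generate firm = word(make, 1)          // first word of make
generate pwt  = foreign + 0.25*!foreign // 1 if foreign, 0.25 if domestic

* Number of clusters implied by cluster(make) and cluster(firm).
egen byte tagmake = tag(make)
egen byte tagfirm = tag(firm)
count if tagmake
count if tagfirm

* Unweighted and pweighted sample medians of mpg.
_pctile mpg, p(50)
display "unweighted median mpg: " r(r1)
_pctile mpg [pweight = pwt], p(50)
display "weighted median mpg:   " r(r1)
```

Clustering by firm leaves fewer clusters than clustering by make, which is why the two sets of confidence limits can differ even when the point estimates agree.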


References

Newson R. Confidence intervals for rank statistics: Somers' D and
extensions. The Stata Journal 2006; 6(3): 309-334. Download
pre-publication draft from
http://www.imperial.ac.uk/nhli/r.newson/papers.htm


**************************CODE BEGINS**************************
#delim ;
version 10.1;
*
 Demo of confidence interval for median using cendif.

This program uses the SSC packages somersd, keyby, xcontract, and expgen.

*;

clear;
set memory 1m;

sysuse auto, clear;
keyby foreign make;
gene firm=word(make,1);
lab var firm "Firm";
gene pwt=foreign + 0.25*!foreign;
lab var pwt "Probability weight";
desc;
xcontract foreign pwt, list(,);
xcontract firm, list(,);

preserve;
expgen =2, copy(scenario);
replace mpg=0 if scenario==2;
*
 Unclustered analyses
*;

cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises) cluster(make);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype(vonmises) cluster(make);

*
 Analyses clustered by firm
*;

cendif mpg, by(scenario) tdist transf(iden) funtype(vonmises) cluster(firm);
cendif mpg [pwei=pwt], by(scenario) tdist transf(iden) funtype(vonmises) cluster(firm);

restore;

exit;
***************************CODE ENDS***************************;



