
Re: st: RE: statalist-digest V4 #965

From	Roger Newson <[email protected]>
To	[email protected]
Subject	Re: st: RE: statalist-digest V4 #965
Date	Sun, 04 Aug 2002 20:24:52 +0100

At 11:23 04/08/02 -0400, Stephen Soldz wrote:

Thanks to Nick Cox and Roger Newson for their responses to my question about
robust tests of dependent proportions. Nick gave several references I'll
look up. Roger thinks I wouldn't do too badly with paired t-tests, as they are
"a special case of the Huber variance for clustered data (where the clusters
are the pairs of responses and the observations are the individual
responses)". I wonder if you have a reference for this that I could cite?

I don't think there is any need for a reference, as the point is so trivial. If you are estimating the difference between 2 population proportions from 2 sample proportions measured on the same sample, then you are estimating the mean of Z=X-Y, where X and Y are Bernoulli variables, because the mean of Z is the mean of X minus the mean of Y. You are therefore simply estimating the population mean of Z from the sample mean of Z. The large-sample theory applies, courtesy of the central limit theorem for ordinary sample means, whether Z is normal (as assumed by the usual paired t-test) or has a discrete distribution with possible values -1, 0 and 1 (as here).
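As a rough illustration of the paragraph above, here is a minimal Python sketch that forms Z=X-Y from paired 0/1 responses and computes the sample mean of Z with its conventional standard error. The function name and the example data are hypothetical and exist only for illustration; they are not from the original question.

```python
import math

def paired_proportion_difference(x, y):
    """Estimate P(X=1) - P(Y=1) from paired 0/1 responses.

    Works on Z = X - Y, which takes values -1, 0 or 1, and returns
    (mean of Z, conventional SE of the mean, mean / SE).
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired, so equal in length")
    n = len(x)
    z = [xi - yi for xi, yi in zip(x, y)]
    zbar = sum(z) / n
    # Usual sample variance of Z, with divisor n - 1.
    s2 = sum((zi - zbar) ** 2 for zi in z) / (n - 1)
    se = math.sqrt(s2 / n)
    return zbar, se, zbar / se

# Hypothetical paired responses from 10 subjects (illustration only).
x = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
y = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

diff, se, t_stat = paired_proportion_difference(x, y)
print(diff, se, t_stat)
```

The returned mean of Z is the same number as the difference between the two sample proportions, and the ratio of the mean to its SE is the paired t statistic.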

The bit about clustered Huber variances is probably not strictly necessary, but is justified as follows. The conventional SE of the sample mean happens also to be the Huber SE for estimating the population mean, if you are using any likelihood function which uses the sample mean as the maximum-likelihood estimator for the population mean (which includes the normal likelihood function, and includes also the discrete-distribution likelihood function with possible values -1, 0 and 1). This is because the Huber variance is, by definition, the sample mean square of the sample influence function divided by the number of sampling units. The sample influence function of the mean, for the i'th sampling unit, is Z_i-Zbar, where Z_i is the i'th Z-value and Zbar is the sample mean Z-value. A good reference on influence functions in general is Hampel (1974).
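The comparison can be checked numerically with a small Python sketch. It assumes, as is common in software, that the Huber variance is multiplied by a finite-sample factor n/(n-1), where n is the number of sampling units (here, pairs); that factor is an assumption of this sketch, not something stated above. Without it, the two SEs differ only by the factor sqrt((n-1)/n). The function names and data are hypothetical.

```python
import math

def conventional_se(z):
    """Conventional SE of the sample mean: sqrt(s^2 / n), s^2 with divisor n - 1."""
    n = len(z)
    zbar = sum(z) / n
    s2 = sum((zi - zbar) ** 2 for zi in z) / (n - 1)
    return math.sqrt(s2 / n)

def huber_se(z, small_sample=True):
    """Huber SE of the sample mean, built from the sample influence function.

    Influence of unit i is z_i - zbar. The variance is the mean square of
    the influences divided by n, optionally scaled by n / (n - 1)
    (an assumed finite-sample correction).
    """
    n = len(z)
    zbar = sum(z) / n
    influence = [zi - zbar for zi in z]
    var = (sum(u * u for u in influence) / n) / n
    if small_sample:
        var *= n / (n - 1)
    return math.sqrt(var)

# Hypothetical differences Z = X - Y for 10 pairs (illustration only).
z = [0, 1, 0, 0, 0, 1, 0, 0, 1, -1]

print(conventional_se(z), huber_se(z), huber_se(z, small_sample=False))
```

With the correction the two SEs agree to rounding error; without it the Huber SE is slightly smaller, and the gap vanishes as n grows.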

I hope this helps.

Best wishes

Roger

References

Hampel FR. The influence curve and its role in robust estimation. Journal of the American Statistical Association 1974; 69: 383-397.

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: [email protected]

Opinions expressed are those of the author, not the institution.
