From: Dirk Enzmann
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sample adjustment by substitution instead of weighting
Date: Sat, 25 Apr 2009 14:46:24 +0200

```
I am posting Steve's reply to

http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0904/date/article-1134.html

here because at the moment he can't send mails to the list:

Dirk

---------------------------------------------------------------------
Dirk-
```
I've never heard of this procedure. There is some basis for thinning a sample randomly to meet sampling goals, and substitutions for missing observations are also practiced, but you are not describing either of these.

The process of exclusion and duplication will destroy the ability of the sample to estimate anything but the characteristics that are being matched--but those are already known! For instance, the sample cannot give unbiased estimates of the means of other variables. For the matched characteristics, the sample will not permit estimation of SDs or quantiles. Moreover, no standard errors or confidence intervals can be computed for anything, because the exclusions and duplication have artificially reduced the variability in the sample.
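
A minimal sketch of the standard-error point, assuming made-up outcome values for an under-represented group that is padded to a quota by copying records. The names `naive_se`, `men`, and `men_padded` are hypothetical and the numbers are illustrative only; the sketch shows that a naive standard error computed on the padded file shrinks even though no new respondent was added.

```python
import statistics


def naive_se(values):
    """Standard error of the mean, treating every record as an independent draw."""
    return statistics.stdev(values) / len(values) ** 0.5


# Hypothetical outcome values for three real respondents (assumption, not data
# from the thread).
men = [2.0, 5.0, 9.0]

# "Adjustment" by duplication: copy two records to reach a quota of five.
men_padded = men + [men[0], men[1]]

se_real = naive_se(men)
se_padded = naive_se(men_padded)
distinct = len(set(men_padded))  # still only three distinct respondents

print("records:", len(men), "->", len(men_padded), "distinct:", distinct)
print("naive SE:", round(se_real, 3), "->", round(se_padded, 3))
```

The padded file reports a smaller standard error than the three real observations support, and the group mean moves as well, because the copied records are counted as if they were fresh information.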

To better match the sample estimates to known population characteristics, I know of only three procedures: 1) post-stratification; 2) sample raking, which is an extension of post-stratification to several sets of known margins; and 3) generalized regression estimation (GREG).
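
As a rough illustration of the first two procedures, here is a sketch in Python, assuming a tiny hypothetical sample with two categorical variables and invented population margins. The function names `rake` and `weighted_share`, the variable names, and all proportions are assumptions made for the example; GREG is not covered. Raking on a single variable reduces to post-stratification, so one function serves both cases.

```python
from collections import defaultdict


def rake(rows, margins, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of weights to known population margins.

    rows    : list of dicts, one per respondent
    margins : {variable: {category: population proportion}}
    Returns one weight per respondent; the weights sum to len(rows).
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in margins.items():
            # Current weighted total in each category of this variable.
            totals = defaultdict(float)
            for row, w in zip(rows, weights):
                totals[row[var]] += w
            grand_total = sum(weights)
            # Scale each category so its weighted share equals the target.
            for i, row in enumerate(rows):
                factor = target[row[var]] * grand_total / totals[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    return weights


def weighted_share(rows, weights, var, category):
    """Weighted proportion of respondents falling in one category."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == category)
    return hit / sum(weights)


# Hypothetical sample: 7 women, 3 men; 6 young, 4 old. Nobody is dropped or copied.
rows = (
    [{"sex": "f", "age": "young"}] * 4
    + [{"sex": "f", "age": "old"}] * 3
    + [{"sex": "m", "age": "young"}] * 2
    + [{"sex": "m", "age": "old"}] * 1
)

# Post-stratification on sex alone (assumed population split 50/50).
w_ps = rake(rows, {"sex": {"f": 0.5, "m": 0.5}})

# Raking on sex and age margins (assumed 40% young, 60% old).
w_rake = rake(rows, {"sex": {"f": 0.5, "m": 0.5},
                     "age": {"young": 0.4, "old": 0.6}})

print("post-stratified share of women:", round(weighted_share(rows, w_ps, "sex", "f"), 3))
print("raked share of young:", round(weighted_share(rows, w_rake, "age", "young"), 3))
```

Every respondent stays in the file exactly once; only the weights change, so the original variability of the sample is still available to a design-based variance estimator.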

The exclusions and duplication are naive attempts to re-weight the sample. However, they completely destroy it. So, no, this is not actual practice. The only discussion of something similar that I've read is in Lohr (1999, Sampling: Design and Analysis, Duxbury, p. 463), who gives the reference to Neyman, J. 1934. On the two different aspects of the representative method: The method of stratified sampling and the method of purposive selection. Journal of the Royal Statistical Society 97: 558-606. Here is the quote from her book:
"Neyman's paper pretty much finished off the idea that results from purposive samples could be generalized to the population. He presented an example of the purposive sample taken by Gini and Galvani in the late 1920's. Gini and Galvani chose 29 districts that gave the averages of all 214 districts in the 1921 Italian census, on a dozen variables. But Neyman showed that all statistics other than the average values of the controls showed a violent contrast between the sample and the whole population."

Of course, Gini and Galvani only excluded; they did not duplicate. So the procedure has long been discredited.
```
-Steve
---------------------------------------------------------------------

```