Thank you very much Steve for your elaborate answer - it is very
helpful, indeed!

Dirk
On behalf of Steve I include his answer in reply to
http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0904/date/article-1134.html
here because at the moment he can't send mails to the list:
---------------------------------------------------------------------
Dirk-

I've never heard of this procedure. There is some basis for thinning a
sample randomly to meet sampling goals, and substitutions for missing
observations are also practiced, but you are not describing either of these.

The process of exclusion and duplication will destroy the ability of the
sample to estimate anything but the characteristics that are being
matched--but those are already known! For instance, the sample cannot
estimate without bias the means of other variates. For the matched
characteristics, the sample will not permit estimation of SD's or
quantiles. Moreover, no standard errors or confidence intervals can be
computed for anything, because the exclusions and duplication have
artificially reduced the variability in the sample.
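
A minimal sketch of the duplication part of this point, in Python, using made-up numbers (the values and the helper name naive_se are assumptions for illustration, not anything from the original question). Copying records adds no information, yet a formula that treats every record as an independent draw reports a smaller standard error:

```python
import math
import statistics


def naive_se(values):
    """Standard error of the mean, treating every record as an independent draw."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Hypothetical measurements on five sampled units.
original = [4.0, 7.0, 9.0, 12.0, 18.0]

# "Duplication": every record is copied once, so there are no new units.
duplicated = original + original

se_original = naive_se(original)
se_duplicated = naive_se(duplicated)

# The second figure is smaller only because the copies are counted as data.
print(se_original, se_duplicated)
```

The mean is the same in both lists, so the apparent gain in precision is purely an artifact of the copying.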

To better match the sample estimates to known population
characteristics, I know of only three procedures: 1) post-stratification;
2) sample raking, which is an extension of post-stratification; and
3) generalized regression estimation (GREG).
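
As a rough sketch of the first of these, post-stratification gives each sampled unit in stratum h the weight N_h / n_h, where N_h is the known population count and n_h the sample count. The strata, counts, values, and function names below are hypothetical, and the code ignores any original design weights:

```python
from collections import Counter


def poststratify(sample_strata, population_counts):
    """Return one weight per sampled unit: N_h / n_h for the unit's stratum h."""
    sample_counts = Counter(sample_strata)
    missing = set(population_counts) - set(sample_counts)
    if missing:
        # A stratum with no sampled units cannot be weighted up.
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    return [population_counts[h] / sample_counts[h] for h in sample_strata]


def weighted_mean(values, weights):
    """Weighted mean of values using the supplied weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample that over-represents stratum "f".
strata = ["f", "f", "f", "m"]
y = [10.0, 12.0, 14.0, 20.0]

# Known population counts per stratum (assumed for illustration).
population = {"f": 500, "m": 500}

weights = poststratify(strata, population)
estimate = weighted_mean(y, weights)
print(weights, estimate)
```

Every sampled unit is kept exactly once; only its weight changes, which is what separates this from excluding or duplicating records.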

The exclusions and duplication are naive attempts to re-weight the
sample. However, they completely destroy it. So, no, this is not actual
practice. The only discussion of something similar I've read is in Lohr
(1999, Sampling: Design and Analysis, Duxbury, p 463), who gives the
reference to Neyman J. 1934. On the two different aspects of the
representative method: The method of stratified sampling and the method
of purposive selection. J. Royal Statistical Society 97: 558-606. Here
is the quote from her book:

"Neyman's paper pretty much finished off the idea that results from
purposive samples could be generalized to the population. He presented
an example of the purposive sample taken by Gini and Galvani in the late
1920's. Gini and Galvani chose 29 districts that gave the averages of
all 214 districts in the 1921 Italian census, on a dozen variables. But
Neyman showed that all statistics other than the average values of the
controls showed a violent contrast between the sample and the whole
population."

Of course, Gini and Galvani only excluded; they did not duplicate.
So the procedure has long been discredited.
-Steve
---------------------------------------------------------------------