# Re: st: Treatment of missing values in surveys in Stata (subpop)

From: Ángel Rodríguez Laso
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Treatment of missing values in surveys in Stata (subpop)
Date: Sun, 10 Mar 2013 20:12:14 +0100

```
Thank you very much, Steve. Very convincing explanation.

Angel Rodriguez-Laso

2013/3/9 Steve Samuels <sjsamuels@gmail.com>:
> Ángel:
>
> Rereading, I see that you asked about using the subpop() option when there are missing values.  Leaving a particular question unanswered could happen for many reasons, including fatigue, haste, interviewer error, and data entry mistakes. So again, the theory of the subpopulation correction does not apply.
>
> You didn't need to recode a missing numerical value to something like 999 in order to use subpop(). Such 999 coding is used only for data forms these days.
>
> . svy, subpop(if var < .): command var
>
> would do the job. This also takes care of extended missing values, like .a, because Stata orders every missing value above all nonmissing numbers: . < .a < .b < ... < .z
>
> Multiple imputation is the approach for handling missing values.
>
>
> Steve
> Ángel:
>
> The theory of subpopulation corrections does not apply to non-response.
>
> A subpopulation is a subset of the population that can be defined in
> advance (e.g. males, ages 30-40, living in rural areas). The number
> selected by a sample will be random. For example, suppose a population
> of N members contains a subpopulation of M members, and an SRS of size n
> is taken. You should be able to work out the exact probability that the
> sample will contain exactly k members of the subpopulation. The theory of
> the subpopulation correction is an extension of this, and can be found in
> any good text.
>
> In contrast, "responder" is not a characteristic, like gender, that is
> known in advance. It is defined only in relation to the particular sample
> design and protocol. For identical designs, better protocols can
> increase response rates. Thus, sampling theory alone cannot
> describe the numbers of responders and, consequently, the
> subpopulation correction is not applicable.
>
> Steve
>
> sjsamuels@gmail.com
>
> On Mar 8, 2013, at 2:37 PM, Ángel Rodríguez Laso wrote:
>
> Dear Statalisters,
>
> I have found two recommended procedures for dealing with individuals
> with missing items ('normal' missing answers like 'DK/DA' or equipment
> failure) when analysing surveys with Stata:
>
> 1) One is based on the recommendation that, unless there is a very
> strong reason to do otherwise, whenever you analyse a group of
> individuals in a survey with Stata, you have to use subpop. (See for
> example: http://www.stata.com/meeting/mexico10/mex10sug_canette.pdf).
> Under this perspective, those with valid values would be a
> subpopulation. From my point of view, this means that in order to
> prevent Stata from dropping them from the calculation of standard
> errors, missing codes (".") should be recoded to a numerical value
> (like 999) and then a command issued this way:
>
> svy, subpop(if var<999): command var
>
> 2) Nevertheless, most of the information I've read does not make any
> mention of this, which suggests that missing values do not
> need to be recoded. I've even found this piece of advice
> (http://www.stata.com/statalist/archive/2012-09/msg01028.html): 'I've
> never seen a recommendation to consider observations with non-missing
> values as a subpopulation'
>
>
> I wonder if anyone could throw some light on this topic.
>
> Thank you very much.
>
> Angel Rodriguez-Laso
```
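
The SRS example in Steve's reply (a population of N members, a subpopulation of M members, a sample of size n) can be worked out explicitly. As a sketch, assuming the simple random sample is drawn without replacement, the number K of subpopulation members in the sample would follow a hypergeometric distribution. The symbol K and the numeric values below are illustrative assumptions, not part of the original thread:

```latex
% Probability that an SRS of size n, drawn without replacement from a
% population of N members, contains exactly k of the M subpopulation members.
\[
  \Pr(K = k) \;=\; \frac{\binom{M}{k}\,\binom{N-M}{\,n-k\,}}{\binom{N}{n}},
  \qquad \max(0,\; n - (N - M)) \le k \le \min(n, M).
\]

% Illustrative numbers (assumed): N = 10, M = 4, n = 3, k = 1.
\[
  \Pr(K = 1) \;=\; \frac{\binom{4}{1}\binom{6}{2}}{\binom{10}{3}}
            \;=\; \frac{4 \times 15}{120} \;=\; 0.5 .
\]
```

The point of the example is that this probability depends only on N, M, n, and the design, all of which are fixed before any data are collected. No comparable expression exists for the number of responders, since that also depends on the protocol, which is why the thread concludes that the subpopulation correction does not carry over to non-response.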