It is not clear what Svend thinks is going on here, but for anyone thinking of using the missing-indicator strategy described below, I recommend reading Jones, MP (1996), "Indicator and Stratification Methods for Missing Explanatory Variables in Multiple Linear Regression," _Journal of the American Statistical Association_, 91: 222-230.
Rich

Svend Juul wrote:
Cy wrote:
In a previous post, I indicated there was a drastic reduction in my
sub-population size. I traced the problem to a variable with a lot of
missing cases.
As you can see from the table below, this variable elicits whether the
respondent engaged in unprotected sexual intercourse. About a third of
the cases (33.78%) are missing.
V761 -- Last intercourse used condom
-----------------------------------------------------------
               |      Freq.    Percent      Valid       Cum.
---------------+--------------------------------------------
Valid   0 No   |       6012      56.16      84.81      84.81
        1 Yes  |       1075      10.04      15.16      99.97
        9      |          2       0.02       0.03     100.00
        Total  |       7089      66.22     100.00
Missing .      |       3617      33.78
Total          |      10706     100.00
-----------------------------------------------------------
Since the dependent variable in my model deals with HIV risk, I need to
include sexual risk variables such as V761 in the model. How do I
deal with this missing data problem so that it does not affect my
sample size? Would an imputation work?
==========================================================
In this case, I would avoid imputation and instead generate two dummy variables:
V761_0 = 1 if no condom use, otherwise 0
V761_miss = 1 if missing or 9, otherwise 0
. generate V761_0 = V761==0
. generate V761_miss = V761>1
. groups V761* , missing
+--------------------------------------------+
| V761   V761_0   V761_m~s   Freq.   Percent |
|--------------------------------------------|
|    0        1          0    6012     56.16 |
|    1        0          0    1075     10.04 |
|    9        0          1       2      0.02 |
|    .        0          1    3617     33.78 |
+--------------------------------------------+
-groups- is an unofficial command (ssc install groups).
Both variables should be included in your regression. You will still
have a problem interpreting what missing means, but that problem
cannot be solved by imputation.
Hope this helps
Svend
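
For readers who want to see the suggestion above end to end, here is a minimal sketch of how the two indicators might enter a model. It is not taken from the thread: the outcome `hivrisk` and the covariates `age` and `educ` are hypothetical names, and a binary outcome is assumed, so substitute your own variables and estimation command. The second indicator is written out explicitly; for the values in this table (0, 1, 9, .) it gives the same result as `V761>1`, because Stata treats missing values as larger than any number.

```stata
* Sketch only: hivrisk, age and educ are hypothetical variable names.

* 1 if the respondent reported no condom use; 0 otherwise
* (note that missing values and code 9 are also coded 0 here)
generate byte V761_0 = (V761 == 0)

* 1 if condom use is unknown (code 9 or system missing); 0 otherwise
generate byte V761_miss = (V761 == 9) | missing(V761)

* Check the coding against the original variable, missing values included
tabulate V761 V761_0, missing
tabulate V761 V761_miss, missing

* Both indicators enter the model; the reference category is V761 == 1 (Yes).
* A binary outcome is assumed here, hence -logistic-.
logistic hivrisk V761_0 V761_miss age educ
```

With this coding, the cases with unknown condom use stay in the estimation sample, and the coefficient on `V761_0` compares "No" with "Yes". Whether the resulting estimates can be trusted is exactly the question taken up in the Jones (1996) paper cited at the top of this message.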
