Statalist The Stata Listserver



st: Missing data: about hotdeck


From   Jia Xiangping <[email protected]>
To   statalist <[email protected]>
Subject   st: Missing data: about hotdeck
Date   Wed, 8 Feb 2006 00:57:08 +0800

Dear Statalist

My question comes in two parts:
1) a brief description of my problem;
2) the commands I used and the output Stata (version 8.2) returned.

1) Nearly 10% of the observations on one variable in my dataset are
missing. The missingness is partly due to the questionnaire, and I
doubt that the values are missing at random, so deterministic methods
are of limited use in this case.

I tried regression imputation, replacing the missing values with
predicted values from a regression of the incomplete variable on
variables related to it, most of which are categorical. Although this
works, I doubt the result, and I want to compare it with the result
from -hotdeck-, because the -by()- option of -hotdeck- can specify
categorical variables defining strata within which the imputation is
carried out.
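
For comparison purposes, a regression imputation of this kind might be sketched as follows. This is only a minimal sketch: the predictor list is an assumption (the variables are borrowed from the commands further down), and limit_hat, limit_imp and imputed are hypothetical names. The -xi:- prefix is used because Stata 8 has no built-in factor-variable notation.

```stata
* Sketch only: deterministic regression imputation of limit_formal.
* The predictor list is an assumption; xi: expands the categorical
* variables into indicator variables.
xi: regress limit_formal i.ifration i.iformal lginc_total popu

* Linear prediction for every observation with non-missing predictors
predict double limit_hat, xb

* Keep the observed values and fill only the gaps
generate double limit_imp = limit_formal
replace limit_imp = limit_hat if missing(limit_formal)

* Flag the imputed observations so they can be identified later
generate byte imputed = missing(limit_formal)
```

Note that every imputed value lies exactly on the fitted regression surface, so this approach understates the variability of the imputed variable. That is one reason to compare it with a stochastic method such as -hotdeck-.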

2) The results follow (limit_formal is the variable with missing values):
. hotdeck limit_formal, by(ifration iformal) store
The output is:
------------------------------------------------------
DELETING all matrices....

Table of the Missing data patterns
 * signifies missing and - is not missing

Varlist order: limit_formal

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
  333
WARNING: When the <command> option is not selected
then no analysis is performed on the imputed datasets
-------------------------------------------------------

I then wanted to run a regression that uses other variables, but it does not work:
. hotdeck limit_formal, by(ifration iformal) store command(reg limit_formal ifration iformal lginc_total popu) impute(2) parms(ifration iformal lginc_total popu)
Stata returned:
----------------------------------------------------------
DELETING all matrices....

Table of the Missing data patterns
 * signifies missing and - is not missing

Varlist order: limit_formal

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
  333
variable lginc_total not found
-------------------------------------------------------------

In fact, there is nothing wrong with the variable lginc_total. When I
switched to a different command, it worked:
. hotdeck limit_formal, by(ifration iformal) store command(logit ifration limit_formal) impute(2) parms(limit_formal)
-------------------------------------------------------------

Table of the Missing data patterns
 * signifies missing and - is not missing

Varlist order: limit_formal

    pattern |      Freq.     Percent        Cum.
------------+-----------------------------------
          * |         49       14.54       14.54
          - |        288       85.46      100.00
------------+-----------------------------------
      Total |        337      100.00
  333
WARNING: t less than 4 invalid global test increase
parameters OR imputations

                   Number of Obs.               =  333
                   No. of Imputations           =  2
                   % Lines of Missing Data      =  13.513514 %
                   F(  4.000 ,1)                =     2.7409
                   Prob > F                     =     0.1732
-------------------------------------------------------------------------------
Variable |  Average   Between   Within     Total      df        t      p-value
         |  Coef.     Imp. SE   Imp. SE     SE
---------+---------------------------------------------------------------------
limit_formal | -0.0000     0.000     0.000     0.000 8196554.1    -1.655     0.098
---------+---------------------------------------------------------------------
Variable |  [95% Conf. Interval]
---------+---------------------------------------------------------------------
limit_formal |   -0.0001    0.0000
-------------------------------------------------------------------------------
---------------------------------------------------------------------

It works now, but the generated dataset cannot be merged into my main
dataset because it contains no index or key variable that identifies
each observation. Also, what is wrong with the regression in
-command()-? Have I made a mistake? Or are there alternatives I could
use to deal with the missing values?
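
On the merge problem, one possible approach is to create a key variable before imputing and ask -hotdeck- to retain it. The sketch below rests on two assumptions that should be checked against -help hotdeck-: that the installed version supports a -keep()- option for carrying extra variables into the stored datasets, and that the stored datasets are named imp1.dta, imp2.dta, and so on. The names maindata, obs_id, limit_formal_hd and imp1_key are hypothetical.

```stata
* Sketch only: carry an identifier through the hot-deck imputation.
* Assumes -hotdeck- accepts keep() and stores files as imp1.dta, imp2.dta.
use maindata, clear
generate long obs_id = _n
save maindata, replace

hotdeck limit_formal, by(ifration iformal) store keep(obs_id) impute(2)

* Rename the imputed variable so it does not clash with the original
use imp1, clear
rename limit_formal limit_formal_hd
sort obs_id
save imp1_key, replace

* Stata 8 merge syntax: both datasets must be sorted on the key
use maindata, clear
sort obs_id
merge obs_id using imp1_key
tabulate _merge
```

If -keep()- behaves as assumed, listing lginc_total and popu there as well may also be worth trying for the "variable lginc_total not found" message, on the guess that the stored datasets contain only the imputed and -by()- variables unless told otherwise.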

Those are my problems. Frankly, I am a novice with missing values, so
any suggestions and comments are highly appreciated.

--
Xiangping JIA



