



st: Using Multiple Imputation in a very large dataset


From   Raquel Rangel de Meireles Guimarães <raquelrguimadem@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Using Multiple Imputation in a very large dataset
Date   Fri, 03 Jun 2011 12:36:31 -0300

Dear Stata users,

I am using 64-bit Stata/MP (dual core) on Windows 7. I have 4 GB of RAM,
but I've allocated 5 GB to store my data.

I am interested in modeling the determinants of school performance. I
have data for 1,939,147 students. My dependent variable is reading
proficiency (fully observed), and I have the students' individual
characteristics (gender, race, and age; all fully observed) and the
scores for the socioeconomic constructs (socioeconomic level, student
motivation, parental involvement, cultural capital), which were obtained
via Item Response Theory.

I would like to impute values for the socioeconomic characteristics
based on each student's proficiency, gender, race, and age.

My data can be found at the following website:

http://sites.google.com/site/raquelscurriculumsite/data/data-school-achievement.rar

I would like to impute values because otherwise I will lose a lot of
students from my study when running regressions.

Here are descriptive statistics for my fully observed variables X:

                      Variable |     Obs       Mean   Std. Dev.   Min      Max
-------------------------------+----------------------------------------------
                        cod_uf | 1939147   32.73305    9.495694    11       53
                        região | 1939147   2.898205    1.031377     1        5
                     qn1 (sex) | 1939147   1.499852    .5000001     1        2
                    qn2 (race) | 1939147   1.941047    .9668508     1        5
              qn4 (age groups) | 1939147   3.752611    1.195527     1        8
-------------------------------+----------------------------------------------
 leitura (reading proficiency) | 1939147   175.8849    41.19135     0   347.36

Here is the misstable of my missing values:

. misstable sum capitalcultural envolvimento motivacao nse
                                                         Obs<.
                                              +------------------------
                |                             | Unique
       Variable |   Obs=.    Obs>.      Obs<. | values      Min     Max
----------------+-----------------------------+------------------------
capitalcultural |  42,986             1896161 |    371   -1.662   1.662
   envolvimento |  20,302             1918845 |     19   -1.178   1.178
      motivacao |  37,507             1901640 |     15   -1.014    .672
            nse |   6,092             1933055 |   >500    -2.02    2.02
-----------------------------------------------------------------------

Here is my procedure to do multiple imputation:

mi set mlong
mi register imputed capitalcultural envolvimento motivacao nse
mi register regular leitura qn1 qn2 qn4
tab qn1, g(sexo)
tab qn2, g(raca)
tab região, g(regiao)
xi: mi impute reg capitalcultural = leitura sexo1 raca1 regiao1 qn4, add(1000)

I got the following error message: insufficient disk space r(699)
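
For a rough sense of scale, here is a sketch of how many extra rows
add(1000) asks Stata to create. It assumes the mlong behavior as I
understand it from the documentation: each added imputation stores a
copy of every observation that is incomplete in any registered imputed
variable. The bounds use the Obs=. counts from the misstable output
above. The true figure depends on how the missing values overlap across
the four variables, and whether this is what triggers r(699) is an
assumption.

```stata
* Lower bound: the incomplete students number at least the largest
* single Obs=. count (capitalcultural), times 1000 imputations
display %15.0fc 1000 * 42986

* Upper bound: no student is missing more than one of the four
* registered variables, so the Obs=. counts simply add up
display %15.0fc 1000 * (42986 + 20302 + 37507 + 6092)
```

That is somewhere between about 43 million and 107 million rows on top
of the 1,939,147 originals, so a much smaller add() value may be worth
trying first.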

Could anyone please help me? Is there another imputation technique I
could use? Hotdeck would not be useful since the imputed variables are
not categorical.

Kind regards,

Raquel

--
Raquel Rangel de Meireles Guimarães
Substitute Professor, Department of Demography, UFMG
PhD candidate in Demography
http://ufmg.academia.edu/RaquelGuimaraes
Cedeplar - Centro de Desenvolvimento e Planejamento Regional

