Victor Ernesto Marquez Perez, Lelly María Useche Castro, Dulce María Mesa Avila, Ana Ides Chacon Contreras


An imputation design is presented that combines classification and imputation in order to improve the quality of the imputed data. Imputation is carried out on quantitative data missing completely at random (MCAR), using regression trees. Mean imputation techniques are compared, theoretically and empirically, using regression trees, in order to develop an integrated classification and imputation strategy. Unbiased estimators were obtained by deriving the expected value of the estimator. The estimators' properties were evaluated through the derivation of their variance and bias, which showed that they are unbiased. Regarding the variance of the unbiased mean estimator, sufficiency was not proved for the mean estimator.
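The strategy described in the abstract combines a regression tree with mean imputation: each missing value is replaced by the mean of the observed responses in the terminal node the record falls into, and this is compared with classical overall-mean imputation under MCAR. The Python sketch below illustrates that idea under stated assumptions. It uses a single fully observed covariate `x`, a small hand-rolled depth-limited tree with squared-error splits (a simplified stand-in for CART, without pruning), and hypothetical function names `build_tree`, `tree_mean_impute`, `overall_mean_impute` and `simulate`. The data-generating model inside `simulate` (x uniform on [0, 10], y = 2x + N(0, 1)) is invented for illustration; this is not the authors' design or code.

```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def var(values):
    """Sample variance (n - 1 denominator)."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def sse(values):
    """Sum of squared deviations from the mean: the regression-tree impurity."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(xs, ys, depth=0, max_depth=3, min_leaf=2):
    """Grow a small regression tree on complete cases (simplified CART, no pruning).

    A node is ("leaf", mean_y) or ("split", threshold, left, right).
    """
    if depth >= max_depth or len(ys) < 2 * min_leaf:
        return ("leaf", avg(ys))
    pairs = sorted(zip(xs, ys))
    best = None  # (SSE after split, split index into the sorted pairs)
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a threshold
        total = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if best is None or total < best[0]:
            best = (total, i)
    if best is None or best[0] >= sse(ys):
        return ("leaf", avg(ys))  # no split reduces the squared error
    i = best[1]
    threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
    left, right = pairs[:i], pairs[i:]
    return ("split", threshold,
            build_tree([x for x, _ in left], [y for _, y in left], depth + 1, max_depth, min_leaf),
            build_tree([x for x, _ in right], [y for _, y in right], depth + 1, max_depth, min_leaf))


def predict(tree, x):
    """Drop x down the tree and return the mean of its terminal node."""
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]


def tree_mean_impute(xs, ys, **tree_args):
    """Replace each missing y (None) by the observed-y mean of its terminal node."""
    obs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    tree = build_tree([x for x, _ in obs], [y for _, y in obs], **tree_args)
    return [predict(tree, x) if y is None else y for x, y in zip(xs, ys)]


def overall_mean_impute(ys):
    """Classical mean imputation: every missing y gets the overall observed mean."""
    m = avg([y for y in ys if y is not None])
    return [m if y is None else y for y in ys]


def simulate(reps=100, n=100, p_missing=0.3, seed=1):
    """Monte Carlo comparison under MCAR; returns (mean, variance) of each estimator.

    Hypothetical model: x ~ U(0, 10), y = 2x + N(0, 1), so E[y] = 10.
    """
    rng = random.Random(seed)
    est = {"tree": [], "overall": []}
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [2 * x + rng.gauss(0, 1) for x in xs]
        # MCAR: each y is deleted independently of x and y with probability p_missing.
        ys_mis = [None if rng.random() < p_missing else y for y in ys]
        est["tree"].append(avg(tree_mean_impute(xs, ys_mis)))
        est["overall"].append(avg(overall_mean_impute(ys_mis)))
    return {k: (avg(v), var(v)) for k, v in est.items()}
```

For example, with `xs = [1, 2, 3, 10, 11, 12]` and `ys = [1.0, None, 3.0, 10.0, 12.0, None]`, the tree splits at x = 6.5 and imputes 2.0 and 11.0, whereas overall-mean imputation fills both gaps with 6.5. In `simulate`, both estimators of the mean should average close to 10 under MCAR; comparing the returned variances is one way to carry out the kind of empirical comparison the abstract describes.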


Keywords

missing data, imputation, CART, regression trees, unbiased estimators, simulation

How to cite
Marquez Perez, V. E., Useche Castro, L. M., Mesa Avila, D. M., & Chacon Contreras, A. I. (2017). Imputation strategy with media using regression trees. Comunicaciones En Estadística, 10(1), 9-40. https://doi.org/10.15332/s2027-3355.2017.0001.01