Imputation strategy with media using regression trees

Victor Ernesto Marquez Perez; Lelly María Useche Castro; Dulce María Mesa Avila; Ana Ides Chacon Contreras

doi:10.15332/s2027-3355.2017.0001.01

Published

2017-05-16


Victor Ernesto Marquez Perez

Escuela Superior Politécnica de Chimborazo

Lelly María Useche Castro

Universidad Nacional Experimental Sur del Lago

Dulce María Mesa Avila

Universidad Central de Venezuela

Ana Ides Chacon Contreras

Universidad de los Andes


Abstract

An imputation design is presented that combines classification and imputation in order to improve the quality of the imputed data. Imputation is carried out on quantitative data that are missing completely at random, using regression trees. Mean imputation techniques are compared, theoretically and empirically, using regression trees, in order to develop an integrated classification and imputation strategy.
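The idea of combining classification with mean imputation can be sketched as follows: grow a regression tree on the units whose value is observed, then fill each missing value with the mean of the observed values that fall in the same leaf. This is only a minimal illustration under stated assumptions, not the authors' procedure: it uses a single covariate, a fixed depth limit and no pruning, and the names `sse`, `best_split`, `grow`, `leaf_mean` and `impute` are hypothetical helpers introduced here.

```python
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(pairs, min_leaf):
    """Return the threshold on x that most reduces the SSE of y, or None."""
    pairs = sorted(pairs)
    best = None
    best_cost = sse([y for _, y in pairs])
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal x values
        cost = sse([y for _, y in pairs[:i]]) + sse([y for _, y in pairs[i:]])
        if cost < best_cost:
            best_cost = cost
            best = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best

def grow(pairs, min_leaf, depth):
    """Grow a one-covariate regression tree; each leaf stores the mean of y."""
    threshold = best_split(pairs, min_leaf) if depth > 0 else None
    if threshold is None:
        return {"mean": mean(y for _, y in pairs)}
    left = [p for p in pairs if p[0] <= threshold]
    right = [p for p in pairs if p[0] > threshold]
    return {"threshold": threshold,
            "left": grow(left, min_leaf, depth - 1),
            "right": grow(right, min_leaf, depth - 1)}

def leaf_mean(tree, x):
    """Follow the splits down to a leaf and return its stored mean."""
    while "mean" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["mean"]

def impute(xs, ys, min_leaf=5, depth=2):
    """Replace each None in ys with the observed mean of its tree leaf."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    tree = grow(observed, min_leaf, depth)
    return [leaf_mean(tree, x) if y is None else y for x, y in zip(xs, ys)]

# Toy data: two well-separated groups, one missing value in each.
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, None, 3.0, 10.0, None, 12.0]
print(impute(xs, ys, min_leaf=1, depth=1))
# → [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
```

In the toy data the tree splits at x = 6.5, so each missing value receives the mean of its own group (2.0 and 11.0) rather than the overall observed mean of 6.5, which is the motivation for classifying before imputing.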

Unbiased estimators were obtained by deriving the expected value of the estimator. The estimators' properties were evaluated through the derivation of their variance and bias, which showed that they are unbiased. As for the variance of the unbiased estimator of the mean, sufficiency of the mean estimator was not proved.
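As a rough illustration of what deriving the expected value of such an estimator can look like, the sketch below treats the tree leaves as fixed imputation classes g = 1, ..., G, which is an assumption, since in practice the tree is grown from the observed data. It further assumes simple random sampling and data missing completely at random within each leaf. The notation is introduced here and is not taken from the paper: leaf g contains n_g sampled units, of which the r_g >= 1 units in R_g are observed; \bar{y}_{R_g} is the observed mean in leaf g, \bar{y}_g the mean of all n_g sampled units in leaf g, \bar{y} the full sample mean, \bar{Y} the population mean, and s the selected sample.

```latex
\begin{align*}
\hat{\bar{Y}}_{I}
  &= \frac{1}{n}\sum_{g=1}^{G}\Bigl(\sum_{i \in R_g} y_i + (n_g - r_g)\,\bar{y}_{R_g}\Bigr)
   = \frac{1}{n}\sum_{g=1}^{G} n_g\,\bar{y}_{R_g},\\
E\bigl[\bar{y}_{R_g} \mid s\bigr]
  &= \bar{y}_{g}
  \quad\text{(the observed units are a random subsample of leaf } g\text{)},\\
E\bigl[\hat{\bar{Y}}_{I} \mid s\bigr]
  &= \frac{1}{n}\sum_{g=1}^{G} n_g\,\bar{y}_{g} = \bar{y},
  \qquad
E\bigl[\hat{\bar{Y}}_{I}\bigr] = E[\bar{y}] = \bar{Y}.
\end{align*}
```

Under these assumptions the mean computed from the completed data is unbiased for the population mean; the variance derivation, which the abstract also mentions, is not reproduced here.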

Keywords: missing data, imputation, CART, regression trees, unbiased estimators, simulation


Victor Ernesto Marquez Perez, Escuela Superior Politécnica de Chimborazo

Systems Engineer

Master's degree in Applied Statistics

Doctorate in Statistics

Professor (Agregado rank) at the School of Statistics, Universidad de los Andes.

Currently a visiting professor at the Escuela Superior Politécnica de Chimborazo (Ecuador)

Lelly María Useche Castro, Universidad Nacional Experimental Sur del Lago

Industrial Engineer

Doctorate in Statistics

Full Professor at the Universidad Nacional Experimental Sur del Lago

Dulce María Mesa Avila, Universidad Central de Venezuela

Ph.D. in Social Statistics, University of Southampton, U.K. Licentiate degree in Statistical Sciences

Professor at the Universidad Central de Venezuela

Ana Ides Chacon Contreras, Universidad de los Andes

Licentiate degree in Administration

Specialist in Statistics

Assistant Professor at the Universidad de los Andes



How to cite

Marquez Perez, V. E., Useche Castro, L. M., Mesa Avila, D. M., & Chacon Contreras, A. I. (2017). Imputation strategy with media using regression trees. Comunicaciones En Estadística, 10(1), 9-40. https://doi.org/10.15332/s2027-3355.2017.0001.01


License

The authors retain the rights to their articles and are therefore free to share, copy, distribute, perform, and publicly communicate the work under the following conditions:

Give credit for the work in the manner specified by the author or licensor (but not in any way that suggests that they endorse you or your use of the work).

Comunicaciones en Estadística is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0)

Universidad Santo Tomás retains the economic rights (copyright) to the published works, and encourages and permits their reuse under the aforementioned license.