Imputation strategy with the mean using regression trees
Abstract (en)
An imputation design is presented that combines classification and imputation in order to improve the quality of the imputed data. Imputation is carried out with regression trees on quantitative data that are missing completely at random. Mean imputation techniques using regression trees are compared, theoretically and empirically, in order to develop an integrated classification and imputation strategy.
Unbiased estimators were obtained by deriving the expected value of each estimator. The estimators' properties were evaluated by deriving their variance and bias, which showed that they are unbiased. As for the variance of the unbiased estimator of the mean, sufficiency could not be proved for the mean estimator.
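The general idea of combining classification with mean imputation can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a single auxiliary variable `x`, a regression tree reduced to one split (a stump) chosen by least squares on the observed cases, and missing values of `y` encoded as `None`. The function names `best_split` and `impute_with_stump` are hypothetical.

```python
from statistics import mean


def best_split(xs, ys):
    """Return the threshold on x that minimizes the within-group
    sum of squared errors of y (a one-split regression tree)."""
    def sse(values):
        m = mean(values)
        return sum((v - m) ** 2 for v in values)

    candidates = sorted(set(xs))
    if len(candidates) < 2:
        raise ValueError("need at least two distinct x values to split")

    best_cost, best_t = None, None
    # Candidate thresholds are midpoints between consecutive x values.
    for lo, hi in zip(candidates, candidates[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    return best_t


def impute_with_stump(x, y):
    """Fit the stump on observed cases, then replace each missing y
    (None) with the mean of y in the node its x falls into."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs, ys = zip(*observed)
    t = best_split(xs, ys)
    left_mean = mean(yi for xi, yi in observed if xi <= t)
    right_mean = mean(yi for xi, yi in observed if xi > t)
    return [
        yi if yi is not None else (left_mean if xi <= t else right_mean)
        for xi, yi in zip(x, y)
    ]


if __name__ == "__main__":
    x = [1, 2, 3, 10, 11, 12]
    y = [5.0, None, 7.0, 20.0, 22.0, None]
    print(impute_with_stump(x, y))
```

Each missing value receives the mean of the terminal node it is classified into rather than the overall mean, which is the sense in which classification and imputation are combined. A full regression tree would repeat the split recursively within each node.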
License
The authors maintain the rights to the articles and therefore they are free to share, copy, distribute, execute and publicly communicate the work under the following conditions:
Recognize the credits of the work in the manner specified by the author or licensor (but not in a way that suggests that you have their support or that they support your use of their work).
Comunicaciones en Estadística is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Universidad Santo Tomás preserves the patrimonial rights (copyright) of the published works, and favors and allows the reuse of them under the aforementioned license.