Una nota de cuidado sobre el efecto de datos parcialmente faltantes en la prueba de independencia χ2
A cautionary note on the effect of partially-missing data in the χ2 test of independence
Abstract (en)
The analysis of contingency tables is widely used in many areas, its main aim being to detect potential associations between two categorical variables. A test of independence based on the χ2 statistic is commonly used for this purpose. However, researchers often face situations in which at least one of the variables is partially observed. The usual procedure in such cases is to exclude observations with incomplete information before performing the χ2 test. In this cautionary note, we analyze how discarding partially-missing information affects the χ2 test of independence when loglinear models are fitted to the data.
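As a minimal sketch of the complete-case procedure described above, the following Python snippet drops every record with a missing value and computes the Pearson χ2 statistic on the remaining 2×2 table. The records, the level names and the helper functions `pearson_chi2` and `p_value_df1` are hypothetical and chosen only for illustration; they are not the authors' data and do not reproduce their loglinear analysis. The p-value shortcut is valid only for one degree of freedom.

```python
import math


def pearson_chi2(table):
    """Return the Pearson chi-square statistic and degrees of freedom
    for an r x c table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df1(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of
    freedom: P(X > stat) = erfc(sqrt(stat / 2)). Not valid for df != 1."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical records (x, y); None marks a missing value.
records = (
    [("a", "yes")] * 30 + [("a", "no")] * 20
    + [("b", "yes")] * 15 + [("b", "no")] * 35
    + [("a", None)] * 10      # y missing
    + [(None, "no")] * 5      # x missing
)

# Complete-case analysis: keep only fully observed records.
complete = [(x, y) for x, y in records if x is not None and y is not None]
dropped = len(records) - len(complete)

levels_x = ["a", "b"]
levels_y = ["yes", "no"]
table = [
    [sum(1 for x, y in complete if x == lx and y == ly) for ly in levels_y]
    for lx in levels_x
]

stat, df = pearson_chi2(table)
p_value = p_value_df1(stat)
print(f"dropped={dropped}, chi2={stat:.4f}, df={df}, p={p_value:.4f}")
```

In this sketch the 15 partially observed records contribute nothing to the test, even though each of them still carries information about one of the two margins.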
Abstract (es, translated to English)
The analysis of contingency tables is widely used in many disciplines, its main interest being to determine possible associations between two categorical variables. One of the most widely used tests for this purpose is the test of independence based on the χ2 statistic. Researchers frequently face situations in which one of the two variables (or, in the worst case, both) is partially observed (that is, it has some missing values). The usual procedure in these cases is to exclude from the analysis those observations (i.e., subjects) for which information is missing on at least one of the variables. In this note we analyze the effect of not considering partially observed observations on the test when fitting loglinear models, focusing mainly on the χ2 test of independence.
License
The authors retain the rights to their articles and are therefore free to share, copy, distribute, execute and publicly communicate the work under the following condition:
Credit the work in the manner specified by the author or licensor (but not in a way that suggests that you have their support or that they endorse your use of their work).
Comunicaciones en Estadística is licensed under Creative Commons Atribución-NoComercial-CompartirIgual 4.0 Internacional (CC BY-NC-SA 4.0)
Universidad Santo Tomás preserves the patrimonial rights (copyright) of the published works, and favors and allows the reuse of them under the aforementioned license.