Parameter estimation in mixture models using evolutionary algorithms

Natalia Romero-Rios, Juan Carlos Correa

Abstract


Mixture models are widely used when the data contain elements from several distinct populations combined into a single superpopulation. Traditional methods for estimating the parameters of mixture models include the Bayesian approach and the expectation-maximization (EM) algorithm. This study proposes using evolutionary algorithms, such as genetic algorithms, as a method for estimating the parameters of mixture models. To that end, an algorithm is proposed for comparing evolutionary and traditional methods, and an application example with real data is included. Evolutionary algorithms were found to be a competitive option for estimating the parameters of mixture distributions when the populations in the mixture follow a gamma distribution, the population weights are balanced, and the sample size is greater than 100 items. For mixtures of normal distributions, and for estimating the number of populations in a mixture, the traditional method is a better option than the genetic algorithm.
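As an illustration of the general idea only, the following minimal sketch fits a two-component gamma mixture by maximizing its log-likelihood with a simple real-coded genetic algorithm. It is not the authors' algorithm: the function names (gamma_logpdf, mixture_loglik, genetic_fit, simulate), the parameter bounds, the operators (tournament selection, blend crossover, Gaussian mutation, elitism), and all tuning values are assumptions chosen for brevity.

```python
import math
import random

# Assumed search box for (weight, shape1, scale1, shape2, scale2).
BOUNDS = [(0.05, 0.95), (0.1, 20.0), (0.1, 20.0), (0.1, 20.0), (0.1, 20.0)]


def gamma_logpdf(x, shape, scale):
    """Log-density of a Gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def mixture_loglik(params, data):
    """Log-likelihood of a two-component gamma mixture."""
    w, k1, s1, k2, s2 = params
    total = 0.0
    for x in data:
        a = math.log(w) + gamma_logpdf(x, k1, s1)
        b = math.log(1.0 - w) + gamma_logpdf(x, k2, s2)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


def clip(params):
    """Force every parameter back inside its assumed bounds."""
    return tuple(min(max(p, lo), hi) for p, (lo, hi) in zip(params, BOUNDS))


def genetic_fit(data, pop_size=40, generations=40, mutation_sd=0.3, seed=0):
    """Return (best_params, best_loglik) found by a simple genetic algorithm."""
    rng = random.Random(seed)

    def evaluate(ind):
        return (mixture_loglik(ind, data), ind)

    scored = [evaluate(tuple(rng.uniform(lo, hi) for lo, hi in BOUNDS))
              for _ in range(pop_size)]
    for _ in range(generations):
        scored.sort(key=lambda t: t[0], reverse=True)
        next_gen = scored[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            p2 = max(rng.sample(scored, 3), key=lambda t: t[0])[1]
            # Blend crossover: random convex combination of the parents.
            mix = rng.random()
            child = [mix * a + (1.0 - mix) * b for a, b in zip(p1, p2)]
            # Gaussian mutation of one randomly chosen parameter.
            gene = rng.randrange(len(child))
            child[gene] += rng.gauss(0.0, mutation_sd)
            next_gen.append(evaluate(clip(child)))
        scored = next_gen
    best_loglik, best_params = max(scored, key=lambda t: t[0])
    return best_params, best_loglik


def simulate(n, w, k1, s1, k2, s2, seed=0):
    """Draw n values from a two-component gamma mixture (synthetic data)."""
    rng = random.Random(seed)
    return [rng.gammavariate(k1, s1) if rng.random() < w
            else rng.gammavariate(k2, s2) for _ in range(n)]


if __name__ == "__main__":
    sample = simulate(150, 0.5, 2.0, 1.0, 9.0, 1.5, seed=3)
    params, loglik = genetic_fit(sample)
    print("estimated (w, shape1, scale1, shape2, scale2):", params)
    print("log-likelihood:", loglik)
```

The synthetic sample uses balanced weights and more than 100 observations, the setting in which the abstract reports genetic algorithms to be competitive. Because the two components can swap labels, the estimated parameters may appear in either order.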


Keywords


Mixture estimation; evolutionary algorithms; genetic algorithms.




DOI: https://doi.org/10.15332/s2027-3355.2016.0002.05



ISSN: 2027-3355 - e-ISSN: 2339-3076 - DOI: https://doi.org/10.15332/23393076