Estimación de parámetros en modelos de mezclas usando algoritmos evolutivos
Parameter estimation in mixture models using evolutive algorithms
Abstract (en)
Mixture models are widely used when the observed elements come from different populations that are mixed into a single superpopulation. Traditional methods for estimating the parameters of mixture models include the Bayesian method and the Expectation-Maximization (EM) algorithm. In this work we propose evolutionary algorithms, such as genetic algorithms, as a method for estimating these parameters. We propose an algorithm for comparing evolutionary and traditional methods, and we illustrate its use with an application to real data. We found that evolutionary algorithms are a competitive option for estimating the parameters of mixture models when the populations in the mixture follow a gamma distribution, the weights of the populations in the mixture are balanced, and the sample size is larger than 100 items. For mixtures of normal distributions, and for estimating the number of populations in a mixture, the traditional method is a better option than the genetic algorithm.
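As an illustration of the general idea only, the following minimal sketch fits a two-component gamma mixture by maximizing its log-likelihood with a simple evolutionary search. It is not the algorithm proposed in the article: the function names, the truncation selection, the blend crossover, the multiplicative mutation, the parameter bounds, and the population settings are all assumptions chosen for brevity.

```python
import math
import random


def gamma_logpdf(x, shape, scale):
    """Log density of a gamma(shape, scale) distribution at x > 0."""
    return ((shape - 1.0) * math.log(x) - x / scale
            - math.lgamma(shape) - shape * math.log(scale))


def mixture_loglik(params, data):
    """Log-likelihood of a two-component gamma mixture.

    params = (w, shape1, scale1, shape2, scale2), where w is the
    weight of the first component and 1 - w that of the second.
    """
    w, a1, s1, a2, s2 = params
    total = 0.0
    for x in data:
        l1 = math.log(w) + gamma_logpdf(x, a1, s1)
        l2 = math.log(1.0 - w) + gamma_logpdf(x, a2, s2)
        m = max(l1, l2)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(l1 - m) + math.exp(l2 - m))
    return total


def random_individual(rng):
    """Draw a candidate parameter vector (assumed, arbitrary ranges)."""
    return (rng.uniform(0.1, 0.9),
            rng.uniform(0.5, 10.0), rng.uniform(0.5, 10.0),
            rng.uniform(0.5, 10.0), rng.uniform(0.5, 10.0))


def clip(params):
    """Keep the weight inside (0, 1) and the other parameters positive."""
    w, *rest = params
    w = min(max(w, 0.01), 0.99)
    return (w, *(min(max(v, 0.05), 50.0) for v in rest))


def evolve(data, pop_size=40, generations=40, seed=0):
    """Hypothetical evolutionary search maximizing the mixture log-likelihood."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    history = []  # best log-likelihood seen at each generation
    for _ in range(generations):
        pop.sort(key=lambda p: mixture_loglik(p, data), reverse=True)
        history.append(mixture_loglik(pop[0], data))
        parents = pop[:pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            t = rng.random()
            # Blend crossover: a random convex combination of two parents.
            child = tuple(t * a + (1.0 - t) * b for a, b in zip(p, q))
            # Multiplicative log-normal mutation, then projection to bounds.
            child = tuple(v * math.exp(rng.gauss(0.0, 0.1)) for v in child)
            children.append(clip(child))
        pop = parents + children
    best = max(pop, key=lambda p: mixture_loglik(p, data))
    return best, mixture_loglik(best, data), history


if __name__ == "__main__":
    # Simulated data: balanced mixture of two gamma populations, n = 200.
    rng = random.Random(1)
    data = ([rng.gammavariate(2.0, 1.0) for _ in range(100)]
            + [rng.gammavariate(9.0, 2.0) for _ in range(100)])
    best, loglik, _ = evolve(data)
    print("estimated (w, shape1, scale1, shape2, scale2):", best)
    print("log-likelihood:", loglik)
```

Because the better half of each generation is carried over unchanged, the best log-likelihood cannot decrease from one generation to the next. Like EM, a search of this kind can still stop at a local optimum, and the component labels are interchangeable, so the first estimated component need not correspond to the first simulated population.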
Abstract (es)
Los modelos de mezclas son ampliamente usados en casos en los que se tienen elementos de poblaciones diversas, unidos en una super población. Hay métodos tradicionales para la estimación de los parámetros de modelos de mezclas, como lo son el bayesiano y el algoritmo de esperanza-maximización (EM). En esta investigación se propone usar los algoritmos evolutivos, como lo son los algoritmos genéticos, como método que puede servir para encontrar los parámetros de estimación de los modelos de mezclas. Para el desarrollo de este estudio se propone un algoritmo para la comparación de métodos evolutivos y tradicionales y se incluye un ejemplo de aplicación con datos reales. Se encontró que los algoritmos evolutivos son una opción competitiva para la estimación de parámetros en distribuciones de mezclas en los casos cuando las poblaciones en la mezcla siguen una distribución gamma, los pesos en las poblaciones son balanceados y el tamaño de muestra es mayor de 100 ítems. Para las mezclas de distribuciones normales y la estimación del número de poblaciones en una mezcla, el método tradicional es una mejor opción que el algoritmo genético.
License
The authors maintain the rights to the articles and therefore they are free to share, copy, distribute, execute and publicly communicate the work under the following conditions:
Give credit for the work in the manner specified by the author or licensor (but not in a way that suggests that you have their support or that they endorse your use of the work).
Comunicaciones en Estadística is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
Universidad Santo Tomás preserves the patrimonial rights (copyright) of the published works, and favors and allows the reuse of them under the aforementioned license.