Una aplicación estadística de los métodos de clasificación en astronomía
A statistical application of classification methods in astronomy
Abstract (en)
In recent years, advances in astrophysics and cosmology have been driven by large and complex data sets that can only be analyzed and interpreted with highly refined statistical methods. As a result, these disciplines have come to complement each other, forming a research field known as astrostatistics. In this paper we present a classification method based on Gaussian mixture models. The method is used to identify stars that belong to the Hyades cluster in a sample of 2678 stars from the Hipparcos database. We briefly describe the characteristics of the cluster and examine the evidence for outliers. The classification yields three groups, whose membership we study and find to agree with the literature. We also show the Hertzsprung-Russell diagram obtained for the cluster, which is extremely important for studies of stellar evolution. Finally, the third group is analyzed using filters derived from classification rules and other statistical methods in order to determine the membership of its stars in the Hyades cluster.
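To make the idea of mixture-based membership classification concrete, the following is a minimal sketch in Python. It is not the authors' implementation and does not use Hipparcos data: it fits a two-component, one-dimensional Gaussian mixture by expectation-maximization to synthetic values and assigns each point to its most probable component. The function names (`normal_pdf`, `posteriors`, `fit_gmm_1d`), the number of components, and the synthetic "member" and "field" populations are all assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posteriors(data, weights, means, variances):
    """Posterior probability of each mixture component for each data point."""
    result = []
    for x in data:
        dens = [w * normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        result.append([d / total for d in dens])
    return result


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture with plain EM (illustrative)."""
    ordered = sorted(data)
    n = len(ordered)
    # Initialise the means at evenly spaced quantiles and share one variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    variances = [sum((x - centre) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters.
        resp = posteriors(data, weights, means, variances)
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, 1e-6)  # guard against a collapsing component
    return weights, means, variances


# Synthetic data (an assumption, not real measurements): two populations
# standing in for "cluster members" and "field stars".
random.seed(0)
members = [random.gauss(0.0, 1.0) for _ in range(300)]
field = [random.gauss(8.0, 1.5) for _ in range(200)]
data = members + field
true_labels = [0] * len(members) + [1] * len(field)

weights, means, variances = fit_gmm_1d(data, k=2)

# Classification rule: assign each point to the component with the
# highest posterior probability.
resp = posteriors(data, weights, means, variances)
labels = [max(range(2), key=lambda j: r[j]) for r in resp]
accuracy = sum(a == b for a, b in zip(labels, true_labels)) / len(data)
```

The posterior probabilities in `resp` play the role of membership probabilities, so a stricter rule (for example, requiring a posterior above some threshold) could be used instead of the simple maximum. A multivariate version with more components would follow the same E-step and M-step structure.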
License
The authors retain the rights to their articles and are therefore free to share, copy, distribute, perform, and publicly communicate the work under the following conditions:
Give credit for the work in the manner specified by the author or licensor (but not in a way that suggests that they endorse you or your use of the work).
Comunicaciones en Estadística is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license (CC BY-NC-SA 4.0).
Universidad Santo Tomás retains the patrimonial rights (copyright) to the published works, and encourages and permits their reuse under the aforementioned license.