Filtro de SPAM basado en aprendizaje profundo
Deep learning-based SPAM filter
Abstract
The classification of emails as spam or ham (legitimate mail) is a classic problem in natural language processing. In this study, we implement two deep learning approaches to this task: a BERT-based model and a recurrent neural network using LSTM. We compare their performance in terms of precision, recall, F1-score, and computational efficiency. Both models were trained and evaluated on the Enron Email Corpus, achieving an overall accuracy of 97% and a balanced F1-score for both spam and ham. The BERT model shows a slight improvement in robustness metrics, but it requires longer training and inference times. The LSTM, when properly designed and trained, remains an effective solution with considerably lower resource consumption. These findings indicate that, depending on latency and computing constraints, one may choose either a high-performance transformer-based approach or a more lightweight recurrent implementation.
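As an illustration of the metrics named in the abstract, the following minimal sketch computes per-class precision, recall, and F1-score, together with overall accuracy, for a binary spam/ham classifier. The function names (`per_class_metrics`, `accuracy`) and the toy labels are assumptions made for this example; they are not the authors' code or data, and the numbers do not reproduce the reported results.

```python
from typing import Dict, List


def per_class_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    """Precision, recall and F1 for one class, treating `positive` as the target label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def accuracy(y_true: List[str], y_pred: List[str]) -> float:
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true) if y_true else 0.0


# Hypothetical toy labels (illustrative only, not drawn from the Enron corpus).
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

spam_scores = per_class_metrics(y_true, y_pred, positive="spam")
ham_scores = per_class_metrics(y_true, y_pred, positive="ham")
overall_accuracy = accuracy(y_true, y_pred)

print("spam:", spam_scores)
print("ham:", ham_scores)
print("accuracy:", overall_accuracy)
```

Reporting the scores separately for spam and ham, as the abstract does, shows whether a model favors one class, which a single accuracy figure can hide when the classes are imbalanced.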
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
The authors retain the rights to their articles and are therefore free to share, copy, distribute, perform, and publicly communicate the work under the following condition:
Attribution: credit the work in the manner specified by the author or licensor (but not in a way that suggests that they endorse you or your use of the work).
Comunicaciones en Estadística is licensed under Creative Commons Atribución-NoComercial-CompartirIgual 4.0 Internacional (CC BY-NC-SA 4.0)

Universidad Santo Tomás preserves the patrimonial rights (copyright) of the published works, and favors and allows the reuse of them under the aforementioned license.




