Filtro de SPAM basado en aprendizaje profundo

Danna Lesley Cruz Reyes; Juan Camilo Camargo Prieto; Andres David Leon Hernandez; Felipe Algarra

doi:10.15332/23393076.11818

Published: 2025-12-02

Deep learning-based SPAM filter

Danna Cruz-Reyes

Departamento de Estadística, Facultad de Ciencias, Universidad Nacional de Colombia – Sede Bogotá

Juan Camilo Camargo Prieto

Departamento de Estadística, Universidad Nacional de Colombia

Andres David Leon Hernandez

Departamento de Estadística, Universidad Nacional de Colombia, Bogotá, Colombia

Felipe Algarra

Departamento de Estadística, Universidad Nacional de Colombia

Abstract (en)

Classifying emails as spam or ham is a classic problem in natural language processing. In this study, we implement two deep learning approaches to this task: a BERT-based model and a recurrent neural network with LSTM units. We compare their performance in terms of precision, recall, F1-score, and computational efficiency. Both models were trained and evaluated on the Enron Email Corpus, reaching an overall accuracy of 97% and balanced F1-scores for spam and ham. The BERT model shows slightly better robustness metrics but requires longer training and inference times; the LSTM, when properly designed and trained, remains an effective solution with substantially lower resource consumption. These findings suggest that, depending on latency and computing constraints, one may choose between a high-performance transformer-based approach and a more lightweight recurrent implementation.

Keywords (en): spam detection, natural language processing, BERT, LSTM, deep learning, text classification
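The abstract compares the models on precision, recall, F1-score, and accuracy. As a reminder of how these quantities relate, the following is a minimal sketch that computes them per class from predicted labels. The toy labels are invented for illustration and are not taken from the Enron Email Corpus or from the paper's results; the names `binary_metrics` and `accuracy` are hypothetical helpers, not the authors' code.

```python
from dataclasses import dataclass


@dataclass
class ClassMetrics:
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive="spam"):
    """Precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against empty denominators (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return ClassMetrics(precision, recall, f1)


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (assumption): 3 spam and 5 ham messages, with 2 misclassified.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "ham", "ham", "ham", "ham", "spam"]

spam = binary_metrics(y_true, y_pred, positive="spam")
ham = binary_metrics(y_true, y_pred, positive="ham")
overall = accuracy(y_true, y_pred)
```

Reporting the metrics separately for spam and ham, as the abstract does, shows whether a high overall accuracy hides weak performance on one of the two classes.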

Abstract (es, translated)

Classifying emails as spam or not spam is a classic problem in natural language processing. This work implements two deep learning approaches to the task: a BERT-based model and an LSTM recurrent neural network. Their performance is compared in terms of precision, recall, F1 score, and computational efficiency. Both models were trained and evaluated on the Enron Email Corpus, reaching an overall accuracy of 97% and a balanced F1 score for ham and spam. The BERT model shows a slight improvement in robustness metrics, although it entails longer training and inference times; LSTM, for its part, remains an effective solution when properly designed and trained, with appreciably lower resource consumption. These findings show that, depending on latency and compute requirements, one can opt for a high-performance transformer approach or a lighter recurrent implementation.

Keywords (es, translated): deep learning, text classification, spam, BERT, LSTM, NLP, neural networks

How to Cite

Cruz Reyes, D. L., Camargo Prieto, J. C., Leon Hernandez, A. D., & Algarra, F. (2025). Deep learning-based SPAM filter. Comunicaciones en Estadística, 18(2), 1-9. https://doi.org/10.15332/23393076.11818

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

The authors retain the rights to their articles and are therefore free to share, copy, distribute, perform, and publicly communicate the work under the following condition:

Credit the work in the manner specified by the author or licensor (but not in a way that suggests that they endorse you or your use of the work).

Comunicaciones en Estadística is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).

Universidad Santo Tomás retains the economic rights (copyright) to the published works, and encourages and permits their reuse under the aforementioned license.
