Research Article

Applying Various Machine Learning Techniques for Early Diagnosis of Breast Cancer

by Mohamed Shaban Abden, Mostafa Ali Elmasry, Kamel Hussein Rahouma
Journal of Advanced Artificial Intelligence
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 3
Year of Publication: 2024
Authors: Mohamed Shaban Abden, Mostafa Ali Elmasry, Kamel Hussein Rahouma
10.5120/jaai202414

Mohamed Shaban Abden, Mostafa Ali Elmasry, Kamel Hussein Rahouma. Applying Various Machine Learning Techniques for Early Diagnosis of Breast Cancer. Journal of Advanced Artificial Intelligence. 1, 3 (Dec 2024), 14-22. DOI=10.5120/jaai202414

@article{ 10.5120/jaai202414,
author = { Mohamed Shaban Abden and Mostafa Ali Elmasry and Kamel Hussein Rahouma },
title = { Applying Various Machine Learning Techniques for Early Diagnosis of Breast Cancer },
journal = { Journal of Advanced Artificial Intelligence },
issue_date = { Dec 2024 },
volume = { 1 },
number = { 3 },
month = { Dec },
year = { 2024 },
pages = { 14-22 },
numpages = {9},
url = { https://jaaionline.phdfocus.com/archives/volume1/number3/applying-various-machine-learning-techniques-for-early-diagnosis-of-breast-cancer/ },
doi = { 10.5120/jaai202414 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-12-31T21:43:40.419681+05:30
%A Mohamed Shaban Abden
%A Mostafa Ali Elmasry
%A Kamel Hussein Rahouma
%T Applying Various Machine Learning Techniques for Early Diagnosis of Breast Cancer
%J Journal of Advanced Artificial Intelligence
%V 1
%N 3
%P 14-22
%D 2024
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Cancer is a category of diseases characterized by the uncontrolled growth and spread of abnormal cells within the body, often caused by genetic mutations and various risk factors. Breast cancer (BC) is a common form of cancer. Early detection through timely examination and treatment greatly improves the chances of a successful outcome. To support early detection and improve treatment outcomes, a gene expression dataset was used, but such data suffer from the curse of dimensionality. Because we aim to create an accurate model, filtering noise and reducing the dimensionality of the microarray data is a mandatory step. In this study, we conducted experiments on the early identification of breast cancer, using breast cancer microarray data to classify patients. First, the dataset was normalized using min-max scaling, and then features were selected using Binary Harris Hawks Optimization (BHHO). Machine learning models including k-nearest neighbor (KNN), support vector machine (SVM), logistic regression (LR), decision tree (DT), and neural network (NN) were then investigated. Our experiments show that DT outperformed the other models, producing the highest performance on the Van't Veer dataset.
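The preprocessing described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It shows min-max scaling of each gene to [0, 1] and how a binary feature mask, the kind of candidate solution a binary optimizer such as BHHO would propose, might be scored with a wrapper fitness. The search procedure of BHHO itself is not shown. The function names, the toy data, the leave-one-out 1-nearest-neighbour evaluator, and the weight `alpha` are all assumptions made for illustration.

```python
import math


def min_max_scale(rows):
    """Rescale each column (gene) to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def apply_mask(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]


def loo_1nn_accuracy(rows, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i, query in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(query, rows[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(rows)


def mask_fitness(rows, labels, mask, alpha=0.99):
    """Assumed wrapper fitness (higher is better): accuracy weighted by
    alpha, plus a small reward for selecting fewer features."""
    if not any(mask):
        return 0.0  # an empty feature subset cannot classify anything
    accuracy = loo_1nn_accuracy(apply_mask(rows, mask), labels)
    kept_fraction = sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * (1 - kept_fraction)


# Hypothetical toy data: 6 samples x 3 "genes".
# Gene 0 separates the classes, gene 1 is noise, gene 2 is constant.
X = [
    [1.0, 50.0, 7.0],
    [1.2, 10.0, 7.0],
    [0.9, 90.0, 7.0],
    [3.0, 20.0, 7.0],
    [3.1, 80.0, 7.0],
    [2.9, 55.0, 7.0],
]
y = [0, 0, 0, 1, 1, 1]

scaled = min_max_scale(X)
for mask in ([1, 0, 0], [0, 1, 0], [1, 1, 1]):
    print(mask, round(mask_fitness(scaled, y, mask), 4))
```

In this toy setting the mask that keeps only the informative gene scores highest, because it matches the accuracy of the full feature set while using fewer features. A real microarray dataset would have thousands of genes, and the optimizer would search over masks rather than enumerate them.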

Index Terms

Computer Science
Information Sciences

Keywords

Breast cancer classification, Microarray data, Binary Harris Hawks Optimization (BHHO)