Research Article

Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam

by Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri
Journal of Advanced Artificial Intelligence
Foundation of Computer Science (FCS), NY, USA
Volume 2 - Number 4
Year of Publication: 2026
10.5120/jaai202658

Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri. Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam. Journal of Advanced Artificial Intelligence. 2, 4 (Jan 2026), 20-26. DOI=10.5120/jaai202658

@article{ 10.5120/jaai202658,
author = { Phuong Luong-Thi-Bich and Quan Nguyen-Minh and Hung Vo-Tri },
title = { Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam },
journal = { Journal of Advanced Artificial Intelligence },
issue_date = { Jan 2026 },
volume = { 2 },
number = { 4 },
month = { Jan },
year = { 2026 },
pages = { 20-26 },
numpages = {9},
url = { https://jaaionline.phdfocus.com/archives/volume2/number4/comparative-analysis-of-classical-ml-baselines-on-noisy-and-imbalanced-occupational-lung-disease-data-in-vietnam/ },
doi = { 10.5120/jaai202658 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2026-01-31T17:34:43+05:30
%A Phuong Luong-Thi-Bich
%A Quan Nguyen-Minh
%A Hung Vo-Tri
%T Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam
%J Journal of Advanced Artificial Intelligence
%V 2
%N 4
%P 20-26
%D 2026
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Occupational lung disease is one of the most serious health problems affecting the global workforce, and early prediction of disease risk is important for prevention and intervention. This study compares four classical machine learning models, Random Forest (RF), XGBoost, Logistic Regression (LR), and Support Vector Machine (SVM), on the same occupational lung disease dataset, which was manually processed and encoded. The experimental results show that XGBoost achieves the best performance, with an accuracy of 98.34% and a macro F1-score of 0.7996, followed by LR, RF, and SVM. In addition, feature analysis shows that each model relies on different factors, suggesting that combining multiple models could improve prediction performance.
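The gap between the reported accuracy and macro F1-score is typical of imbalanced data. The following is a minimal sketch of how the two metrics diverge, not the authors' evaluation code: the label counts, the toy predictions, and the helper names `accuracy` and `macro_f1` are all hypothetical and chosen only for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data (assumption): 980 healthy (0) and 20 diseased (1) workers.
y_true = [0] * 980 + [1] * 20
# Hypothetical classifier: every healthy case correct, only 8 of 20 diseased found.
y_pred = [0] * 980 + [1] * 8 + [0] * 12

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

In this toy setting the majority class dominates accuracy, while macro F1 averages the two classes equally and so exposes the weak recall on the minority class. This is why both figures are worth reporting for imbalanced medical data.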

Index Terms

Computer Science
Information Sciences

Keywords

Occupational lung disease, classical machine learning, Random Forest, XGBoost, Logistic Regression, SVM