| Journal of Advanced Artificial Intelligence |
| Foundation of Computer Science (FCS), NY, USA |
| Volume 2 - Number 4 |
| Year of Publication: 2026 |
| Authors: Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri |
| DOI: 10.5120/jaai202658 |
Phuong Luong-Thi-Bich, Quan Nguyen-Minh, Hung Vo-Tri. Comparative Analysis of Classical ML Baselines on Noisy and Imbalanced Occupational Lung Disease Data in Vietnam. Journal of Advanced Artificial Intelligence. 2, 4 (Jan 2026), 20-26. DOI=10.5120/jaai202658
Occupational lung disease is one of the most serious health problems affecting the global workforce, and early prediction of disease risk is important for medical prevention and intervention. This study compares four classical machine learning models, namely Random Forest (RF), XGBoost, Logistic Regression (LR), and Support Vector Machine (SVM), on the same occupational lung disease dataset, which was manually processed and encoded. The experimental results show that XGBoost achieves the best performance, with an accuracy of 98.34% and a Macro F1-score of 0.7996, followed by LR, RF, and SVM. In addition, feature analysis shows that each model relies on different factors, suggesting that combining multiple models could improve predictive performance.
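The abstract reports accuracy and Macro F1 together, and on imbalanced data the two can diverge sharply. The following is a minimal sketch of how both metrics are computed, using only the Python standard library. The labels and predictions are hypothetical toy values chosen for illustration; they are not the study's data, and the helper names `accuracy` and `macro_f1` are assumptions rather than anything taken from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 980 healthy (0) and 20 diseased (1) workers.
y_true = [0] * 980 + [1] * 20
# Hypothetical classifier: every healthy case is correct,
# but only 8 of the 20 diseased cases are detected.
y_pred = [0] * 980 + [1] * 8 + [0] * 12

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.4f}, macro F1 = {f1:.4f}")
```

In this toy setting the majority class dominates accuracy, while Macro F1 weights the rare class equally and so exposes the missed minority cases. That is one plausible reason a model can score very high accuracy yet a noticeably lower Macro F1 on imbalanced data.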