Research Article

Credit Risk Prediction using Ensemble and Linear Machine Learning Models

by Bukunmi Gabriel Odunlami, Blessing Nwonu
Journal of Advanced Artificial Intelligence
Foundation of Computer Science (FCS), NY, USA
Volume 2 - Number 1
Year of Publication: 2025
Authors: Bukunmi Gabriel Odunlami, Blessing Nwonu
10.5120/jaai202441

Bukunmi Gabriel Odunlami, Blessing Nwonu. Credit Risk Prediction using Ensemble and Linear Machine Learning Models. Journal of Advanced Artificial Intelligence. 2, 1 (Aug 2025), 1-8. DOI=10.5120/jaai202441

@article{ 10.5120/jaai202441,
author = { Bukunmi Gabriel Odunlami and Blessing Nwonu },
title = { Credit Risk Prediction using Ensemble and Linear Machine Learning Models },
journal = { Journal of Advanced Artificial Intelligence },
issue_date = { Aug 2025 },
volume = { 2 },
number = { 1 },
month = { Aug },
year = { 2025 },
pages = { 1-8 },
numpages = {8},
url = { https://jaaionline.phdfocus.com/archives/volume2/number1/credit-risk-prediction-using-ensemble-and-linear-machine-learning-models/ },
doi = { 10.5120/jaai202441 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-08-14T12:59:17+05:30
%A Bukunmi Gabriel Odunlami
%A Blessing Nwonu
%T Credit Risk Prediction using Ensemble and Linear Machine Learning Models
%J Journal of Advanced Artificial Intelligence
%V 2
%N 1
%P 1-8
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Predicting the likelihood of loan default remains a critical challenge in credit risk modeling, where data imbalance, high dimensionality, and nonlinear interactions often limit the effectiveness of traditional scoring techniques. This paper presents a machine learning pipeline for credit risk prediction using financial datasets. We evaluate six main classifiers (Logistic Regression, Gaussian Naive Bayes, Support Vector Machines, Random Forest, XGBoost, and LightGBM), along with variants of two of these classifiers for further comparison. Models are benchmarked using accuracy, precision, recall, and the Kolmogorov–Smirnov statistic, which is widely used in financial risk scoring. Our results indicate that ensemble methods combined with hybrid resampling techniques consistently offer significant improvements in default risk separation without requiring dimensionality reduction methods, complex deep neural architectures, or other black-box models. This makes them suitable for both regulated credit scoring environments and modern machine learning-driven financial applications.
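The Kolmogorov–Smirnov statistic named in the abstract is commonly computed in credit scoring as the largest gap between the empirical score distributions of defaulters and non-defaulters. The following is a minimal sketch of that standard two-sample definition, not the authors' implementation; the function name `ks_statistic`, the label convention (1 = default, 0 = non-default), and the toy scores are assumptions made for illustration.

```python
from bisect import bisect_right


def ks_statistic(scores, labels):
    """Two-sample KS statistic between the score distributions of two classes.

    Assumed convention (not from the paper): label 1 = default, 0 = non-default.
    """
    pos = sorted(s for s, y in zip(scores, labels) if y == 1)
    neg = sorted(s for s, y in zip(scores, labels) if y == 0)
    if not pos or not neg:
        raise ValueError("both classes must be present")

    best = 0.0
    # Evaluate both empirical CDFs at every observed score threshold.
    for t in sorted(set(scores)):
        cdf_pos = bisect_right(pos, t) / len(pos)  # share of defaulters scored <= t
        cdf_neg = bisect_right(neg, t) / len(neg)  # share of non-defaulters scored <= t
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy data (hypothetical): higher score means higher predicted default risk.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(scores, labels))
```

A value of 0 means the two score distributions coincide, and a value of 1 means the model separates the classes completely. Because the statistic depends only on the ranking of scores within each class, it remains interpretable on imbalanced data where accuracy alone can mislead.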

Index Terms

Computer Science
Information Sciences

Keywords

Credit risk, Ensemble model, Hybrid resampling, Supervised learning, Kolmogorov-Smirnov statistic