The Impact of Extreme Data Imbalance on Evaluation Metrics of Deep Learning Models for Loan Default Prediction

Authors

  • Irfan Budiyanto Program Studi Teknologi Informasi, Program Magister - Universitas Teknologi Yogyakarta
    Indonesia
  • Arief Hermawan Program Studi Teknologi Informasi - Universitas Teknologi Yogyakarta
    Indonesia
  • Donny Avianto Program Studi Teknologi Informasi - Universitas Teknologi Yogyakarta
    Indonesia
  • Muhammad Kusban Teknik Elektro - Universitas Muhammadiyah Surakarta
    Indonesia

DOI:

https://doi.org/10.23917/emitor.v25i2.10719

Keywords:

Data Imbalance, Loan Default Prediction, Deep Neural Network, ADASYN, Evaluation Metrics

Abstract

The growth of financial technology has made online loans more accessible, but it has also increased the risk that borrowers fail to repay, so a reliable system for predicting loan defaults is important. A common problem in this task is class imbalance: loan defaults (the minority class) are far rarer than loans repaid on time (the majority class), and this imbalance can bias prediction models. This research investigates how a sharply increased imbalance ratio (from 1:170 to 1:33,612) affects the evaluation metrics of a Deep Neural Network (DNN) model when the Adaptive Synthetic Sampling (ADASYN) oversampling technique is used. The method adopts an approach from previous research that combines ADASYN to handle data imbalance with a DNN for prediction, and applies it to an updated Lending Club dataset with a more severe level of imbalance. The results demonstrate a critical breakdown in key evaluation metrics. Compared to previous research, Accuracy remains high (0.9515) and Specificity remains strong (0.9516). However, Precision falls catastrophically to almost zero (0.0001), Recall is very low (0.1667), and the resulting F1-Score is also nearly zero (0.0002). A visual analysis using Principal Component Analysis (PCA) reveals that the decline in Precision is caused by synthetic minority samples generated by ADASYN completely overlapping with the original majority cluster, which leads to a massive number of false positives. In conclusion, ADASYN fails to maintain a usable performance level under extreme imbalance, rendering the model ineffective for its intended purpose and highlighting the need for alternative strategies when minority class samples are severely scarce.
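
The way Accuracy and Specificity can stay high while Precision and F1-Score collapse follows directly from the metric definitions. The following is a minimal Python sketch of those definitions, not the authors' evaluation code. The names `ConfusionCounts` and `evaluation_metrics` are hypothetical, and the confusion-matrix counts in the example are illustrative values chosen to resemble an extremely imbalanced test set; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with loan default as the positive class."""
    tp: int  # defaults correctly flagged
    fp: int  # repaid loans wrongly flagged as defaults
    tn: int  # repaid loans correctly passed
    fn: int  # defaults that were missed


def safe_div(numerator: float, denominator: float) -> float:
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the five metrics named in the abstract from raw counts."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = safe_div(c.tp, c.tp + c.fp)
    recall = safe_div(c.tp, c.tp + c.fn)
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "specificity": safe_div(c.tn, c.tn + c.fp),
        "precision": precision,
        "recall": recall,
        # F1 is the harmonic mean of precision and recall.
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative counts only (an assumption, not the paper's data):
    # 6 true defaults among roughly 200,000 loans, with 5% of the
    # repaid loans wrongly flagged as defaults.
    counts = ConfusionCounts(tp=1, fp=10_000, tn=190_000, fn=5)
    for name, value in evaluation_metrics(counts).items():
        print(f"{name:>12}: {value:.4f}")
```

With these illustrative counts, Accuracy and Specificity both come out at about 0.95 because the majority class dominates their denominators, while Precision is about 0.0001 because a single true positive is swamped by thousands of false positives. This is the same qualitative pattern the abstract reports.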

Submitted

2025-05-30

Accepted

2025-07-06

Published

2025-07-15

How to Cite

Budiyanto, I., Hermawan, A., Avianto, D., & Kusban, M. (2025). The Impact of Extreme Data Imbalance on Evaluation Metrics of Deep Learning Models for Loan Default Prediction. Emitor: Jurnal Teknik Elektro, 25(2), 154–160. https://doi.org/10.23917/emitor.v25i2.10719

Section

Articles