Introduction
Welcome to another entry of Research Management Insights. This week, I discuss approaches to assessing and evaluating research project outcomes and findings. Drawing on established research methods, I outline the types of data I will gather and the analysis techniques I plan to employ for my research on credit risk modeling.
Research Methods for Assessing and Evaluating Outcomes
In my research proposal on enhancing credit risk models, I identified the need to develop more stable and accurate credit risk models. This involves a multi-faceted approach to data collection and analysis to ensure robust outcomes. Here, I discuss the methodologies and tools I will use.
Data Collection Methods
1. Secondary Data Analysis:
I will utilize existing data provided by the Home Credit Group via the Kaggle platform (Herman et al., 2024). This data includes various borrower characteristics and loan outcomes.
Using secondary data (MacInnes, 2016) provides a rich source of information that is both cost-effective and time-efficient, allowing me to leverage real-world historical data for model training and validation that would otherwise be unavailable; a short data-loading sketch follows this list.
2. Quantitative Data Collection:
The focus will be on numerical data, particularly the borrowers’ credit scores, payment histories, and demographic information. Quantitative data is essential for developing predictive models, as it allows for the application of statistical and machine-learning techniques to identify patterns and trends (Kuhn & Johnson, 2013).
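To make these two steps concrete, here is a minimal pandas sketch of loading one competition table and isolating its quantitative columns. The file name, and the assumption that the data sits in a single CSV file, are illustrative placeholders rather than the competition's actual layout.

```python
import pandas as pd

# Load one table of the Home Credit competition data (Herman et al., 2024).
# "train_base.csv" is a placeholder name; the real data is downloaded from
# Kaggle and spread across several linked tables.
df = pd.read_csv("train_base.csv")

# First structural overview: size, column types, and missingness.
print(df.shape)
print(df.dtypes.value_counts())
print(df.isna().mean().sort_values(ascending=False).head(10))

# Isolate the numeric (quantitative) columns, e.g. credit-score and
# payment-history fields, which the predictive models will consume.
numeric = df.select_dtypes(include="number")
print(numeric.describe().T[["mean", "std", "min", "max"]])
```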
Data Analysis and Evaluation Techniques
- Ensemble Machine Learning Techniques: Random Forests and Gradient Boosting Machines will be employed due to their robustness and ability to handle large datasets with complex interactions (Barbaglia et al., 2021; Gunnarsson et al., 2021; Xia et al., 2017); a sketch combining these models with cross-validation follows this list.
- Gini Stability Metric: This metric (Herman et al., 2024) will be used to evaluate the stability of the models across different economic scenarios and demographic segments. The Gini stability metric is calculated as:
stability metric = mean(gini) + 88.0 · min(0, a) − 0.5 · std(residuals)
where gini denotes the weekly Gini scores over the observation period, a is the slope of a linear regression fitted through those weekly scores, and residuals are the residuals of that regression (Herman et al., 2024). A Python sketch of this metric also follows the list.
- Cross-Validation: This is a technique where the dataset is divided into several folds, and the model is trained and validated on different folds iteratively (Berrar, 2018). Cross-validation helps in assessing a model’s performance and generalizability, reducing the risk of overfitting (Berrar, 2018). In addition, I will train multiple models on different subsets of the data and combine their predictions to improve overall accuracy and stability. Stability is a crucial aspect of credit risk models, ensuring that their performance remains consistent over time and across varying conditions (Řezáč & Řezáč, 2011).
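As a minimal sketch of these two techniques together, the snippet below scores a Random Forest and a Gradient Boosting Machine with stratified five-fold cross-validation. The synthetic data and model settings are illustrative stand-ins, not the tuned configuration the research will ultimately use.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the prepared loan features and default labels;
# weights=[0.9] mimics the class imbalance typical of default data.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9],
                           random_state=42)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Stratified folds preserve the default rate in every train/validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc", n_jobs=-1)
    print(f"{name}: AUC-ROC = {scores.mean():.3f} +/- {scores.std():.3f}")
```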
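The Gini stability metric itself can be sketched as a small function. This is my reading of the competition's definition (Herman et al., 2024); the function signature and inputs (integer week numbers, true labels, and predicted scores as NumPy arrays) are my own framing, and each week is assumed to contain both defaulted and non-defaulted loans.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gini_stability(weeks, y_true, y_score):
    """Gini stability metric following Herman et al. (2024)."""
    week_ids = np.unique(weeks)
    # One Gini score (2*AUC - 1) per week of the observation period.
    ginis = np.array([
        2.0 * roc_auc_score(y_true[weeks == w], y_score[weeks == w]) - 1.0
        for w in week_ids
    ])
    # Fit a linear trend a*x + b through the weekly Gini scores.
    a, b = np.polyfit(week_ids, ginis, deg=1)
    residuals = ginis - (a * week_ids + b)
    # Penalize a falling trend (negative slope) and week-to-week volatility.
    return ginis.mean() + 88.0 * min(0.0, a) - 0.5 * residuals.std()
```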
Planned Approach to Assessing and Evaluating Project Outcomes
1. Setting Clear Objectives:
- The primary objective is to enhance the stability and accuracy of credit risk models. Secondary objectives include understanding the impact of different borrower characteristics on credit risk and evaluating the practical applications of these models in real-world scenarios.
2. Defining Success Criteria:
- Predictive Accuracy: Measured by metrics such as the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) (Google, 2022), accuracy (Google, 2022a), and precision and recall (Google, 2022b); a short metrics sketch follows this list.
- Model Stability: Assessed using the Gini stability metric, ensuring that model performance remains consistent over time and across different subsets of the data.
3. Evaluation Protocols:
- Initial Assessment: Conducting an initial analysis to identify potential weaknesses and areas for improvement in the models.
- Iterative Testing: Implementing an iterative process where models are continually tested and refined based on feedback from the initial assessments; this refinement includes hyperparameter tuning (Amazon Web Services, 2024), sketched after this list.
- Final Evaluation: Conducting a comprehensive evaluation of the final models against the defined success criteria.
- Fairness Evaluation: In addition to the traditional model performance metrics, the models will be evaluated for fairness. Because they will inform financial decisions, it is important that they are not discriminatory. Some machine learning platforms, such as IBM watsonx, have built-in fairness evaluation metrics and recommend a fairness threshold of 80% or above (IBM, 2024); a simple fairness check is sketched after this list.
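For the predictive accuracy criteria, scikit-learn provides all of the listed metrics directly. The labels and scores below are illustrative stand-ins for held-out test-set predictions.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             roc_auc_score)

# Illustrative true labels (1 = default) and predicted default probabilities.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])

y_pred = (y_score >= 0.5).astype(int)  # 0.5 is an illustrative threshold

print("AUC-ROC:  ", roc_auc_score(y_true, y_score))   # ranking quality
print("Accuracy: ", accuracy_score(y_true, y_pred))   # overall hit rate
print("Precision:", precision_score(y_true, y_pred))  # correct among flagged
print("Recall:   ", recall_score(y_true, y_pred))     # defaults caught
```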
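Hyperparameter tuning can be folded into the iterative-testing loop with a cross-validated grid search; the grid below is deliberately small and illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data, as in the earlier cross-validation sketch.
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9],
                           random_state=42)

param_grid = {
    "n_estimators": [100, 300],
    "learning_rate": [0.05, 0.1],
    "max_depth": [2, 3],
}

# Each parameter combination is scored by 5-fold cross-validated AUC-ROC.
search = GridSearchCV(GradientBoostingClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=5)
search.fit(X, y)
print(search.best_params_)
print(f"Best cross-validated AUC-ROC: {search.best_score_:.3f}")
```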
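For the fairness criterion, one common check consistent with the 80% threshold mentioned above is the disparate impact ratio: the favourable-outcome rate of a monitored group divided by that of a reference group. This sketch is my own illustration of that idea, not IBM watsonx's implementation.

```python
import numpy as np

# Illustrative model decisions (0 = approve / predicted non-default) and a
# protected attribute; in practice these come from test-set predictions.
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0])
group = np.array(["monitored", "monitored", "monitored", "monitored",
                  "reference", "reference", "reference", "reference"])

approved = (y_pred == 0)
rate_monitored = approved[group == "monitored"].mean()
rate_reference = approved[group == "reference"].mean()

# A ratio below 0.80 would fall under the fairness threshold (IBM, 2024).
print(f"Disparate impact ratio: {rate_monitored / rate_reference:.2f}")
```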
Expected Outcomes
1. Model Performance:
- By employing ensemble techniques and rigorous validation protocols, the research aims to develop credit risk models that are accurate, stable, and fair.
2. Practical Applications:
- The models developed through this research have the potential to provide financial institutions with reliable tools for credit risk assessment, potentially improving financial inclusion by making lending decisions more equitable.
3. Academic Contributions:
- This research will contribute to the existing literature on credit risk modeling by providing insights into the stability of machine learning models and their application in dynamic economic environments (Feldhutter & Schaefer, 2023).
Conclusion
Evaluating and assessing research project outcomes requires a systematic and robust approach. By leveraging advanced data collection and analysis techniques, this research aims to develop credit risk models that are both accurate and stable. The methodologies outlined not only ensure the reliability of the findings but also contribute to practical applications in financial decision-making.
References
Amazon Web Services (2024) What is Hyperparameter Tuning? Available at: https://aws.amazon.com/what-is/hyperparameter-tuning/#:~:text=When%20you’re%20training%20machine,This%20is%20called%20hyperparameter%20tuning. (Accessed: 22 May 2024)
Barbaglia, L., Manzan, S. and Tosetti, E. (2021) ‘Forecasting loan default in Europe with machine learning’, Journal of Financial Econometrics, 21(2), pp. 569-596
Berrar, D. (2018) ‘Cross-validation’, Encyclopedia of Bioinformatics and Computational Biology, 1, pp. 542-545
Google (2022) Classification: ROC Curve and AUC. Available at: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc (Accessed: 22 May 2024)
Google (2022a) Classification: Accuracy. Available at: https://developers.google.com/machine-learning/crash-course/classification/accuracy (Accessed: 22 May 2024)
Google (2022b) Classification: Precision and Recall. Available at: https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall (Accessed: 22 May 2024)
Gunnarsson, B., vanden Broucke, S., Baesens, B., Óskarsdóttir, M. and Lemahieu, W. (2021) ‘Deep learning for credit scoring: Do or don’t?’, European Journal of Operational Research, 295(1), pp. 292-305
Herman, D., Jelinek, T., Reade, W., Demkin, M. and Howard, A. (2024) Home Credit – Credit Risk Model Stability. Kaggle. Available at: https://kaggle.com/competitions/home-credit-credit-risk-model-stability (Accessed: 22 May 2024)
IBM (2024) Quick start: Evaluate a machine learning model. Available at: https://www.ibm.com/docs/en/watsonx/saas?topic=models-evaluate-machine-learning-model
MacInnes, J. (2016) An Introduction to Secondary Data Analysis with IBM SPSS. London: Sage Publications Ltd
Kuhn, M. and Johnson, K. (2013) Applied Predictive Modeling. New York: Springer
Řezáč, M. and Řezáč, F. (2011) ‘How to measure the quality of credit scoring models’, Czech Journal of Economics and Finance, 61, pp. 486-507
Xia, Y., Liu, C., Li, Y. and Liu, N. (2017) ‘A boosted decision tree approach using Bayesian hyper-parameter optimization for credit scoring’, Expert Systems with Applications, 78, pp. 225-241