
Analyzing Financial Metrics For Predicting Default Probability In Small And Medium Enterprises


Title: Analyzing Financial Metrics For Predicting Default Probability In Small And Medium Enterprises
Author: Hoang Anh Duc
Supervisor: Ph.D. Nguyen Minh Nhat
University: Banking University of Ho Chi Minh City
Major: Finance & Banking
Type: Graduation Thesis
Year of publication: 2024
City: Ho Chi Minh City
Number of pages: 82
File size: 1.65 MB

Structure

  • CHAPTER 1: INTRODUCTION
    • 1.1. The Urgency of The Research
    • 1.2. Research Objectives
    • 1.3. Research Questions
    • 1.4. Research Subjects
    • 1.5. Research Methods
    • 1.6. Expected Contributions
    • 1.7. The Structure of the Research
  • CHAPTER 2: LITERATURE REVIEW
    • 2.1. Small And Medium Enterprises (SMEs)
    • 2.2. Probability Of Default (PD)
    • 2.3. Financial Indicators
    • 2.4. Overview of probability of default models
      • 2.4.1. Probability of default models
      • 2.4.2. The difference between Logistic Regression model, Decision Trees model, Gradient
    • 2.5. Related Studies
      • 2.5.1. Related studies in Vietnam
      • 2.5.2. The other related studies
  • CHAPTER 3: DATA AND METHODOLOGY OF RESEARCH
    • 3.1. Theoretical framework
    • 3.2. Data collection and processing
    • 3.3. Selection of input variables in the default prediction model
    • 3.4. Models for Estimating the Likelihood of Default
      • 3.4.1. Logistic Regression model
      • 3.4.2. Decision Trees and Gradient Boosting model
      • 3.4.3. Artificial Neural Network model
    • 3.5. Evaluation Criteria for Models Predicting Default
      • 3.5.1. Confusion Matrix
      • 3.5.2. F1 Score
  • CHAPTER 4: EMPIRICAL RESULTS
    • 4.2. Comparison of Prediction Capabilities of the Models
    • 4.3. Analysis Results of Variable Importance
  • CHAPTER 5: CONCLUSION AND RECOMMENDATION
    • 5.1. Applying the Model to Forecast Default Probability of SME Customers in Vietnamese
      • 5.1.1. Identifying Potential Customers
      • 5.1.2. Credit Policy Guidance Based on Model Outcomes
      • 5.1.3. Model as a Source of Information for Credit Policy
    • 5.2. Potential Application of the Model in Vietnamese Credit Rating Agencies (CRAs)
    • 5.3. Limitation of the topic and future research direction
      • 5.3.1. Limitation
      • 5.3.2. Future research direction

Content

To date, there has been limited published research in Vietnam on selecting models to predict enterprises' default probability based on financial indicators. Thus, the thesis centers on

INTRODUCTION

The Urgency of The Research

The COVID-19 pandemic has had a profound impact on the global economy, leading to a wave of bankruptcies among large businesses worldwide. Euler Hermes (7/2020) forecast that bankruptcies would increase by 35% between 2019 and 2021. Among the world's economic powers, the US was expected to suffer heavy losses, with the number of insolvent businesses forecast to increase by 57% by 2021 compared with 2019, before the outbreak of the pandemic. Similarly, Brazil, the UK, and Spain were expected to experience 45%, 43%, and 41% increases in bankruptcy cases, respectively, while China, where the epidemic started, was forecast to see 40% growth in bankruptcies. It was estimated that by the end of 2021, all regions worldwide would experience double-digit increases in their default rates relative to 2019, with the strongest increase expected in North America (+56%), followed by Central and Eastern Europe (+34%), Latin America (+33%), Western Europe (+32%), and Asia (+31%). More recent statistics from the United States Courts show that personal and business bankruptcy filings rose 10% in the twelve-month period ending June 30, 2023, compared with the previous year. These numbers reflect the mounting economic challenges faced by households and businesses, as debt loads continue to expand due to rising prices of goods and services, inflation, and the cost of borrowing.

The COVID-19 pandemic has also had a serious impact on the Vietnamese economy, which is highly integrated with the global economy. It disrupted production, supply, and the circulation of commerce, as well as aviation, tourism, labor, and employment, leading to a significant number of business closures. In the first 10 months of 2020, 41.8 thousand enterprises temporarily suspended their business, up 58.7% over the same period in 2019, while 13.5 thousand enterprises completed dissolution procedures, up 0.1%. The effects have been lasting, with GDP growth slowing from 8% in 2022 to 3.7% in the first half of 2023. The World Bank forecasts moderate growth of 4.7% in 2023, gradually accelerating to 5.5% in 2024 and 6.0% in 2025. The Vietnamese government has implemented various measures to mitigate the pandemic's impact on the economy, including fiscal stimulus packages, tax relief, and low-interest loans. However, the impact is expected to persist, with the labor market tightening, industrial output declining, and agricultural value chains being disrupted.

Amid the profound impact of the COVID-19 pandemic on the economy and the rising incidence of business insolvency, the internal credit rating system is of considerable importance to commercial banks. It serves as a pivotal tool for evaluating the credit risk associated with customers, aiding the bank in formulating credit decisions, and contributing to overall risk management. In Vietnam, commercial banks increasingly acknowledge the crucial role of this system in their credit operations and risk management practices. This recognition is particularly noteworthy as Vietnamese commercial banks strive to align with the standards outlined in Basel II.

In this context, the urgency of the research topic rests on four considerations:

Firstly, prevailing credit rating models have inherent limitations, which have given rise to considerable debate and inconsistency regarding their reliability. This complicates the selection of an appropriate credit rating model for accurately forecasting business default probabilities (Huseyin & Bora, 2009). Research by Aysegul Iscanoglu (2005) and Hayden & Daniel (2010) documents the diversity of models within the credit rating domain, including discriminant analysis, the Logit model (logistic regression), the decision tree model, artificial neural networks (ANN), and Probit regression, each with its own merits and drawbacks. These models have been analyzed extensively. Platt (1991) advocates the Logit model for testing and selecting financial variables, emphasizing the superiority of industry-average financial variables over those of individual firms in predicting business bankruptcy. Lawrence (1992) uses the Logit model to forecast default probabilities of collateralized loans, while Altman (1968) employs discriminant analysis to derive a linear function of financial and market variables for distinguishing between insolvent and solvent enterprises.

Secondly, a persistent focus in the research on default prediction is the identification of financial indicators influencing ranking outcomes. In the early stages, specifically during 1926 – 1936, researchers predominantly relied on fundamental financial ratios for ranking, complemented by additional metrics such as equity/total net revenue by Ramser & Foster (1931) and the equity/fixed assets ratio proposed by Fitzpatrick (1932). In the subsequent period, Altman (1968) introduced a discriminant analysis model incorporating five financial ratios to predict business insolvency: market value of equity/book value of total debt, sales/total assets, earnings before interest and taxes/total assets, retained earnings/total assets, and working capital/total assets. Deakin (1972), also using a discriminant analysis model, expanded the scope by selecting 14 financial variables, including cash/short-term debt, real cash flow/total debt, cash/net sales, and others. Later research has identified additional financial indicators influencing credit rating outcomes. Blum (1974) incorporated variables such as the market rate of return, liquidity ratio for quick settlement, high liquidity assets/inventories, cash flow/total liabilities, the book value of assets/total liabilities, the downward trend of profits, and the downward trend of high liquidity assets/inventory. Furthermore, Back, Laitinen, Sere & Wesel (1996) introduced an extensive set of 31 diverse indices in their research.

Thirdly, the credit ranking methodology currently employed by banks in Vietnam remains predominantly subjective and qualitative, relying heavily on the assessment and experience of the credit officers directly engaged in customer management, an approach commonly known as the expert method. Consequently, the existing approach serves more as a supportive tool for credit-granting decision-making than as a foundational basis for such decisions. The absence of a highly reliable scientific framework for forecasting enterprise bankruptcy diminishes the efficacy of the current credit ranking method. Moreover, published research in Vietnam on the selection of models for predicting the probability of enterprise default based on financial indicators is notably limited.

Fourthly, government initiatives to construct a legal framework for the credit rating sector aim to enhance information transparency, providing crucial support for banks in credit risk management. This regulatory development also serves the broader objectives of fostering capital mobilization through the stock and bond markets and safeguarding the rights and interests of investors. The research and identification of apt rating models hold significant potential to advance credit rating activities within Vietnam. Decree No. 88/2014/NĐ-CP, promulgated on September 26, 2014, sets out provisions for credit rating services, stipulating conditions for the establishment and operation of credit rating enterprises in Vietnam. Additionally, Prime Ministerial Decision No. 507/QĐ-TTg, dated April 17, 2015, outlines the development plan for credit rating services until 2020 and envisions its trajectory up to 2030. This roadmap makes ratings mandatory for corporate bond issuance, effective from 2020.

The emphasis on selecting a model for predicting the probability of corporate default based on judiciously chosen financial indicators aligns with the comprehensive measures advocated by the Basel Committee, particularly for managing credit risk within Vietnamese commercial banks, as articulated in the Basel II guidelines (2004). This approach advocates early-stage customer screening, thereby facilitating credit risk control and reducing the prospect of default for banks.

Hence, this thesis addresses the topic "Analyzing financial metrics for predicting default probability in Small and Medium Enterprises". The primary objective is to furnish commercial banks with a comprehensive theoretical framework and empirical evidence for selecting a fitting model for predicting business bankruptcy. The overarching aim is to improve the efficacy of credit risk management within the banking sector, thereby laying the groundwork for better decision-making in the future.

Research Objectives

The primary goal of this study is to develop a robust predictive model capable of accurately assessing the default probability of Vietnamese small and medium-sized enterprises (SMEs). By leveraging financial data from 2020 to 2022, the research aims to identify key financial indicators that significantly influence default risk.

To achieve this objective, the study will employ a comparative analysis of various predictive modeling techniques. By constructing and evaluating multiple models, the research seeks to identify the most effective approach for predicting SME default probability. Ultimately, the findings will contribute to enhancing the risk management capabilities of commercial banks by providing a more accurate and reliable tool for assessing customer creditworthiness.

Research Questions

To attain these objectives, the study poses the following research questions:

(i) What is the impact of financial indicators on the probability of default for Small and Medium Enterprises (SMEs)?

(ii) Which Probability of Default (PD) model demonstrates optimal efficacy in forecasting the default probability of SMEs?

Research Subjects

This thesis focuses on evaluating the probability of default among Small and Medium Enterprise (SME) clients of Vietnamese commercial banks. The selected SMEs meet either of the following criteria: (i) total capital not exceeding 100 billion VND, or (ii) total revenue for the preceding year not exceeding 300 billion VND.

The study's scope covers financial indicators extracted from the financial statements of small and medium enterprises that are clients of Vietnamese commercial banks over the period from 2020 to 2022.

Research Methods

The thesis adopts a research methodology that integrates both qualitative and quantitative approaches. In the qualitative phase, the study explores the perspectives, perceptions, and evaluations held by commercial banks concerning the credit rating of Small and Medium Enterprise (SME) customers within the Vietnamese banking sector. This phase also involves an in-depth examination of the factors influencing credit rating outcomes for SMEs in commercial banks in Vietnam, with subsequent scale development to support the ensuing quantitative research.

In the quantitative phase, the study redefines the influential financial factors and quantifies each factor's impact on the credit rating outcomes of SMEs at commercial banks in Vietnam. To construct the credit rating models, logistic regression (logit), decision trees, gradient boosting, and artificial neural networks are employed.

In addition to the qualitative and quantitative methods, the thesis uses several supporting methods. Descriptive statistics are applied to organize the data according to the characteristics requiring descriptive analysis. Comparative analysis is employed to draw conclusions from a comparison of the models and their practical applications. Furthermore, the synthetic analysis method is used to synthesize and analyze pertinent data across the entire research process.

Expected Contributions

The research findings of this thesis carry significant scientific and practical implications, discernible across several crucial dimensions:

The systematic review of background theories related to default probability prediction models, along with a comprehensive analysis of criteria for model selection, contributes to a nuanced understanding of the existing literature. This review reveals gaps in previous studies regarding the identification of the most suitable model for forecasting, on the basis of financial indicators, the default probability of small and medium enterprises at Vietnamese commercial banks. Such insights serve as a foundation for future researchers seeking to delve deeper into this research domain.

Moreover, building on the identified research problems and outcomes, the study advocates the adoption of a well-suited credit rating model capable of predicting the probability of default for Small and Medium Enterprises (SMEs) at Vietnamese commercial banks. Grounded in financial indicators, this proposed model is positioned to significantly enhance the effectiveness of the credit risk control measures implemented by commercial banks in Vietnam.

The Structure of the Research

The thesis is structured into five chapters, with the introduction and conclusion serving as bookends to the central content.

Chapter 1, the introduction, explains the timeliness of the chosen topic, states the research issue, outlines the research objectives, poses the research questions, defines the research object and scope, describes the research methods, highlights the topic's contribution, and provides a structural overview of the thesis, giving readers a comprehensive understanding of the entire study.

Chapter 2, Theoretical Basis and Related Research, presents foundational theories and background concepts pertaining to credit ratings, enterprise default probability, and the methodologies for measuring and forecasting them. The chapter also evaluates the outcomes of previously published studies in order to clarify the urgency of the chosen topic and to establish a framework for proposing research models and examining research results in subsequent chapters.

Chapter 3, Model and Method of Research, sets out the content of the research model, specifically the proposed default probability prediction model. It describes in detail the collected data and the research methods employed, setting the stage for the presentation of research results in the following chapter. The proposed default prediction models are the Logistic Regression model, the Decision Trees model, the Gradient Boosting model, and the Artificial Neural Network model.

Chapter 4, Research Results, critically analyzes the outcomes of the default prediction models. It uses indicators calculated from the confusion matrix, namely Accuracy, Sensitivity, Specificity, Precision, and F1-Score, to compare and evaluate each model's efficacy in predicting default probabilities.

Finally, Chapter 5, Conclusion and Recommendation, consolidates the results of the thesis. It proposes actionable solutions aimed at enhancing commercial banks' capacity to predict the probability of default for small and medium enterprises. The chapter further recommends adjustments in credit-granting activities to optimize efficiency, reduce credit risks, and ensure capital safety. Additionally, the thesis offers implications for corporate governance policies to mitigate the risk of bankruptcy. The chapter concludes by acknowledging limitations, highlighting outstanding issues, and suggesting future research directions.
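The confusion-matrix indicators named for Chapter 4 follow standard definitions, which can be sketched as below. This is a minimal illustration, assuming that "default" is treated as the positive class; the function name and the counts are hypothetical and are not results from the thesis.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Compute five indicators from a binary confusion matrix.

    Assumption: the positive class is "default". All four counts are
    assumed to be large enough that no denominator is zero.
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # share of actual defaulters caught
    specificity = tn / (tn + fp)   # share of non-defaulters correctly cleared
    precision = tp / (tp + fp)     # share of predicted defaulters that default
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts chosen only for illustration.
metrics = confusion_metrics(tp=40, fp=10, tn=140, fn=10)
print(metrics)
```

With imbalanced data, where defaulters are rare, accuracy alone can look high even when few defaulters are detected, which is one reason to report sensitivity and F1 alongside it.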

LITERATURE REVIEW

Small And Medium Enterprises (SMEs)

Small and Medium Enterprises (SMEs) play a crucial role in most economies, particularly in developing nations. They are the predominant form of business globally and make substantial contributions to job creation and economic development. They account for approximately 90% of businesses and over 50% of worldwide employment. In emerging economies, formal SMEs alone contribute up to 40% of national income (GDP), and these figures increase substantially when informal SMEs are taken into account. Projections indicate a need for around 600 million jobs by 2030 to accommodate the expanding global workforce, which makes SME development a priority for numerous governments worldwide. In emerging markets, the bulk of formal employment comes from SMEs, which are responsible for seven out of every ten jobs. Despite this pivotal role, access to finance is a critical impediment to SME growth: it is the second most frequently cited obstacle to expansion in emerging markets and developing nations.

Micro, small, and medium enterprises, commonly referred to as SMEs, are businesses of limited size in terms of capital, labor, or turnover. They can be classified into three groups by size: micro enterprises, small enterprises, and medium enterprises. According to the criteria established by the World Bank Group, a micro enterprise is one with fewer than 10 employees, a small enterprise has 10 to fewer than 200 employees and capital of 20 billion or less, and a medium enterprise has 200 to 300 employees and capital ranging from 20 to 100 billion.

Across different countries, specific criteria are adopted to delineate SMEs based on their unique contexts. In Vietnam, as outlined in Article 6 of the Government's Decree No. 39/2018/ND-CP, dated March 11, 2018, the stipulations are as follows:

1. Microenterprises in agriculture, forestry, fisheries, industry, and construction have an average annual number of employees participating in social insurance not exceeding 10 people, and total annual turnover or total capital not exceeding 3 billion VND.

In the trade and service sector, a microenterprise has an average annual number of employees participating in social insurance not exceeding 10 people, and total annual revenue not exceeding 10 billion VND or total capital not exceeding 3 billion VND.

2. Small enterprises in agriculture, forestry, fisheries, industry, and construction have an average annual number of employees participating in social insurance not exceeding 100 people, and total annual turnover not exceeding 50 billion VND or total capital not exceeding 20 billion VND, excluding those meeting the criteria for microenterprises as defined in Clause 1.

In the trade and service sector, a small enterprise has an average annual number of employees participating in social insurance not exceeding 50 people, and total annual revenue not exceeding 100 billion VND or total capital not exceeding 50 billion VND, excluding those meeting the criteria for microenterprises as defined in Clause 1.

3. Medium enterprises in agriculture, forestry, fisheries, industry, and construction have an average annual number of employees participating in social insurance not exceeding 200 people, and total annual turnover not exceeding 200 billion VND or total capital not exceeding 100 billion VND, excluding those meeting the criteria for small enterprises and microenterprises as defined in Clauses 1 and 2.

In the trade and service sector, a medium enterprise has an average annual number of employees participating in social insurance not exceeding 100 people, and total annual revenue not exceeding 300 billion VND or total capital not exceeding 100 billion VND, excluding those meeting the criteria for microenterprises or small enterprises as defined in Clauses 1 and 2.
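The size criteria listed above lend themselves to a simple rule-based check. The following is a minimal sketch that transcribes the thresholds as summarized in this section (amounts in billions of VND); the function name, the sector labels "production" and "trade_service", and the "large" fallback label are assumptions made for illustration, not terms from the Decree.

```python
# (size label, max employees, max annual revenue, max total capital)
# Thresholds transcribed from the summary above, in billions of VND.
THRESHOLDS = {
    # agriculture, forestry, fisheries, industry, and construction
    "production": [
        ("micro", 10, 3, 3),
        ("small", 100, 50, 20),
        ("medium", 200, 200, 100),
    ],
    "trade_service": [
        ("micro", 10, 10, 3),
        ("small", 50, 100, 50),
        ("medium", 100, 300, 100),
    ],
}


def classify_enterprise(sector, employees, revenue_bn, capital_bn):
    """Return the smallest size class whose criteria the firm meets.

    A class is met when the headcount limit holds AND either the revenue
    limit or the capital limit holds. Checking classes from smallest to
    largest implements the "excluding those meeting the criteria of a
    smaller class" wording.
    """
    for label, max_emp, max_rev, max_cap in THRESHOLDS[sector]:
        if employees <= max_emp and (revenue_bn <= max_rev or capital_bn <= max_cap):
            return label
    return "large"  # hypothetical label for firms outside the SME criteria


# Hypothetical firms, for illustration only.
print(classify_enterprise("production", 8, 2, 5))
print(classify_enterprise("trade_service", 40, 80, 60))
print(classify_enterprise("production", 150, 180, 120))
print(classify_enterprise("trade_service", 120, 50, 10))
```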

Probability Of Default (PD)

The probability of default is a crucial element in credit risk analysis and risk management. Under Basel II, it is a key parameter in determining the economic capital that credit institutions need in order to absorb risks effectively.

As defined by the Office of the Comptroller of the Currency, the probability of default refers to the risk associated with a borrower's inability or unwillingness to fully or punctually repay a debt. Default risk is derived from an assessment of the borrower's capacity to meet the debt obligations set out in the contractual terms. Probability of default (PD) is often linked to financial indicators such as insufficient cash flow to cover expenses, declining revenue or operating margins, high leverage, reduced liquidity, or an inadequate ability to execute business plans successfully. Beyond these quantifiable factors, an evaluation of the borrower's willingness to repay the debt is also essential in determining the probability of default.

According to Tysk (2010), PD is a quantitative evaluation of the likelihood that an obligor will face bankruptcy within a specified timeframe, typically a year. It describes the probability of a company failing to meet its loan or banking liabilities. Because insolvency often arises from business losses or financial shortages, it can lead to a company's bankruptcy. To assess the probability of default, a bank may employ scoring systems based on a company's repayment capability, thereby improving the precision of lending decisions.

PD is a highly valuable metric for categorizing borrowers. Regardless of whether banks employ standard methodologies or more advanced approaches, they are obliged to furnish regulators with an internal PD estimate for the borrower, aligned with the assigned score. The ranking derived from PD is deemed relatively precise because it is computed from the firm's actual financial ratios and therefore reflects the current state of the business. Careful consideration of PD can significantly mitigate credit risk.

To predict the probability of default, institutions construct credit rating systems. A credit rating is an assessment of an enterprise's capacity and willingness to make timely payments on a specific debt throughout its duration. The ranking system described in this document is symbolized by three letters (ABC), ranging from AAA, denoting the highest stability, to C, representing the highest risk level. Credit ratings have become widely used and serve a variety of purposes and subjects of evaluation. This evolution has consequently altered societal perspectives and viewpoints.

According to Michael K. Ong (2003), credit rating constitutes a systematic process of evaluating and categorizing credit levels in relation to varying degrees of risk. Each rating serves as a clear representation, succinctly capturing the solvency of a rated company. Simultaneously, credit rating involves the utilization of available and current information to project future outcomes.

According to Standard & Poor's, credit rating is the assessment of a party's creditworthiness, that is, its ability to fulfill financial obligations in the future, based on existing factors and the evaluator's standpoint. Essentially, a credit rating is an opinion on credit risk. It expresses an opinion on the capability and willingness of issuers (issuer ratings), such as corporations or governmental entities, to meet financial obligations fully and punctually. Credit ratings may also address the credit quality of individual debts (issue ratings), such as corporate or government bonds, including the associated risk of adverse outcomes. Fitch Ratings describes a credit rating as an evaluation of an entity's ability to meet debt obligations, including interest, preferred dividends, insurance claims, or other liabilities of the rated entity. Fitch's credit rating methodology integrates both financial and non-financial factors, so the assessment also reflects the prospective profitability of the organization.

In essence, credit rating is a systematic procedure for assessing and categorizing credit levels according to risk. It gauges the quality and solvency of the rated entity and projects future outcomes, conveyed through a system of ranking symbols. Consequently, credit ratings furnish investors with vital information about the financial standing and risk profile of rated institutions, facilitating informed investment decisions. The evaluation process covers both financial and non-financial factors. Financial factors are the key financial ratios derived from the analysis of financial statements. Non-financial factors, which are inherently difficult to quantify, include elements such as political influences, business lines, and the macroeconomic environment.

Financial Indicators

Financial indicators are relationships derived from a company's financial data that serve as tools for comparison. Three commonly cited examples of financial ratios are return on investment (ROI), return on assets (ROA), and debt to equity. Each results from dividing one account balance or financial measure by another, with both typically found in financial statements such as the balance sheet, income statement, cash flow statement, and/or equity statement. They offer a valuable means to assess progress against internal objectives, benchmark against competitors, or compare with the broader industry.

Financial ratios, calculated by dividing one financial or business figure by another (e.g., total sales divided by the number of employees), allow business owners to scrutinize the relationships between different financial factors. They offer a straightforward and effective method for evaluating business health. Financial indicators help identify trends at an early stage, which makes them valuable for bankers, investors, and business analysts. They provide insights into both positive and negative aspects of a business, complementing financial statements, which alone may not offer detailed business information.
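As a small worked illustration of ratios as one figure divided by another, the sketch below computes two of the ratios named above. The function name and the statement figures are hypothetical, and ROA is taken here as net income over total assets, which is one common convention rather than necessarily the definition used in the thesis.

```python
def financial_ratios(net_income, total_assets, total_liabilities, equity):
    """Compute two illustrative ratios from statement figures.

    All inputs are assumed to be in the same currency unit and the
    denominators are assumed to be non-zero.
    """
    return {
        "roa": net_income / total_assets,              # return on assets
        "debt_to_equity": total_liabilities / equity,  # leverage
    }


# Hypothetical figures (billions of VND), for illustration only.
ratios = financial_ratios(
    net_income=12, total_assets=200, total_liabilities=120, equity=80
)
print(ratios)
```

Here the firm earns 6% on its assets and carries 1.5 units of debt per unit of equity; ratios like these are the kind of inputs a default prediction model would consume.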

There are four widely recognized categories of financial indicators, each with distinctive characteristics (Figure 2.1).

Figure 2.1: Four popular groups of financial indicators

Source: https://vietnamcredit.com.vn/news/company-financial-ratios-the-key-to-strengthen-your-decision-part-1_13314

Overview of probability of default models

Numerous studies worldwide have examined models designed to forecast the likelihood of enterprise default. These models differ in their input variables, strengths, and weaknesses, as outlined in the following paragraphs.

Linear regression analysis is a statistical technique for exploring and modeling linear relationships between variables. Orgler (1970) applied this model to link customer characteristics to default likelihood. Linear regression is widely used, straightforward to implement, and offers interpretable results even with limited data. However, its limitations include predictions that can fall outside the 0-1 range, restricted extrapolation capabilities, and sensitivity to outliers.
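The out-of-range limitation can be seen in a very small example. The sketch below fits ordinary least squares with a single predictor to a 0/1 default indicator; the data and variable names are hypothetical and chosen only to make the effect visible.

```python
def fit_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: leverage ratio and a 0/1 default flag.
leverage = [0.2, 0.4, 0.6, 0.8]
defaulted = [0, 0, 1, 1]

intercept, slope = fit_ols(leverage, defaulted)

# Predicted "probabilities" for a low- and a high-leverage firm.
low = intercept + slope * 0.1
high = intercept + slope * 0.9
print(low, high)  # one value falls below 0 and the other above 1
```

Because the fitted line is unbounded, firms at the extremes receive "probabilities" that are negative or greater than one, which motivates the bounded models discussed next.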

Discriminant analysis, introduced by Fisher in 1936, is a statistical technique designed to classify subjects into predefined groups based on their characteristics. In the context of credit risk, it classifies borrowers as either 'good' or 'bad' customers. The method constructs a linear discriminant function from quantitative variables derived from financial statements. The function seeks to maximize the separation between the two groups while minimizing intra-group variability. Although similar to linear regression, discriminant analysis treats group membership as fixed and the characteristics as random, inverting the relationship between the variables. Despite its simplicity and widespread use, discriminant analysis assumes normally distributed variables, which may not hold in real-world applications. Additionally, the model's coefficients lack direct interpretability, limiting its explanatory power.
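The idea of a linear discriminant function can be illustrated with Fisher's two-group direction, w = S_w^{-1}(mean_bad - mean_good), where S_w is the pooled within-group scatter matrix. The sketch below is a minimal two-variable version; the data (leverage, ROA), the function names, and the midpoint cut-off are assumptions made for illustration.

```python
def mean_vector(rows):
    """Component-wise mean of two-dimensional observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mu):
    """Sum of outer products of deviations from the group mean (2x2)."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mu[0], r[1] - mu[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(good, bad):
    """Return w = Sw^-1 (mu_bad - mu_good) and a midpoint cut-off."""
    mu_g, mu_b = mean_vector(good), mean_vector(bad)
    sg, sb = scatter_matrix(good, mu_g), scatter_matrix(bad, mu_b)
    sw = [[sg[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mu_b[0] - mu_g[0], mu_b[1] - mu_g[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Cut-off: midpoint between the two projected group means.
    cutoff = (score(w, mu_g) + score(w, mu_b)) / 2
    return w, cutoff


def score(w, x):
    """Discriminant score: projection of x onto the direction w."""
    return w[0] * x[0] + w[1] * x[1]


# Hypothetical (leverage, ROA) observations for each group.
good = [(0.30, 0.10), (0.40, 0.08), (0.35, 0.12), (0.50, 0.07)]
bad = [(0.70, 0.01), (0.80, -0.02), (0.75, 0.02), (0.90, -0.03)]

w, cutoff = fisher_direction(good, bad)
for firm in good + bad:
    print(firm, "bad" if score(w, firm) > cutoff else "good")
```

A score above the cut-off assigns a firm to the 'bad' group. The midpoint cut-off implicitly treats both groups as equally likely and equally costly to misclassify, which a real application would likely revisit.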

Logit and Probit models are statistical techniques for analyzing the relationship between a binary dependent variable (default or non-default) and a set of independent variables. As shown in studies by Ohlson (1980), Gilbert (1990), and Hayden (2010), these models directly estimate the probability of a business defaulting from the observed data. They are easy to implement, produce interpretable results, and yield probabilistic outputs that aid decision-making. Additionally, the ranking of results and variables can be reliably verified. However, their application depends on underlying distributional assumptions, which may not hold in real-world scenarios. Their effectiveness may also be compromised by data irregularities or by businesses with unusual financial characteristics. Default probabilities can be underestimated or overestimated, so careful model validation and refinement are necessary.
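Both models pass a linear score through a function bounded between 0 and 1: the logistic function for Logit and the standard normal cumulative distribution for Probit. The sketch below shows only this mapping; the coefficients are hypothetical placeholders, not estimates from the thesis, and estimating them (typically by maximum likelihood) is not shown.

```python
import math


def linear_score(features, coefficients, intercept):
    """z = intercept + sum of coefficient * feature."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def logit_pd(features, coefficients, intercept):
    """Logit model: PD = 1 / (1 + exp(-z))."""
    z = linear_score(features, coefficients, intercept)
    return 1.0 / (1.0 + math.exp(-z))


def probit_pd(features, coefficients, intercept):
    """Probit model: PD = Phi(z), the standard normal CDF."""
    z = linear_score(features, coefficients, intercept)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical coefficients for (leverage, ROA); illustration only.
coefficients = [4.0, -10.0]
intercept = -3.0

safe_firm = (0.30, 0.10)
risky_firm = (0.90, -0.03)

print(logit_pd(safe_firm, coefficients, intercept),
      logit_pd(risky_firm, coefficients, intercept))
print(probit_pd(safe_firm, coefficients, intercept),
      probit_pd(risky_firm, coefficients, intercept))
```

Unlike the linear model, both outputs stay strictly inside the unit interval for any finite score, so they can be read directly as probabilities.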

The decision tree model, pioneered by Breiman et al. in 1984, is a non-parametric method widely used to classify and predict enterprise defaults. It partitions data into increasingly homogeneous subsets based on specific attributes, forming a tree-like structure. Each internal node represents a decision based on an attribute, while leaf nodes give the final classification. The model handles non-linear relationships well, provides intuitive and interpretable results, and can calculate bankruptcy probabilities. However, it is susceptible to overfitting when the tree has too many layers, and model construction is computationally intensive. The lack of statistical tests for model stability is a further drawback.

Introduced by Ho in 1995, the Random Forest model is an ensemble learning method that can be used to predict default probabilities. It constructs multiple decision trees, each trained on a random subset of the data and features. The model's output is determined by aggregating the predictions of the individual trees, usually through majority voting or averaging. Random Forests generally achieve better predictive accuracy than single decision trees, but their complexity makes the underlying decision-making process hard to interpret, a difficulty often referred to as the "black box" problem. Despite this limitation, Random Forests are robust to missing data and less prone to overfitting because of their ensemble nature, which makes them a popular choice for predictive modeling tasks.

Originating in 1943, neural network theory gained prominence in default prediction studies following Shade's research in 1990. Unlike traditional parametric models, artificial neural networks (ANNs) do not assume a fixed functional form but instead mimic the human brain's information processing through interconnected nodes. ANNs can handle complex patterns and various data types, including qualitative information, but they demand substantial computational resources and high-quality data. Their black-box nature, which requires expert interpretation, and their limited adoption in Vietnam pose challenges for widespread implementation. Despite these limitations, ANNs can help overcome issues related to data correlation and produce clear results when appropriately applied.

Logistic regression is a statistical method used to model the probability of a binary outcome, such as default or non-default, based on a set of independent variables. It is widely used in various fields because of its interpretability and efficiency. Although it provides probability estimates, logistic regression in its basic form is limited to binary outcomes. The model also assumes a linear relationship between the log-odds of the outcome and the independent variables, which may not hold in real-world scenarios. Despite these limitations, logistic regression remains a foundational tool in predictive modeling and credit risk assessment.

Gradient boosting is an ensemble learning technique that sequentially builds an additive model by combining multiple weak learners, typically decision trees. It fits each new model to correct the errors made by the previous models. This iterative process results in a powerful predictive model capable of handling complex patterns in data. Gradient boosting often achieves high accuracy, but its interpretability can be limited by the complexity of the ensemble. Careful tuning of hyperparameters is also required to optimize performance.
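The error-correcting loop described above can be sketched by hand. The example below is a simplified illustration for squared-error loss on an invented regression problem, where "correcting the errors" means fitting each shallow tree to the current residuals. The data, the learning rate, and the number of rounds are assumptions; default classification would normally use a log-loss objective through a library implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumed): a noisy non-linear target.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())      # start from a constant model
errors = [float(np.mean((y - prediction) ** 2))]

for _ in range(50):
    residuals = y - prediction              # what the ensemble still gets wrong
    weak_learner = DecisionTreeRegressor(max_depth=2, random_state=0)
    weak_learner.fit(X, residuals)
    # Add a damped version of the new tree to the running prediction.
    prediction += learning_rate * weak_learner.predict(X)
    errors.append(float(np.mean((y - prediction) ** 2)))

print("training MSE before boosting:", round(errors[0], 4))
print("training MSE after 50 rounds:", round(errors[-1], 4))
```

Each round lowers the training error a little. The learning rate and the number of rounds are the kind of hyperparameters that need tuning, because too many rounds fit the training sample too closely.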

2.4.2 The difference between Logistic Regression model, Decision Trees model, Gradient Boosting model and Artificial Neural Networks model

Logistic regression, decision trees, gradient boosting, and artificial neural networks represent a spectrum of modeling techniques for predicting binary outcomes. Logistic regression, a linear method, offers interpretability through its coefficients but is constrained by its linearity assumption. Decision trees, on the other hand, excel at capturing non-linear relationships within data but are prone to overfitting. Gradient boosting, an ensemble method, mitigates the overfitting issue by combining multiple decision trees, which improves predictive accuracy. Artificial neural networks, inspired by the human brain, can model complex patterns and non-linear relationships but often sacrifice interpretability for predictive power.

The choice of model depends on the characteristics of the dataset, the desired level of model complexity, and the trade-off between interpretability and predictive accuracy. For instance, logistic regression is suitable for simpler problems where the relationship between predictors and the outcome is linear and easily interpretable. Decision trees are advantageous when dealing with complex interactions and non-linearity, but their performance can be unstable. Gradient boosting offers a balance between interpretability and predictive power, making it a popular choice in many applications. Artificial neural networks are best suited to highly complex problems with large datasets where predictive accuracy is the primary concern.

Ultimately, the selection of the optimal model often involves experimentation and comparison of different approaches to determine the best fit for a specific problem.

2.5 Related Studies

Numerous studies have been conducted on predicting enterprise default probability and bankruptcy risk. However, their results are inconsistent, owing to variations in the subject and scope of the studies as well as differences in the models and methodologies employed.

2.5.1 Related studies in Vietnam

In Vietnam, numerous studies on this topic, mostly published in mass media, have explored the factors influencing the risk of enterprise bankruptcy and the prediction of bankruptcy probability in corporate valuation. Notable works by Hay Sinh (2003), Le Dat Chi & Le Tuan Anh (2012), Le Nguyen Son Vu (2013), Vo Hong Duc & Nguyen Dinh Thien (2013), a group of authors from the University of Economics Ho Chi Minh City (2013), and Nguyen Minh Ha & Nguyen Ba Huong (2016) have contributed to this body of research. However, these published works focus predominantly on descriptive statistical analysis of tabular data, with limited attention to the development of business default prediction models.

To address this gap, the current research emphasizes the need for empirical evidence derived from regression analysis to establish a more robust foundation for proposing accurate models of bankruptcy probability. Specifically, Hay Sinh's (2003) research addresses the estimation of bankruptcy probability in business valuation. The author highlights the significance of this probability as a financial parameter that directly affects enterprise value. The assessment is conducted through two methods: the cash flow approach, where the probability of bankruptcy is not independently estimated but reflected in the discount rate, and the adjusted APV approach, which treats the probability of bankruptcy as an independent parameter. The research aims to establish methods for estimating the probability of bankruptcy in enterprises, contributing in particular to wider application of the adjusted APV method in business valuation practice in Vietnam.

Furthermore, Le Nguyen Son Vu's (2013) research focused on investment decisions and bankruptcy risks of companies listed on Vietnam's stock market. The empirical study provided evidence on the influence of financial ratios on the investment decisions and bankruptcy risks of listed companies from 2003 to 2012. The dataset covered 737 companies across various sectors. The findings indicated that three factors (negative net income in the last two years, short-term liquidity, and liabilities/total assets) were positively correlated with the O-score bankruptcy forecast index. At the same time, the rate of return/total assets and the rate of net income growth were inversely related to the O-score index, and all these relationships were statistically significant. The remaining four factors, namely firm size, total liabilities/total assets, working capital/total assets, and operating funds/total liabilities, also influenced the O-score index but were not statistically significant.

In a more recent study, Nguyen Thi Tuyet Lan (2019) investigated factors affecting the bankruptcy risk of listed companies in the construction industry in Vietnam. The study employed the logit model with five independent variables: total liabilities/total assets, working capital/total assets, short-term solvency, rate of return/total assets, and net income growth. The sample included 109 construction companies listed on the Hanoi Stock Exchange (HNX) and the Ho Chi Minh Stock Exchange (HOSE) during the period from 2005 to 2017. The results showed that total liabilities/total assets was positively correlated with the bankruptcy risk of listed construction companies in Vietnam, while the rate of return/total assets had the opposite effect.

2.5.2 The other related studies

Globally, numerous studies have been conducted on forecasting the probability of default and the risk of bankruptcy for enterprises. One notable study is Edward I. Altman's (1968) research, which employed financial ratios and a multiple discriminant analysis model to predict the likelihood of business bankruptcy. The study used a sample of 66 enterprises in two groups: 33 enterprises that went bankrupt under Chapter X of the US Bankruptcy Law between 1946 and 1965, and 33 non-bankrupt businesses that were still operating in 1966. The analysis included 22 independent financial variables, grouped into categories such as liquidity, profitability, financial leverage, debt solvency, and performance ratios.

The results of Altman's model demonstrated a relatively high level of accuracy, correctly classifying 94% of the original sample. However, the study has a limitation: it focused exclusively on large manufacturing enterprises, as determined by the size of their assets.

A later study on business failure prediction by Evridiki Neophytou, Andreas Charitou & Chris Charalambous (2000) used logit analysis to develop a model classifying failed industrial enterprises in the UK. The dataset covered 51 pairs of failed and non-failed UK businesses from 1988 to 1997. The forecasting model, based on three financial variables (profitability, operating cash flow, and financial leverage), was built for horizons of up to three years before a firm's failure. The model explained 83% of the probability of business failure one year in advance. However, its assumptions about the probability distribution and the absence of multicollinearity may limit its applicability.

In a study conducted by Ravi & Pramodh (2008), an artificial neural network model was employed to predict defaults for Turkish and Spanish banks. The model considered 9 financial factors for Turkish banks and 12 financial factors for Spanish banks. It achieved an accuracy of 96.6% on the Spanish banks' dataset and 100% on the Turkish banks' dataset. However, because of its complexity and lack of structural transparency, the model's applicability is limited: it is difficult to follow and not under the modeller's control.

Among studies using financial ratios and multi-factor analysis, Ben Chin-Fook Yap, David Gun-Fie Yong & Wai-Ching Poon (2010) focused on predicting the failure of Malaysian businesses. The research group developed a Multiple Discriminant Analysis (MDA) model to improve prediction for Malaysian businesses that had undergone restructuring under diverse financial and operational conditions across different business activities. The authors used 16 financial ratios to analyze a sample of 64 enterprises.

The findings identified 7 financial ratios with statistically significant predictive power, achieving accuracy rates ranging from 88% to 94% in the period before failure. These variables are total assets/total liabilities, cash flow/total long-term liabilities, total long-term liabilities/total assets, working capital/total assets, retained earnings/total assets, earnings before interest and tax/sales, and net income/sales. Nevertheless, the method has drawbacks stemming from its basic assumptions, such as normal distribution and equal variance, which can be violated. Such violations reduce the reliability of the model and limit its applicability.

A review of both domestic and foreign studies shows that financial institutions have a variety of credit rating models at their disposal to forecast the probability of enterprise default. These predictive models include discriminant analysis models, logit models, probit models, and artificial neural network models, among others. The models use diverse financial indicators as inputs to predict business bankruptcy; common indicators include short-term solvency, return on total assets, and the ratio of total liabilities to total assets.

CHAPTER 3: DATA AND METHODOLOGY OF RESEARCH

3.1 Theoretical framework

In this research paper, the author employs classification models to forecast the default probability of SMEs in Vietnam during the period 2020 - 2022. The model-building process unfolds as follows:

Step 1: Collect and process data. Data is sourced from the audited annual financial statements of approximately 400 SMEs across nine different industries in Vietnam from 2020 to 2022.

Step 2: Select the input variables of the model. To forecast enterprises' default probability, the author chooses 13 input variables derived from financial statements. These independent variables fall into four groups of financial indicators: liquidity, debt use, profitability, and business performance.

Step 3: Build the models. Python and its accompanying packages, including NumPy, Pandas, Scikit-learn, TensorFlow, and Seaborn, are used for data processing, analysis, and model building. The chosen models are the Logistic Regression model, Decision Trees model, Gradient Boosting model, and Artificial Neural Network model.

Step 4: Evaluate the models. The Confusion Matrix and F1-Score are used to evaluate the prediction results of each model. Based on this evaluation, an appropriate credit rating model is selected, one that can effectively predict the probability of default for customers.

Detailed explanations of these research steps will be expounded upon in the subsequent sections of Chapter 3.

3.2 Data collection and processing

The dataset for this study was derived from the annual financial statements of approximately 400 Vietnamese companies operating across nine distinct sectors. The data cover the period between 2020 and 2022. To ensure data reliability and accuracy, only audited financial statements were included in the analysis. The sample comprised 31 consumer goods companies, 35 petroleum businesses, 39 automotive firms, 40 construction and installation companies, 43 pharmaceutical and medical equipment companies, 45 textile and garment companies, 47 fisheries enterprises, 54 iron and steel companies, and 66 agricultural businesses. This diverse representation of sectors allowed for a comprehensive analysis of financial performance across different industries.

Table 3.1: Number of businesses by business line

| No. | Business line | Number of businesses |
|---|---|---|
| 1 | Consumer goods | 31 |
| 2 | Petroleum | 35 |
| 3 | Automotive | 39 |
| 4 | Construction and installation | 40 |
| 5 | Pharmaceutical Industry and Medical Equipment | 43 |
| 6 | Textile and garment | 45 |
| 7 | Fisheries | 47 |
| 8 | Iron and steel | 54 |
| 9 | Agricultural (rice, coffee, cashew, pepper, ...) | 66 |
| | Total | 400 |

Source: Statistics from the author

Crouhy, Galai & Mark (2001) identified a weakness of many risk measurement models: they rely on historical financial information that was generated under conditions that may not apply in the future, or on data that are updated infrequently. Hence, this research uses data from the most recent three years, a period during which the economic situation and business performance changed little.

The companies are categorized into two groups: the bankrupt group, labeled 1, and the non-bankrupt group, labeled 0. This classification is based on information extracted from the Balance Sheet (equity), the Income Statement (profit after tax), and the Cash Flow Statement (cash flow from operating activities). Companies are deemed bankrupt (marked as 1) if they meet one or more of the following conditions:

(ii) Net profit after tax and operating cash flow are negative for two consecutive years

(iv) Belonging to the group of Non-Performing Loans (groups 3, 4, and 5)

This assumption aligns with the theory of Crouhy, Galai & Mark, which holds that credit risk analysis relies on various attributes of the borrower, including finance, management, income, cash flow, asset quality, and liquidity.
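The labeling rule can be sketched in pandas. The example below covers conditions (ii) and (iv) only, on an invented firm-year panel. All column names (`net_profit`, `operating_cash_flow`, `debt_group`) are hypothetical, and the rows of each firm are assumed to be consecutive years.

```python
import pandas as pd

# Hypothetical firm-year panel; values and column names are illustrative only.
panel = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "C", "C"],
    "year": [2021, 2022, 2021, 2022, 2021, 2022],
    "net_profit": [-5.0, -2.0, 3.0, -1.0, 4.0, 6.0],
    "operating_cash_flow": [-1.0, -3.0, 2.0, -2.0, 1.0, 2.0],
    "debt_group": [1, 2, 1, 1, 1, 4],
}).sort_values(["firm", "year"])

# 1 when both net profit and operating cash flow are negative in a year.
loss_year = ((panel["net_profit"] < 0)
             & (panel["operating_cash_flow"] < 0)).astype(int)
# Same flag for the previous year of the same firm (0 when there is none).
previous_loss = loss_year.groupby(panel["firm"]).shift(1).fillna(0)

# Condition (ii): two consecutive loss years.
two_loss_years = ((loss_year + previous_loss) == 2).groupby(panel["firm"]).any()
# Condition (iv): any year in non-performing debt groups 3, 4 or 5.
non_performing = panel["debt_group"].isin([3, 4, 5]).groupby(panel["firm"]).any()

labels = (two_loss_years | non_performing).astype(int).rename("default")
print(labels)
```

Firm A is labeled 1 through condition (ii), firm C through condition (iv), and firm B stays at 0 because it has only one loss year.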

Within the dataset, any missing, inaccurate, or inconsequential values will be addressed based on the following principles:

Firstly, missing values will be substituted with the mean of the group to which the borrower belongs. For instance, if the missing value pertains to a bankrupt firm, it will be replaced by the average of the bankruptcy group.

Secondly, infinite values will be substituted with either the maximum or the minimum value, depending on which is more appropriate.

Finally, excessively volatile values will be replaced with the 5% or 95% quantile of the financial indicators.
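These three rules might be applied as in the following sketch. The column names and values are invented. Infinite values are handled first here, an ordering chosen for the example so that they do not distort the group means used for imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a default flag and one financial ratio with gaps.
df = pd.DataFrame({
    "default": [0, 0, 0, 0, 1, 1],
    "current_ratio": [1.2, np.nan, 1.8, np.inf, 0.4, np.nan],
})
ratio = df["current_ratio"]

# Infinite values: replace +inf / -inf by the largest / smallest finite value.
finite = ratio[np.isfinite(ratio)]
ratio = ratio.replace({np.inf: finite.max(), -np.inf: finite.min()})

# Missing values: fill with the mean of the firm's own group (0 or 1).
group_mean = ratio.groupby(df["default"]).transform("mean")
ratio = ratio.fillna(group_mean)

# Extreme values: clip to the 5% and 95% quantiles of the indicator.
low, high = ratio.quantile([0.05, 0.95])
df["current_ratio"] = ratio.clip(lower=low, upper=high)

print(df)
```

In practice the same steps would be repeated for each of the financial indicators.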

3.3 Selection of input variables in the default prediction model

Some scholarly works advocate the use of financial indicators as a source of information for evaluating credit risk (Demerjian, 2007). Smith & Warner (1979) highlighted that breaches of contractual conditions by customers could signal their capacity to repay debts to the bank. Additionally, Dichev & Skinner (2002) observed that covenants on financial ratios influence the content of credit contracts.

Lundholm & Sloan (2004) illustrated the significance of covenants on financial ratios, emphasizing that "if the company starts to show signs of trouble, the bank can recover the debt or dispose of the assets before the company loses its solvency." Meanwhile, Beaver (1966) provided empirical evidence that certain financial ratios offer notable statistical signals of impending business failure. Beaver's attempt to predict a firm's default probability concentrated primarily on leverage and liquidity ratios, which might not offer sufficient financial information for precise forecasts.

Furthermore, credit rating agencies scrutinize financial ratios when assessing credit quality. For instance, Standard & Poor's (2006) considered an improvement in the debt-to-cash-flow ratio (total debt/income) and the solvency ratio in determining the credit rating for Staples, Inc. Similarly, Moody's (2006) remarked in the ratings for Limited Brands, Inc. that a decrease or stabilization in the coverage ratio and the debt-to-cash-flow ratio (total debt/income) would influence future rating adjustments.

Demerjian (2007) outlined five covenants on financial ratios that are commonly used in credit contracts as a foundation for assessing borrowers' credit risk. These conditions include:

(i) Minimum coverage (income/expense related to periodic debt)

(ii) Maximum debt to cash flows (total debt/income)

(iii) Minimum net value (assets - liabilities = shareholder equity)

(iv) Maximum leverage (total debt/total assets)

(v) Minimum solvency (short-term assets/short-term liabilities)

Moody's and Standard and Poor's incorporate key financial metrics into their rating process, including the Debt/Asset Index, Interest payment ratio from profit before tax and loan interest, Business outlook (growth in cash flows or returns on assets), Dividends and other payments, Business risk (fluctuations in cash flow or asset value), and Liquidity of assets.

Moody's report (2001) introduced the RiskCalc model, a quantitative credit rating model designed to assess firms in the high-end market in Japan. The model incorporates seven indicators: profitability, financial leverage, liquidity, principal repayment, interest repayment, size, and performance indicators of the enterprise (see Appendix - Table 01).

Expanding on Hayden's study, Engelmann & Rauhmeier (2010) introduced 14 selected financial ratios categorized into nine risk groups: financial leverage, liquidity, performance index, cost control, asset use efficiency, profitability, firm size, growth, and debt growth (see Appendix - Table 02).

In summary, the independent variables typically selected for the default prediction model fall into the following four groups of financial indicators:

Group of indicators on liquidity:

This category of indices assesses the enterprise's capacity to convert assets into cash to meet short-term obligations, indicating its short-term solvency. It typically includes measures such as current solvency and quick solvency. A decrease in these coefficients indicates an increased risk of business bankruptcy.

Group of indicators on financial leverage:

This set of metrics gauges the extent to which the enterprise uses debt, covering ratios such as debt-to-total assets, debt-to-equity, interest coverage, and the capability to settle long-term debts. Elevated debt ratios coupled with reduced solvency signal an elevated risk of default.

Group of indicators on profitability:

This category of metrics evaluates the profitability of a business, using indicators such as gross profit margin, pre-tax income as a percentage of net revenue, pre-tax income as a percentage of total assets, and pre-tax income as a percentage of equity. Profitability is considered a crucial factor in predicting a company's ability to meet its debt obligations, as a loss-making company may deplete its equity, which makes repaying debt difficult.

Group of indicators on performance:

The performance indicators assess how efficiently business assets are used, with emphasis on inventories, receivables, and overall corporate assets. A lower index signifies ineffective asset use and therefore higher credit risk for the enterprise.

Drawing on prior research, the author chose 13 financial indicators as independent variables for the credit rating models in the study. The table below outlines how these variables are calculated and their expected signs in the default prediction model.

Table 3.2: Independent variables in the probability default prediction model

| Variable | Formula | Group | Expected sign |
|---|---|---|---|
| X1 | Gross profit/Net revenue | Profitability | - |
| X2 | Income before tax/Net revenue | Profitability | - |
| X3 | Income before tax/Total assets | Profitability | - |
| X4 | Earnings before tax/Equity | Profitability | - |
| X5 | Total liabilities/Total assets | Financial leverage | + |
| X6 | Short-term assets/Short-term liabilities | Liquidity | - |
| X7 | … Inventories)/Short-term Liabilities | Liquidity | - |
| X8 | Profit before tax and interest/Interest | Interest payment | - |
| X9 | Earnings before tax, interest and amortization/Long-term debt | | |
| X10 | Cash and cash equivalents/Equity | Liquidity | - |
| X11 | Average cost of goods sold/Inventory | Performance | + |
| X13 | Total revenue/Total assets | Performance | - |

Source: Statistics from the author
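Computing such ratios from financial statement items is simple element-wise arithmetic. The sketch below builds four of the variables (X1, X3, X5, X6) for two invented firms; the column names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical statement items for two firms (same currency unit throughout).
statements = pd.DataFrame({
    "gross_profit": [200.0, 50.0],
    "net_revenue": [1000.0, 400.0],
    "income_before_tax": [80.0, -20.0],
    "total_assets": [2000.0, 800.0],
    "total_liabilities": [900.0, 700.0],
    "short_term_assets": [600.0, 150.0],
    "short_term_liabilities": [400.0, 300.0],
}, index=["firm_A", "firm_B"])

features = pd.DataFrame({
    "X1": statements["gross_profit"] / statements["net_revenue"],
    "X3": statements["income_before_tax"] / statements["total_assets"],
    "X5": statements["total_liabilities"] / statements["total_assets"],
    "X6": statements["short_term_assets"] / statements["short_term_liabilities"],
})
print(features)
```

In this toy case firm_B combines higher leverage (X5) with lower profitability and liquidity (X1, X3, X6), which is the direction the expected signs associate with higher default risk.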

3.4 Models for Estimating the Likelihood of Default

3.4.1 Logistic Regression model

Logistic regression, a widely used classification algorithm, was employed to model the relationship between the independent variables and a binary outcome (default or no default). The model was implemented with the LogisticRegression class from the scikit-learn machine learning library. Data preparation came first, including one-hot encoding and feature selection to make model training efficient. Hyperparameters, namely the regularization parameter (C), the penalty method, and the solver type, were then tuned to optimize the model's performance on the dataset. The coefficients of the fitted model provide insight into the importance and influence of each feature on a company's likelihood of default.

The performance of the logistic regression model was evaluated using the Area Under the Receiver Operating Characteristic curve (AUC) and accuracy. AUC measures the model's ability to distinguish between defaulters and non-defaulters. Accuracy compares predicted labels with actual labels and summarizes overall classification performance. Both metrics were calculated for the training and testing sets, which makes it possible to compare in-sample and out-of-sample performance.
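A workflow of this kind can be sketched with scikit-learn as follows. The data are synthetic stand-ins for the 13 financial ratios, and the parameter grid, split ratio, and cross-validation setup are assumptions made for illustration rather than the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for 400 firms with 13 ratios and about 20% defaults.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_clusters_per_class=1, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Assumed grid over the regularization strength C and the solver.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0], "solver": ["liblinear", "lbfgs"]}
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid,
                      scoring="roc_auc", cv=5)
search.fit(X_train, y_train)
model = search.best_estimator_

# AUC uses predicted probabilities; accuracy uses predicted labels.
metrics = {}
for name, X_part, y_part in (("train", X_train, y_train),
                             ("test", X_test, y_test)):
    metrics[name] = {
        "auc": roc_auc_score(y_part, model.predict_proba(X_part)[:, 1]),
        "accuracy": accuracy_score(y_part, model.predict(X_part)),
    }

# One coefficient per input variable; the sign gives the direction of effect.
coefficients = dict(zip([f"X{i}" for i in range(1, 14)], model.coef_[0]))
print(search.best_params_)
print(metrics)
```

A large gap between the training and testing values in `metrics` would point to overfitting. The signs in `coefficients` can be compared with the expected signs of the input variables.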

3.4.2 Decision Trees and Gradient Boosting model

Decision trees are a powerful quantitative analysis tool that uses a series of rules to partition a dataset into homogeneous segments based on attributes and predicted outcomes. Through data mining, the algorithm develops classification rules represented as a tree structure, where each node represents a decision based on a specific attribute. The result is a set of highly homogeneous classification groups, each reflecting a level of default risk. The model is easy to visualize and interpret, but it is prone to overfitting, which reduces its predictive performance on unseen data. To address this limitation, techniques such as Random Forests and Gradient Boosting are applied to limit overfitting and thereby improve accuracy and generalization.

XGBoost, a robust gradient boosting algorithm, is known for its high performance and efficiency in various machine learning tasks. In the analysis, XGBoost was implemented using the XGBClassifier class from the XGBoost library. Before training, the dataset underwent preprocessing steps, including encoding categorical variables and feature selection. Hyperparameters such as the number of estimators, learning rate, maximum depth, and regularization parameters were tuned to optimize the model's performance. The XGBoost model was then trained on the prepared dataset. During training, XGBoost iteratively constructs an ensemble of weak decision trees, optimizing an objective function to minimize loss. By combining the predictions of multiple weak learners, the model improves its predictive power.

To gain insight into the relative importance of each feature, the feature importance scores of the XGBoost model were visualized. This visualization helped identify the most influential features in predicting borrower default. The performance of the XGBoost model was evaluated with the same metrics as logistic regression, namely AUC and accuracy. These metrics were calculated for both the training and testing datasets to assess the model's generalization ability.
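The training and feature-importance steps can be illustrated with scikit-learn's GradientBoostingClassifier, which is used here as a stand-in for XGBClassifier so that the example needs no extra dependency. The data are synthetic and the hyperparameter values are assumptions, not the tuned values of the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for 13 financial ratios with about 20% defaults.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_clusters_per_class=1, weights=[0.8, 0.2],
                           random_state=0)
feature_names = [f"X{i}" for i in range(1, 14)]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Assumed hyperparameters: number of trees, learning rate, tree depth,
# and row subsampling as a simple form of regularization.
booster = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05,
                                     max_depth=3, subsample=0.8,
                                     random_state=0)
booster.fit(X_train, y_train)

train_auc = roc_auc_score(y_train, booster.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, booster.predict_proba(X_test)[:, 1])

# Rank features by their share of the total impurity reduction.
ranking = sorted(zip(feature_names, booster.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
print("train AUC:", round(train_auc, 3), "test AUC:", round(test_auc, 3))
for name, score in ranking[:5]:
    print(name, round(float(score), 3))
```

The importance scores sum to one, so each can be read as a share of the model's total split gain. A bar chart of `ranking` gives the kind of visualization described above.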

3.4.3 Artificial Neural Network model

Artificial Neural Networks (ANNs) are known for their ability to process incomplete information and to maintain robust performance in the presence of noise or anomalies (Mijwel, 2018). Their architecture, often composed of at least two layers of interconnected neurons, mirrors the organizational structure of biological neural networks (Mountcastle, 1957). A typical ANN comprises an input layer, hidden layers, and an output layer. The input layer receives the raw data, while the hidden layers model complex interactions between input and output neurons, enabling the network to make predictions or classifications. The output layer gives the final result, which can be used for tasks such as predicting corporate default, a critical problem in financial analysis.

The structure of an ANN is characterized by multiple layers of interconnected neurons, each layer containing a specific number of neurons and defined by an activation function. The design process involves determining the number of layers, the number of neurons per layer, and the type of activation function, guided by the dataset's characteristics, the research objectives, and iterative experimentation. Before training, the dataset is preprocessed, including Min-Max normalization to bring features onto a common scale and improve convergence during training. The preprocessed dataset is then divided into training and testing sets so that the model can be trained on one and evaluated on the other.

ANN training involves adjusting the weights and biases of neurons to minimize a loss function and improve prediction accuracy. Model performance is assessed using metrics such as loss and accuracy, calculated on both the training and testing sets. This evaluation allows researchers to assess the model's effectiveness in financial prediction and provides a foundation for further refinement.
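The preprocessing, training, and evaluation steps can be sketched with scikit-learn's MLPClassifier, used here in place of a TensorFlow model to keep the example short. The layer sizes, activation function, and data are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for 13 financial ratios with about 20% defaults.
X, y = make_classification(n_samples=400, n_features=13, n_informative=6,
                           n_clusters_per_class=1, weights=[0.8, 0.2],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Min-Max scaling is fitted on the training set only, then reused on the
# test set, so no information leaks from the test data into training.
network = make_pipeline(
    MinMaxScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), activation="relu",
                  max_iter=2000, random_state=0),
)
network.fit(X_train, y_train)

scaled_train = network.named_steps["minmaxscaler"].transform(X_train)
final_loss = network.named_steps["mlpclassifier"].loss_
train_accuracy = network.score(X_train, y_train)
test_accuracy = network.score(X_test, y_test)

print("scaled range:", scaled_train.min(), scaled_train.max())
print("final training loss:", round(float(final_loss), 4))
print("train accuracy:", round(train_accuracy, 3),
      "test accuracy:", round(test_accuracy, 3))
```

Two hidden layers of 16 and 8 neurons are an arbitrary starting point. In practice the architecture would be chosen by the iterative experimentation described above.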

3.5 Evaluation Criteria for Models Predicting Default

To assess the effectiveness of default probability prediction models, several measures are used, including the Confusion Matrix, Accuracy Ratio, Sensitivity Ratio, Specificity Ratio, and F1 Score.

The Confusion Matrix evaluates the outcome of a classification problem by cross-tabulating actual and predicted classes. For a binary problem it contains four counts: True Positive (TP), positive cases correctly predicted as positive; True Negative (TN), negative cases correctly predicted as negative; False Positive (FP), negative cases wrongly predicted as positive; and False Negative (FN), positive cases wrongly predicted as negative.

In the context of evaluating the outcomes of enterprise default predictions, the Confusion Matrix is structured as shown in the table below.

| Actual class | Predicted: Non-default | Predicted: Default |
|---|---|---|
| Non-default | True Negative (TN) | False Positive (FP) |
| Default | False Negative (FN) | True Positive (TP) |

Source: Statistics from the author

For each classification category, the indicators are interpreted as follows:

- True Positive (TP): The number of bankrupt companies correctly identified. This occurs when the model predicts that a company will go bankrupt and it does.

- True Negative (TN): The number of sound companies correctly identified. This occurs when the model predicts that a business will not go bankrupt and it does not.

- False Positive (FP): The number of sound companies wrongly flagged. This happens when the model predicts that a company is bankrupt but, in reality, the company is financially sound. It corresponds to a type 1 error.

- False Negative (FN): The number of bankrupt companies that are missed. This occurs when the model predicts that a company will not go bankrupt but, in fact, the company faces bankruptcy. It corresponds to a type 2 error.
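The four counts can be read directly from predicted and actual labels. The sketch below uses scikit-learn's `confusion_matrix` on ten invented observations, with 1 standing for default and 0 for non-default.

```python
from sklearn.metrics import confusion_matrix

# Invented labels for ten firms: 1 = default, 0 = non-default.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_pred = [0, 0, 0, 0, 1, 0, 1, 1, 0, 0]

# Rows are actual classes, columns are predicted classes, in the order [0, 1],
# so the flattened matrix reads TN, FP, FN, TP.
matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = matrix.ravel()

print(matrix)
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Here one sound firm is wrongly flagged (FP = 1) and one defaulting firm is missed (FN = 1).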

Hayden and Daniel's research indicates that the primary objective of a default prediction model is to analyze, assess, and rank borrowers, distinguishing between good and bad customers. However, when risks materialize, statistics show that a False Negative (type 2 error) leads to larger losses for the bank than a False Positive (type 1 error), because the bank may fail to recover capital from customers affected by a type 2 error.

Researchers combine the four indicators above into ratios that assess the reliability of a model predicting the probability of default for businesses.

Accuracy ratio: Accuracy reflects the model's capability to correctly differentiate between bankrupt and non-bankrupt companies. The formula to calculate the accuracy of a predictive model is as follows:

Accuracy = (TN + TP)/(TN + FN + TP + FP)

Sensitivity ratio: Sensitivity measures the model's effectiveness in correctly identifying bankruptcies. It is calculated as the proportion of companies that receive accurate bankruptcy predictions out of the total number of bankrupt companies. The sensitivity level of the model is determined by the following formula:

Sensitivity = TP/(TP + FN)

Specificity ratio: Specificity evaluates the model's capability to accurately identify companies that are not bankrupt. It is calculated as the proportion of firms with correct non-bankruptcy forecasts out of the total number of non-bankrupt firms. The formula for determining this ratio is as follows:

Specificity = TN/(TN + FP)

Precision ratio: Precision measures the number of companies correctly forecasted as bankrupt relative to the total number of companies forecasted as bankrupt. This rate is determined through the following formula:

Precision = TP/(TP + FP)
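The four ratios follow mechanically from the confusion matrix counts, using the standard definitions: sensitivity is TP/(TP + FN), specificity is TN/(TN + FP), and precision is TP/(TP + FP). A minimal sketch, assuming every denominator is non-zero; the counts used are hypothetical and the function name `classification_ratios` is introduced here for illustration.

```python
def classification_ratios(tn, fp, fn, tp):
    """Accuracy, sensitivity, specificity and precision from confusion matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        "sensitivity": tp / (tp + fn),   # share of actual defaults that were caught
        "specificity": tn / (tn + fp),   # share of sound firms correctly cleared
        "precision": tp / (tp + fp),     # share of default predictions that were right
    }


# Hypothetical counts for 120 firms, for illustration only.
ratios = classification_ratios(tn=90, fp=10, fn=5, tp=15)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```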

The F1-score is introduced to address the limitations of evaluating model reliability on any single indicator constructed from the Confusion Matrix. In the context of classifying companies as bankrupt or non-bankrupt, the dataset used for building and testing the model often exhibits an uneven distribution. In such cases, the non-bankrupt category tends to dominate the dataset, which can skew the sensitivity and precision ratios when each is read in isolation. The F1 score serves as an index that assesses sensitivity and precision simultaneously, providing a more reliable evaluation of the effectiveness of credit rating models.

The F1 score, or F-measure, is calculated using the following formula:

F1 Score = 2 * (Precision*Sensitivity)/(Precision + Sensitivity)

A higher F1 score, closer to 1, indicates greater accuracy and effectiveness of the credit rating model.
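To see why the F1 score is informative on imbalanced data, consider a small worked case. The sketch below uses hypothetical counts for 100 firms of which 5 default; the helper `f1_from_counts` is introduced here for illustration and returns 0 when precision and sensitivity are both undefined or zero (an assumed convention).

```python
def f1_from_counts(tp, fp, fn):
    """F1 = 2 * precision * sensitivity / (precision + sensitivity)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def accuracy_from_counts(tn, fp, fn, tp):
    return (tn + tp) / (tn + fp + fn + tp)


# Hypothetical model 1: predicts "non-default" for every firm.
acc_naive = accuracy_from_counts(tn=95, fp=0, fn=5, tp=0)
f1_naive = f1_from_counts(tp=0, fp=0, fn=5)

# Hypothetical model 2: catches 4 of the 5 defaults at the price of 5 false alarms.
acc_useful = accuracy_from_counts(tn=90, fp=5, fn=1, tp=4)
f1_useful = f1_from_counts(tp=4, fp=5, fn=1)

print(f"naive:  accuracy={acc_naive:.2f}, F1={f1_naive:.3f}")
print(f"useful: accuracy={acc_useful:.2f}, F1={f1_useful:.3f}")
```

The first model has the higher accuracy (0.95 versus 0.94) although it never identifies a default; its F1 score of 0 exposes this, while the second model scores about 0.571.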

EMPIRICAL RESULTS

Comparison of Prediction Capabilities of the Models

The results presented in Table 4.4 illustrate the superiority of machine learning algorithms such as Gradient Boosting and Artificial Neural Networks (ANNs) over the traditional Logistic model in bankruptcy risk prediction, and they highlight the potential application of these algorithms in risk management and financial analysis. The superiority of Gradient Boosting is demonstrated by its higher F1 Score, which reflects not only accurate prediction but also a balance between Recall (sensitivity) and Precision. This is crucial in applications where misclassification is costly.

Table 4.4: Out-of-Sample Bankruptcy Prediction Results of the Models

Algorithm AUC Accuracy Precision Sensitivity Specificity F1 score

Of the two most comprehensive metrics, accuracy and F1 score, ANN leads on accuracy at over 93%, followed closely by XGBoost at around 92.2%. This result demonstrates the effectiveness of neural network and ensemble learning algorithms in practical bankruptcy risk prediction. The analysis also shows that the Logistic Regression (LR) and Decision Tree algorithms perform worse, indicating that they are less suitable for bankruptcy risk assessment on the basis of prediction ability. The preference for ANN and XGBoost rests not only on their high prediction accuracy but also on their flexibility and ability to handle complex data.

Regarding the Precision metric, ANN again leads with a value exceeding 87.5%. XGBoost and Logistic Regression (LR) also record Precision values above 80%, indicating that both have promising applicability in building bankruptcy prediction models. Methods based on the Decision Tree model, however, appear unsuitable due to their weaker classification capability in this specific context. Applying ANN, XGBoost, and LR in bankruptcy risk prediction therefore yields high prediction performance, reflecting the ability to distinguish accurately between bankrupt and non-bankrupt cases.

For the Sensitivity metric, the Decision Tree and XGBoost models both score around 60%, with LR showing significantly lower performance. This implies that LR is too cautious in flagging firms as bankrupt, so it misses many actual bankruptcies and does little to reduce bankruptcy risk. ANN again demonstrates the best performance for this metric at over 66.6%. This points to the advantage of ANN in improving bankruptcy risk prediction, thereby contributing to better credit decisions. The superiority of ANN over LR and other methods in handling complex and nonlinear data patterns is a key factor behind this effectiveness.
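Sensitivity depends on the probability cutoff used to turn an estimated PD into a default or non-default label; scikit-learn's `predict` for a binary classifier corresponds to a cutoff of 0.5. The sketch below shows how lowering the cutoff raises sensitivity at the expense of precision. It is an illustration under assumptions: the PD values and labels are hypothetical, and the thesis does not report tuning the cutoff.

```python
def sensitivity_and_precision(probs, labels, threshold):
    """Classify as default when PD >= threshold, then compute both ratios."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


# Hypothetical estimated PDs and actual outcomes (1 = default) for ten firms.
probs = [0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.70, 0.90, 0.32]
labels = [0, 0, 0, 0, 1, 1, 0, 1, 1, 0]

default_cutoff = sensitivity_and_precision(probs, labels, threshold=0.5)
lower_cutoff = sensitivity_and_precision(probs, labels, threshold=0.3)
print("cutoff 0.5:", default_cutoff)
print("cutoff 0.3:", lower_cutoff)
```

In this toy case the lower cutoff catches every default but flags more sound firms, so the choice of cutoff should reflect the relative cost of the two error types.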

Both XGBoost and ANN are advanced machine learning algorithms developed to address complex problems by learning from data. Both are effective in handling nonlinear and highly complex data, allowing them to discover relationships between input and output variables without requiring fixed assumptions about the underlying model. On detailed analysis, however, ANN demonstrates its superiority with an F1 score of 0.756757, surpassing XGBoost and far exceeding Logistic Regression, which reached only 0.448276. The superiority of the ANN algorithm is reflected not only in its accurate predictions but also in its ability to balance the identification of true bankruptcy cases against the minimization of errors, which places it ahead of the other methods in bankruptcy prediction. This enhances the practical value of machine learning in risk management and demonstrates its strong potential to improve financial decision-making and the efficiency and accuracy of financial risk management at organizations.

Figure 4.2: Prediction Results of the Models on the Confusion Matrix

Analysis of the Confusion Matrix of the forecasting methods illustrated in Figure 4.2 reveals that each model has its own strengths and weaknesses. Specifically, the LR and XGBoost models performed well in predicting non-bankruptcy cases, with a total of 230 accurate predictions, surpassing the ANN and Decision Tree models, which had 222 and 220 accurate predictions, respectively. However, the notable difference between the models lies in their ability to detect bankruptcy cases: the Logistic model performed the least effectively, identifying only 13 bankrupt companies on the validation dataset, while the ANN model excelled by predicting 29 cases accurately, outperforming both the XGBoost and Decision Tree models, which had less effective results of 25 and 17 companies, respectively. When comparing the performance of the models, the Decision Tree and ANN models generally performed better in identifying both types of cases, while Logistic Regression performed better in predicting non-bankruptcy cases. The XGBoost model, although not as accurate as the two aforementioned models, had the advantage of predicting both bankruptcy and non-bankruptcy cases at a relatively high level while also being highly interpretable, making it suitable for applications that require transparency and an understanding of how the model works. These results further support the conclusion that ANN-based methods detect at-risk customers relatively well, making a significant contribution to improving risk management in the credit sector.

Analysis Results of Variable Importance

Figure 4.3: Estimated results of the models in PD estimation

Figure 4.3 presents the regression coefficients of the variables in the LR model. The coefficient of variable X13 (Total revenue/Total assets) stands out with a very high value compared to the other variables, indicating that this variable has a particularly strong influence on the model's bankruptcy predictions. This also suggests that financial risk managers need to pay special attention to this variable when analyzing and assessing the creditworthiness of customers. Variable X2 (Income before tax/Net revenue) also shows a significant regression coefficient, indicating that it plays a role in shaping the results, though a smaller one than X13.

The Decision Tree and XGBoost models provide a different perspective on the importance of the variables, with variable X3 (Income before tax/Total assets) rated as the most important by both models. This consistency between the two models reinforces the assumption that X3 has a significant impact on the prediction results and should be seriously considered when building financial risk prediction models. In the ANN model, however, variable X12 (Receivables/Average Revenue) stands out as the most important, indicating that ANN evaluates the data differently and is capable of detecting complex relationships that other models may not capture.

Comparing the models, each provides a distinct perspective on the influence of variables on the ability to predict bankruptcy. The Decision Tree and XGBoost focus on X3 (Income before tax/Total assets), while ANN shows that X12 (Receivables/Average Revenue) is the most important variable. This offers a multi-faceted approach to assessing and managing creditworthiness and requires financial institutions to diversify their analytical methods to gain a more complete and accurate picture. It also requires prediction models to be flexible and to reflect accurately both the diversity of the data and the interaction between input variables in practical risk management.

Analyzing and identifying the importance of variables in a prediction model plays an essential role not only in focusing on core information when building the model but also in supporting the effective application of the model in practice. Explaining the impact of each variable on the prediction results thereby becomes more transparent and easier to understand, contributing to the optimization of business decisions and strategies based on a solid data foundation. This enhances the reliability and accuracy of the model and expands its application potential in developing risk management solutions, investment strategies, and evidence-based business decisions, thereby bringing practical benefits to organizations and businesses.
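Because regression coefficients and tree-based importances are not directly comparable across model families, one model-agnostic alternative is permutation importance: shuffle one variable at a time and record how much accuracy drops. The sketch below is an assumption-laden illustration, not the method used in this thesis (which does not state how the ANN importances were obtained); the data are synthetic and `toy_predict` is a hypothetical stand-in for a fitted model's prediction function.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each column of X is shuffled in turn."""
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link with the outcome.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops.append(baseline - np.mean(predict(X_perm) == y))
        importances[j] = np.mean(drops)
    return importances


# Synthetic data: only the first of three variables drives the outcome.
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)


def toy_predict(features):
    """Hypothetical stand-in for model.predict; it uses the first variable only."""
    return (features[:, 0] > 0).astype(int)


scores = permutation_importance(toy_predict, X, y)
print(scores)
```

The same function can be applied to any fitted classifier by passing its prediction function, which allows the rankings of different models to be compared on one scale.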

CONCLUSION AND RECOMMENDATION

Applying the Model to Forecast Default Probability of SME Customers in Vietnamese

This research successfully developed a model to predict the creditworthiness (default probability) of small and medium-sized enterprises (SMEs) seeking loans from Vietnamese commercial banks. The model can be a valuable tool for banks to stabilize credit quality and minimize Non-Performing Loans.

Customers with a strong credit rating (rated A or higher) who additionally demonstrate a good repayment capacity according to the model’s evaluation will likely have a low probability of defaulting on their loans. This translates to a low credit risk associated with this customer group.

The model goes beyond prediction; it serves as a comprehensive support system for commercial banks throughout the credit granting process. It helps banks ensure credit quality, expand lending efficiently and securely, and promote sustainable growth. Ultimately, the model can assist banks in selecting and maintaining a high-quality customer base, tailoring marketing strategies towards low-risk customers, and fostering a network of reputable clients with a strong track record of debt repayment.

5.1.2 Credit Policy Guidance Based on Model Outcomes

This study proposes a credit scoring model that enables commercial banks to implement risk-based lending practices for small and medium-sized enterprises (SMEs). The model categorizes SMEs based on their likelihood of default, allowing banks to tailor credit policies and loan terms to each risk group.

For SMEs with a low probability of default, the model recommends extending credit with favorable terms. These terms may include preferential interest rates, relaxed collateral requirements (e.g., waiving the need for property as security), and fewer restrictions on the financial ratios that the business must maintain. SMEs with an average probability of default may receive loans under standard bank terms, with potential interest rate reductions for exceeding minimum collateral requirements.

The model prescribes a more cautious approach for SMEs with a high probability of default. These businesses may be ineligible for new loans and face gradual reductions in existing credit lines. Additionally, banks may impose higher interest rates and stricter financial ratio requirements to mitigate the risk of non-payment.
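The three-tier policy described above can be expressed as a simple mapping from an estimated PD to a risk group. This is a minimal sketch: the cutoffs of 5% and 20% are hypothetical placeholders, since the thesis does not specify where the boundaries between low, average, and high risk lie, and the policy summaries merely paraphrase the text.

```python
def risk_group(pd_estimate, low_cutoff=0.05, high_cutoff=0.20):
    """Map an estimated probability of default to a risk group.

    The cutoffs are illustrative assumptions; a bank would calibrate them.
    """
    if not 0.0 <= pd_estimate <= 1.0:
        raise ValueError("pd_estimate must lie between 0 and 1")
    if pd_estimate < low_cutoff:
        return "low"
    if pd_estimate < high_cutoff:
        return "average"
    return "high"


# Policy summaries paraphrased from the section above.
CREDIT_POLICY = {
    "low": "favorable terms: preferential rates, relaxed collateral, fewer covenants",
    "average": "standard terms, with rate reductions for extra collateral",
    "high": "no new loans, reduced credit lines, higher rates, stricter covenants",
}

for pd_value in (0.02, 0.12, 0.45):
    group = risk_group(pd_value)
    print(f"PD={pd_value:.2f} -> {group}: {CREDIT_POLICY[group]}")
```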

Beyond individual risk assessment, the model informs broader credit policy development. The research identifies four key financial ratios significantly impacting a company’s default risk. These ratios include profitability metrics, the inventory turnover ratio, and the sales to asset ratio. Businesses exhibiting higher profitability, lower debt levels, and efficient inventory management are generally considered less risky borrowers.

In conclusion, this credit scoring model equips banks with data-driven insights to minimize loan defaults. Banks can prioritize lending to financially strong businesses while carefully scrutinizing companies that have high debt burdens, manage inventory poorly, or belong to industries with elevated bankruptcy rates. Furthermore, the model’s insights provide valuable information on borrower performance and industry trends, informing the development of future credit policies.

5.1.3 Model as a Source of Information for Credit Policy

Accurate prediction of a customer’s ability to repay loans is critical for banks. This capability serves two key functions. First, it assists in identifying potential customers who are likely to be good borrowers. Second, it allows banks to continuously monitor and reassess the creditworthiness of existing customers after loans have been granted. By leveraging the results of a credit scoring model, commercial banks can proactively detect potential solvency issues (indicated by a high probability of default) in their customer base. This early identification allows for timely intervention and the implementation of measures to address these problems. Consequently, the risk of capital loss for the bank is significantly reduced.
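The monitoring function can be sketched as a periodic comparison of each borrower's previous and current PD estimates. The example below is hypothetical throughout: the borrower identifiers, the watch level of 20%, and the 10-percentage-point increase trigger are assumptions introduced for illustration, not rules proposed by the thesis.

```python
def flag_for_review(previous_pd, current_pd, watch_level=0.20, max_increase=0.10):
    """Return borrowers whose PD is high or has risen sharply since the last review."""
    flagged = []
    for borrower, pd_now in current_pd.items():
        pd_before = previous_pd.get(borrower)  # None for newly granted loans
        rose_sharply = pd_before is not None and pd_now - pd_before >= max_increase
        if pd_now >= watch_level or rose_sharply:
            flagged.append(borrower)
    return sorted(flagged)


# Hypothetical PD estimates from two consecutive review dates.
previous = {"A": 0.03, "B": 0.08, "C": 0.15}
current = {"A": 0.04, "B": 0.19, "C": 0.22, "D": 0.05}

watch_list = flag_for_review(previous, current)
print(watch_list)
```

Borrower B is flagged because its PD rose sharply even though it remains below the watch level, which is the kind of early signal the text describes.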

In the Vietnamese context, Decision No. 493/2005/QD-NHNN, issued on April 22nd, 2005, governs debt classification, the establishment of provisions, and the use of provisions to manage credit risks in the banking operations of credit institutions. The majority of Vietnamese commercial banks currently rely on the results of customer debt classification to determine appropriate provisioning ratios. The ability to accurately forecast a customer’s repayment capability (default probability) is therefore highly valuable. This enhanced predictive power translates to a more efficient provisioning process, ultimately leading to a more effective credit risk reserve fund.

Potential Application of the Model in Vietnamese Credit Rating Agencies (CRAs)

This chapter explores the potential application of the developed credit scoring model within the Vietnamese credit rating industry. Credit Rating Agencies (CRAs) are private companies that assess the creditworthiness of borrowers, typically businesses or governments, and assign them credit ratings. These ratings communicate the borrower’s likelihood of defaulting on their debts and are crucial for investors navigating the bond market.

In Vietnam, the credit rating industry is relatively young, with the first licenses being issued in 2017. Decree 88/2014/ND-CP, issued in 2014, regulates credit rating services, and the Credit Rating Service Development Plan outlines a vision for industry growth by 2030. This plan anticipates a maximum of five licensed CRAs by 2030, emphasizing the importance of fostering a competitive and robust credit rating sector.

The proposed credit scoring model offers valuable insights for Vietnamese CRAs. The model leverages five key financial ratios to predict a borrower’s default probability: income before tax to total assets; total liabilities to total assets; income before tax, interest, and amortization to long-term debt; cost of goods sold to average inventory; and total sales to total assets. These ratios provide a comprehensive picture of a borrower’s financial health and ability to service debt.

CRAs in Vietnam can utilize this model in several ways. First, by incorporating these financial ratios into their credit rating methodologies, CRAs can produce more accurate and nuanced credit assessments. This enhanced accuracy can lead to more reliable credit ratings, ultimately benefiting both borrowers and investors. Second, the model can be used to classify borrowers into risk groups based on their predicted default probability. This allows CRAs to streamline their credit rating processes and tailor their reports to specific borrower types. Finally, the model’s insights can be used to provide credit risk management recommendations to borrowers, particularly small and medium-sized enterprises (SMEs). By highlighting areas for financial improvement, CRAs can help SMEs strengthen their creditworthiness and access better loan terms.

The research presented in this thesis contributes to the ongoing development of credit risk assessment methodologies in Vietnam. The high accuracy (93%) achieved by the ANN model suggests its potential as a valuable tool for Vietnamese CRAs. It is important to acknowledge that the success of CRAs hinges on their ability to access and utilize accurate borrower data. The data employed in this study can serve as a starting point for CRAs to build robust and reliable credit risk assessment models.

Limitation of the topic and future research direction

The research acknowledges several limitations. The study analyzes only 400 enterprises across nine industries over three years, which may be a relatively small sample. Additionally, the model uses only 13 input variables, potentially limiting its ability to assess credit risk comprehensively. While audited financial statements were used to ensure data quality, the author recognizes that auditing standards in Vietnam might not be as rigorous as those in developed countries. The way financial statements are prepared in Vietnam for various purposes can also introduce inconsistencies in data quality. Finally, the model focuses solely on financial data, neglecting the potential influence of non-financial factors such as management quality, industry trends, and regulatory changes. These non-financial factors can provide valuable insights for creditworthiness assessment, suggesting that the current model might not fully capture the comprehensive approach used by Vietnamese commercial banks. Addressing these limitations through further research and data collection could significantly improve the model’s generalizability, accuracy, and applicability in the Vietnamese context.

This thesis identifies several promising areas for future research to refine and strengthen the model. First, expanding the data set to a larger sample of businesses, ideally exceeding 1,000, would provide a more robust foundation for the model’s findings. Collecting financial statements quarterly rather than annually could also enhance the model’s accuracy by capturing more granular financial trends. Second, if the data size permits, disaggregating the data by industry sector could be beneficial. This would allow the development of sector-specific models that capture more precisely the risk profiles associated with different business activities, which would significantly enhance the practical applicability of the model across diverse industries. Pursuing these avenues could significantly improve the model’s generalizability, accuracy, and industry-specific relevance.
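The sector-level disaggregation proposed above could be prototyped by fitting one model per industry and skipping sectors with too little data. The sketch below is an assumption-based illustration on synthetic data: the column name `sector`, the sector labels, the minimum of 50 rows, and the use of Logistic Regression are all hypothetical choices, not part of the thesis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression


def fit_sector_models(df, feature_cols, target_col="default",
                      sector_col="sector", min_rows=50):
    """Fit one classifier per sector, skipping sectors that are too small
    or that contain only one outcome class."""
    models = {}
    for sector, group in df.groupby(sector_col):
        if len(group) < min_rows or group[target_col].nunique() < 2:
            continue
        model = LogisticRegression(max_iter=1000)
        model.fit(group[feature_cols], group[target_col])
        models[sector] = model
    return models


# Synthetic data standing in for a real multi-sector SME dataset.
rng = np.random.default_rng(0)
n = 300
demo = pd.DataFrame({
    "sector": rng.choice(["construction", "retail", "mining"],
                         size=n, p=[0.5, 0.45, 0.05]),
    "X_1": rng.normal(size=n),
    "X_2": rng.normal(size=n),
})
demo["default"] = (demo["X_1"] + rng.normal(scale=0.5, size=n) < -0.5).astype(int)

sector_models = fit_sector_models(demo, ["X_1", "X_2"])
print(sorted(sector_models))
```

The minimum-size check reflects the caveat in the text: sector-specific models are only sensible where each sector has enough observations, including enough defaults.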

1. Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589-609.

2. Beaver, W. H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, 71-111.

3. Breiman, L., Friedman, J., Stone, C. J., & Olshen, R. A. (1984). Classification and Regression Trees. Wadsworth International Group. [Note: Breiman, 1980, was not found]

4. Dombolena, I., & Khoury, S. (1980). Ratio Stability and Corporate Failure. The Journal of Finance, 35(5), 1017-1026.

5. Fernández-Delgado, M., Cernadas, E., Barro, S., & Amorim, D. (2014). Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? Journal of Machine Learning Research, 15(1), 3133-3181.

6. Fisher, R. A. (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7(2), 179-188.

7. Fitzpatrick, P. J. (1932). A comparison of the ratios of successful industrial enterprises with those of failed companies. Accountant, 89, 589-602.

8. Hay, S. (2003). Estimating the probability of bankruptcy in enterprise valuation. Journal of Development & Integration, 8(18).

9. Hayden, E. (2010). Estimation of a Rating Model for Corporate Exposures. The Basel II Risk Parameters, 13-24.

10. Hayden, E., & Porath, D. (2010). Statistical Methods to Develop Rating Models. The Basel II Risk Parameters, 1-12.

11. Ho, T. K. (1995). Random Decision Forests. Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, 278-282.

12. Le, D. C., & Le, T. A. (2012). Combining CVaR model and Merton-KMV model to measure default risk: Empirical evidence in Vietnam. Journal of Development

13. Le, N. S. V. (2013). Investment decisions and bankruptcy risks of companies listed on the Vietnamese stock market [Master’s thesis, University of Economics Ho Chi Minh City].

14. Myers, J. H., & Forgy, E. W. (1963). The development of numerical credit evaluation system. Journal of the American Statistical Association, 58(303), 799-806.

15. Neophytou, E., Charitou, A., & Charalambous, C. (2000). Predicting Corporate Failure: Empirical Evidence for the UK. The Journal of Accounting Literature.

16. Nguyen, T. T. L. (2019). Factors affecting bankruptcy risk of listed companies in the construction industry in Vietnam. Journal of Banking Science & Training, No. 205.

17. Odom, M. D., & Sharda, R. (1993). A Neural Network for Bankruptcy Prediction. International Joint Conference on Neural Networks, 163-168.

18. Ong, M. (2005). Internal Credit Risk Models: Capital Allocation and Performance Measurement. Risk Books.

19. Hosmer, D. W., Jovanovic, B., & Lemeshow, S. (1989). Best subsets logistic regression. Biometrics, 45(3), 1265-1270. [Note: reference was relocated based on publication date]

20. Patcha, A., & Park, J. M. (2009). An overview of outlier detection techniques: Existing solutions and latest technological trends. Computer Networks,

21. Platt, H. D. (1991). Predicting corporate financial distress: Reflections on choice-based sample bias. Journal of Economics and Finance, 26(1-2), 189-209.

22. Rajan, R. G., & Zingales, L. (1995). What do we know about capital structure? Some evidence from international data. The Journal of Finance, 50(5), 1421-

23. Serrasqueiro, Z., & Nunes, P. M. (2010). Non-linear relationships between growth opportunities and debt: Evidence from quoted Portuguese companies. Journal of Business Research, 63(8), 870-878.

24. Stehman, S. V. (1997). Selecting and interpreting measures of thematic classification accuracy. Remote Sensing of Environment, 62(1), 77-89.

25. Sudhakar, M., & Reddy, C. V. K. (2016). Two step credit risk assessment model for retail bank loan applications using Decision Tree data mining technique. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 5(3), 705-718.

26. Svetnik, V., Liaw, A., Tong, C., Culberson, J. C., Sheridan, R. P., & Feuston, B. P. (2003). Random forest: a classification and regression tool for compound classification and QSAR modeling. Journal of Chemical Information and Computer Sciences, 43(6), 1947-1958.

27. Tham dinh Tin dung Blog. (2018). Overview of Credit Rating Agency (CRA) in the world [in Vietnamese]. Retrieved from https://thamdinhtindung.com/so-luoc-ve-cac-to-chuc-xep-hang-tin-nhiem-credit-rating-agency-cra-tren-the-gioi/

28. Debt.org. (n.d.). Bankruptcy Statistics (2024, April 24). Retrieved from https://www.debt.org/bankruptcy/statistics/

29. United States Courts. (2023, July 31). Bankruptcy Filings Rise 10 Percent. Retrieved from https://www.uscourts.gov/news/2023/07/31/bankruptcy-filings-rise-10-percent

30. United States Courts. (2023, October 26). Bankruptcy Filings Rise 13 Percent. Retrieved from https://www.uscourts.gov/news/2023/10/26/bankruptcy-filings-rise-13-percent

31. World Bank. (2023, August 10). Vietnam’s Economic Growth Slows Due to Global Headwinds and Internal Constraints. Retrieved from https://www.worldbank.org/en/news/press-release/2023/08/10/vietnam-s-economic-growth-slows-due-to-global-headwinds-and-internal-constraints#:~:text=The%20report%20shows%20that%20Vietnam%27s,2024%20and

32. World Bank Group. (2023, March). Taking Stock: Vietnam Economic Update. Retrieved from https://www.worldbank.org/en/country/vietnam/publication/taking-stock-vietnam-economic-update-march-2023

33. Asian Development Bank (ADB). (n.d.). Viet Nam’s Economy to Slow Down Due to Prolonged COVID-19, but ADB is Bullish on Economic Growth in the Medium-Longer Term. Retrieved from https://www.adb.org/news/viet-nam-economy-slow-down-covid-19-adb-bullish-economic-growth-medium-longer-term

APPENDIX

Table 1: Python code used for estimating PD

import os
import glob
from pprint import pprint
from datetime import datetime

import numpy as np

# Notebook (Colab) shell command; run it in a notebook cell, not in a plain .py file.
!pip install pandas==1.5.3

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, GradientBoostingRegressor
from xgboost import XGBClassifier, XGBRegressor
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    accuracy_score,
    classification_report,
    recall_score,
    precision_score,
    roc_auc_score,
)
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold, train_test_split
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

np.random.seed(0)

df = pd.read_csv("/content/dataset1.csv")
X = df[[f"X_{i}" for i in range(1, 15)]]
y = df["default"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)


# Define ANN model
def create_ann_model(input_dim):
    model = Sequential([
        Dense(64, activation='relu', input_dim=input_dim),
        Dense(1, activation='sigmoid'),  # For binary classification
    ])
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model


models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    # The XGBoost entry is missing from the extracted listing; this line is an assumed reconstruction.
    'XGBoost': XGBClassifier(random_state=42),
    'Artificial Neural Network': create_ann_model(X_train.shape[1]),
}

# Create a DataFrame to store the results
results_df = pd.DataFrame(columns=['Model', 'Dataset', 'Accuracy', 'Precision', 'Sensitivity', 'Specificity', 'F1 Score', 'AUC'])


def predict_labels_and_probs(model, X):
    # Keras models have no predict_proba: predict() already returns P(default).
    if isinstance(model, Sequential):
        probs = model.predict(X).ravel()
        return (probs >= 0.5).astype(int), probs
    return model.predict(X), model.predict_proba(X)[:, 1]


def evaluate(y_true, y_pred, y_pred_prob):
    # Specificity is derived from the first row of the confusion matrix (actual non-default).
    cm = confusion_matrix(y_true, y_pred)
    TN, FP = cm[0, 0], cm[0, 1]
    return {
        'Accuracy': accuracy_score(y_true, y_pred),
        'Precision': precision_score(y_true, y_pred),
        'Sensitivity': recall_score(y_true, y_pred),
        'Specificity': TN / (TN + FP),
        'F1 Score': f1_score(y_true, y_pred),
        'AUC': roc_auc_score(y_true, y_pred_prob),
    }


# Train and evaluate each model
for model_name, model in models.items():
    print(datetime.now(), model_name)

    # Train the model
    model.fit(X_train, y_train)

    # Evaluate performance on the in-sample and out-sample data.
    # The 'In-Sample' label is assumed; only 'Out-Sample' survives in the extracted listing.
    for dataset, X_part, y_part in [('In-Sample', X_train, y_train), ('Out-Sample', X_test, y_test)]:
        y_pred, y_pred_prob = predict_labels_and_probs(model, X_part)

        # Append results to the DataFrame.
        # DataFrame.append is deprecated and removed in pandas 2.0; it works under the pandas 1.5.3 pin above.
        results_df = results_df.append(
            {'Model': model_name, 'Dataset': dataset, **evaluate(y_part, y_pred, y_pred_prob)},
            ignore_index=True,
        )

# Display the results DataFrame
print(results_df)
results_df.query('Dataset == "Out-Sample"')

Results of the models after running the Python code

Requirement already satisfied: pandas==1.5.3 in /usr/local/lib/python3.10/dist-packages (1.5.3)
Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3) (2023.4)
Requirement already satisfied: numpy>=1.21.0 in /usr/local/lib/python3.10/dist-packages (from pandas==1.5.3) (1.25.2)
Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas==1.5.3) (1.16.0)

/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:458: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

:108: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  results_df = results_df.append({
:108: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  results_df = results_df.append({
:108: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
  results_df = results_df.append({
:108: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
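The two warnings in the output above point to straightforward remedies: scale the inputs (and allow more iterations) so that the Logistic Regression solver converges, and collect result rows in a list instead of calling the deprecated `DataFrame.append`. The sketch below shows one way to do both. It is not the code used in the thesis; the data are synthetic, the column names are placeholders, and `max_iter=1000` is an assumed setting.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, unscaled features standing in for the financial ratios.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(loc=100.0, scale=50.0, size=(400, 3)),
                 columns=["X_1", "X_2", "X_3"])
y = (X["X_1"] + rng.normal(scale=20.0, size=400) > 120).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scaling inside a pipeline addresses the ConvergenceWarning and keeps
# test data out of the scaler's fit.
model = make_pipeline(StandardScaler(),
                      LogisticRegression(max_iter=1000, random_state=42))
model.fit(X_train, y_train)

# Collect one dict per result row, then build the DataFrame once;
# this replaces the deprecated DataFrame.append pattern.
rows = []
for dataset, (X_part, y_part) in {"In-Sample": (X_train, y_train),
                                  "Out-Sample": (X_test, y_test)}.items():
    y_pred = model.predict(X_part)
    rows.append({
        "Model": "Logistic Regression",
        "Dataset": dataset,
        "Accuracy": accuracy_score(y_part, y_pred),
        "F1 Score": f1_score(y_part, y_pred),
    })
results_df = pd.DataFrame(rows)
print(results_df)
```

Building the DataFrame once from a list of rows also works unchanged on pandas 2.x, where `DataFrame.append` no longer exists.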

Date posted: 02/10/2024, 15:33
