Default Predictors in Retail Banking – An Empirical Study in Vietnam

VIETNAM – NETHERLANDS PROGRAMME FOR M.A IN DEVELOPMENT ECONOMICS

DEFAULT PREDICTORS IN RETAIL BANKING –

AN EMPIRICAL STUDY IN VIETNAM

By

NGUYEN BAO QUOC

MASTER OF ARTS IN DEVELOPMENT ECONOMICS

HO CHI MINH CITY, SEPTEMBER 2013

UNIVERSITY OF ECONOMICS INSTITUTE OF SOCIAL STUDIES

A thesis submitted in partial fulfillment of the requirements for the degree of

MASTER OF ARTS IN DEVELOPMENT ECONOMICS

DECLARATION

I declare that "Default Predictors in Retail Banking – An Empirical Study in Vietnam" is my own work; it has not been submitted for any degree at any other university.

I confirm that I have made every effort and applied all my knowledge to complete this thesis in the best way.

Ho Chi Minh City, September 2013

NGUYEN BAO QUOC

ACKNOWLEDGEMENTS

First and foremost, I would like to offer my gratitude to my supervisor, Dr. Le Cong Tru, for his invaluable comments, remarks and engagement throughout the learning process of this thesis. I also have Mr. Le Duc Anh to thank for introducing me to the topic. I am also much obliged to Associate Prof. Dr. Nguyen Trong Hoai, Dr. Pham Khanh Nam and Dr. Luca Tasciotti for their helpful remarks on my TRD and for keeping me on the right track. For the availability of the dataset, I am thankful to MDE Tran Thu Trang from the Head Office of BIDV. Last but not least, I am deeply indebted to my parents, my dearly beloved wife, and my brothers and sisters for all their understanding and spiritual support. I will be wholeheartedly grateful forever for your love.

ABSTRACT

Due to intense competition, over-lending and economic turmoil, the banking system in Vietnam is suffering from a huge amount of non-performing loans. Given the considerable growth of the retail banking market, an exploration of risk predictors has become more crucial than ever. This paper investigates key factors that influence loan repayment performance among individual customers. The survey covers a representative sample of personal loans from one of the largest Vietnamese commercial banks. A logistic regression technique is employed to evaluate the relationship between delinquency and both borrower characteristics and loan features. The regression results reveal that borrower characteristics, e.g. borrowing history, bank-account holding and education level, have stronger effects on the default outcome than loan factors such as purpose, duration and credit limit. This suggests that bankers should apply appropriate adjustments to borrower characteristics to minimize default risk.

Key words: retail banking, credit scoring, default, risk, logistic regression, probability

TABLE OF CONTENTS

DECLARATION

ACKNOWLEDGEMENTS

ABSTRACT

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

LIST OF ABBREVIATIONS

Chapter 1 INTRODUCTION

1.1 Background

1.2 Problem statement

1.3 Research objectives

1.4 Research questions

1.5 Justification of the study

1.6 Scope of the study

1.7 Organization of the study

Chapter 2 LITERATURE REVIEW

2.1 History of credit scoring

2.2 Concepts of credit scoring

2.3 Reviews of economic theories

2.4 Reviews of empirical studies

2.4.1 Default predictors in markets for credit cards and instant loans

2.4.2 Default predictors in markets for automobiles, mortgages and real property construction

2.4.3 Default predictors in markets for individual loans

2.5 Chapter summary

2.5.1 Empirical literature summary

2.5.2 Problems and limitations of previous studies

2.5.3 Conceptual framework

Chapter 3 DATA AND RESEARCH METHODOLOGY

3.1 Data collection

3.2 Variables measurements

3.2.1 Response variable

3.2.2 Explanatory variables

3.2.2.1 Borrower characteristics

3.2.2.2 Loan characteristics

3.3 Research methodology

3.3.1 Descriptive analysis

3.3.2 Econometric model

3.3.2.1 Methodologies for CSM

3.3.2.2 Logistic regression

3.4 Validation

3.4.1 Overall evaluations

3.4.2 Statistical tests of individual predictors

3.4.3 Goodness-of-fit statistics

3.4.3.1 Pseudo R-squared statistics

3.4.3.1.1 Cox and Snell's R2

3.4.3.1.2 Nagelkerke's R2

3.4.3.2 Hosmer and Lemeshow test

3.4.4 Validations of predicted probabilities

3.4.4.1 Classification table

3.4.4.2 Area under the ROC curve

3.5 Analytical framework

3.6 Chapter summary

Chapter 4 DATA ANALYSIS AND RESULTS

4.1 Descriptive statistics

4.1.1 Personal tastes for loans by ages

4.1.2 Discretionary incomes and default

4.1.3 Nexus between loan amount and loan outcomes

4.1.4 Loan duration and loan outcomes

4.1.5 Collateral value and loan outcome

4.1.6 Differences in variables between defaulted and non-defaulted loans

4.1.7 Correlation matrix among independent variables

4.2 Information value

4.3 Empirical results

4.3.1 Model estimation

4.3.2 Assumption verification

4.3.3 Model validation

4.3.3.1 Overall evaluations and statistical tests of individual predictors

4.3.3.2 Goodness-of-fit statistics

4.3.3.3 Validations of predicted probabilities

4.3.3.3.1 Classification table

4.3.3.3.2 Receiver operating characteristic and area under the ROC curve

4.3.4 Result interpretation

4.3.4.1 Borrower characteristics

4.3.4.2 Loan characteristics

4.4 Chapter summary

Chapter 5 CONCLUSION AND POLICY IMPLICATIONS

5.1 Conclusion

5.2 Policy implications

5.3 Limitations and further studies

REFERENCES

APPENDIX

LIST OF TABLES

Table 2.1 Credit scoring vs Credit rating

Table 2.2 Summary of variables

Table 3.1 Overview of variables

Table 3.2 Predictive accuracy of CSMs

Table 4.1 Variables initially considered for the CSM

Table 4.2 Loan type statistics

Table 4.3 Loan duration

Table 4.4 Differences in variables between the loan outcomes

Table 4.5 Correlation coefficients among continuous independent variables

Table 4.6 Information values for explanatory variables

Table 4.7 Regression results

Table 4.8 Classification table

Table 4.9 Performance of the models

LIST OF FIGURES

Figure 2.1 Process of credit scoring

Figure 2.2 Conceptual framework

Figure 3.1 ROC Curve and AUC

Figure 3.2 Steps in binary logistic regression

Figure 4.1 Gender and loan sample

Figure 4.2 Average loan size vs age and purposes

Figure 4.3 Default frequencies among different groups of discretionary incomes

Figure 4.4 Default frequencies among different groups of loan amounts

Figure 4.5 Default frequencies among different groups of loan duration

Figure 4.6 Default frequencies among different ratios of collateral-to-loan

Figure 4.7 ROC curves and AUC

LIST OF ABBREVIATIONS

BIDV Joint Stock Commercial Bank for Investment and Development of Vietnam

BIS Bank for International Settlements

CAPM Capital Asset Pricing Model

CSM Credit Scoring Model

ECOA Equal Credit Opportunity Act

NPL Non-performing Loan

SBV State Bank of Vietnam

VND Vietnam dong

Chapter 1 INTRODUCTION

This chapter introduces the thesis topic and identifies the main issues which will be covered in the following sections. The background and motivation of the study come first. Then the research objectives, research questions and scope are introduced. Next comes the main contribution of the study, and the thesis structure is briefly outlined at the end of the chapter.

1.1 Background

In spite of the wide variety of banking businesses, providing loans to corporate customers and individuals generates the majority of proceeds for commercial banks as well as other credit institutions. As information asymmetries prevail, lenders trade with a risk of borrowers falling into default (Stiglitz & Weiss, 1981). However, asymmetric information is not the only threat, since social factors along with the effects of business cycles may also affect delinquency (Allen, DeLong, & Saunders, 2004). To support lending activities, the measurement of credit risk has been taken seriously and has therefore made dramatic progress over the past two decades (Altman & Saunders, 1997). These two scholars point out several forces that have given impetus to credit-risk measurement: (1) a worldwide increase in bankruptcies, (2) a trend towards disintermediation by the largest and highest-quality borrowers, (3) more competitive margins on loans, (4) a decreasing value of property (and of collateral as a result), and (5) a sharp rise in off-balance sheet instruments. Since the Bank for International Settlements (BIS) launched the revised framework Basel II, banks have been encouraged to improve their approaches to credit-risk measurement (Claessens, Krahnen, & Lang, 2005), and vendors have started to offer improved models to banks for calculating the regulatory capital requirements.

Together with the rapid increase in bank loans to corporates and institutions, the need for individual credit today is at its highest (Brown, Taylor, & Wheatley Price, 2005), and the "lending boom appears to be particularly strong in the segment of loans to households," as argued by Backé and Wójcik (2008). Unlike wholesale banking, which trades with large and typically rated borrowers, retail banking deals with small loan sizes and a huge number of personal clients who, in most cases, have no credit ratings at all. Since each loan is relatively small in amount, it is implied that the risk posed by default on any single personal loan is quite minimal. Traditionally, a loan approval is based on the credit officer's judgment or experience from previous decisions. However, it is costly and time-consuming for each loan profile to be examined separately. In fact, no loss on any separate retail loan can put a bank close to insolvency. Hence, the unit cost of appraising the default risk of a retail loan may be larger than the reward in terms of loss prevention, and it might not be worthwhile determining the risk on the basis of an individual loan. As a result, to measure the level of credit risk as a whole for such an individual loan segment, banks use loan default prediction models or credit scoring models (CSMs), whose goal is to forecast bad outcomes and make sure that good loans are not falsely rejected and bad loans are not wrongly accepted.

According to Yang, Nie, and Zhang (2009), the credit rating of banks has passed three important milestones: (1) expert systems, (2) credit scoring, and (3) probability of default models. So far, many favorable remarks have been made about those approaches. Brill (1998) argues that building and refining a CSM can bring certain benefits such as cost savings in credit assessment, faster credit analysis and improvements in cash flow and collections. Chen and Huang (2003) note that, with considerable loan portfolios, even a slight improvement in credit scoring accuracy can lower the lenders' risk and translate into significant later savings. Fishelson-Holstine (2004) argues that CSMs are devised to accommodate the need to increase loan volume, mitigate credit risks and treat customers impartially. That is why such tools are beneficial to both institutional creditors and borrowers. In the same way, Allen et al. (2004) reckon that banks applying CSMs tend to be more efficient at lower costs. Dinh and Kleimeier (2007) maintain that if a good model is employed with reliable data, scoring can greatly diminish the risk. Notably, the Board of Governors of the Federal Reserve System (2007) reports to the Congress that "credit scoring reduces the cost of lending or facilitates more effective risk-based pricing of loans, increased use of credit scoring may expand the range of applicants to whom lenders are able to make loans profitably" (pp. 42-43).

1.2 Problem statement

The banking sector in Vietnam has grown significantly in the last decade. A report by McKinsey Global Institute (2012) reveals that total bank credit to GDP in nominal local currency increased sharply from approximately 22% as of 2000 to more than 120% ten years later, equivalent to 33% annual growth, which is the highest among neighboring countries: India, China, Indonesia, Malaysia, Thailand and the Philippines. These figures suggest that, after just one decade, the Vietnamese economy has come to depend heavily on bank funding. The rapid expansion in bank lending also brings non-performing loans (NPLs). While the reported level of bad loans appears to be under control, the true volume is likely to be much higher than what is publicized. In reality, NPLs climbed to 10% in May 2012, 4% higher than in 2011 and equal to 10% of the year's GDP.1 Therefore, the situation calls for a stricter standard of bad debt recognition in order to manage credit risk, especially in the context of the global financial crisis, the Vietnamese economic downturn and intense competition over the past few years.

As Vietnam's banking market matures, banks have to deal with competition not only from other domestic credit institutions but also from well-performing foreign banks. Although retail banking has been growing rapidly over recent years, BIDV did not initially take this trend seriously. A new chapter opened after its initial public offering (IPO),2 when the bank came to perceive that wholesale banking activities alone would not be enough for growth and prosperity, and thus would not guarantee its leading role in Vietnam's banking sector. Consequently, recent changes and developments in the field of banking have led the bank to a renewed interest in the retail segment. Since it is not economical to devote extensive resources to analyzing the risk of default on personal credit on a case-by-case basis, a CSM is of great importance. As BIDV is one of the largest banks in Vietnam with a nationwide network, and as its standard of bad debt classification has to comply with state regulations, the implications of the empirical study at this bank can be generalized to the whole Vietnamese banking system.

1.3 Research objectives

This study aims at distinguishing a good risk from a bad risk. Therefore, its key purpose is to examine the factors that cause default in personal loans and how the proposed models can help to manage credit risk. In pursuit of this, the research focuses on the relationship between loan outcomes and both borrower characteristics (e.g. housing, employment status and income level) and loan characteristics (e.g. interest rate, collateral type and value).3 More specifically, this paper measures the impact of loan size, duration and purpose on default, and evaluates how one's borrowing history and collateral take effect.

[...] be subjected to changes in the macro economy. Such variables are income, interest rate and loan duration.

1.4 Research questions

To address the overall aim, the research questions are as follows.

Main question: What are the determining factors that impact debtors' performance in personal loans?

Sub-questions:

(1) Do loan size and duration positively impact default?

(2) Does a bad borrowing history positively impact default?

(3) Is there any difference in default between loans for business purposes and non-business purposes?

(4) Finally, is it true that collateral can prevent default?

1.5 Justification of the study

This study contributes to the existing literature in the following ways. First of all, since there is little literature on credit default predictors in the Vietnamese retail banking market, the thesis enriches the literature by providing empirical results from a sensitive dataset. Second, the data employed are up to date and reflect Vietnam's macro-economic changes after the 2007 US mortgage crisis, which sparked tough economic times worldwide. Third, a model without the typically significant loan characteristic, discretionary income, is constructed to provide the bank with a possible check for misinformation or potential fraud. In particular, the research takes into account the credit profile of clients to study how those with different borrowing histories tend to behave with their current loans. Finally, the constructed models are subject to a wide range of validation tests to ensure their reliability and generalizability.

1.6 Scope of the study

The lending process is a relatively straightforward series of actions involving two principal parties. These actions run from the initial loan application to the successful repayment of the loan or its default. In lending activities, Summers and Wilson (2000, p. 38) summarize six functional responsibilities connected with granting credit to clients: (1) assessing clients' credit risk, (2) making credit granting decisions with regard to terms and limits, (3) collecting due receivables (debts) and acting against defaulters, (4) monitoring clients' behavior and compiling management information, (5) bearing the risk of default, and (6) financing the investment in receivables.

This research investigates the fourth of the six lending responsibilities. Hence, it concentrates on collecting statistical data on customer behavior and tries to draw managerial conclusions from analyzing these data, which were collected in the 2008-2011 period. The scope of the study is defined to include consumer credit (covering automobiles, recreational vehicles, education, credit cards, etc.), home improvement loans and mortgages for purchasing a house. Individual loans with business purposes are also included in the research.4

1.7 Organization of the study

The remainder of the thesis is organized as follows. The second chapter begins by laying out the history of default prediction and then looks at the theoretical and empirical dimensions of the research. Chapter 3 describes the data, the research methodology and the criteria for assessing the quality of the econometric models. Chapter 4 analyses the data, reports the findings and evaluates the results. The last chapter concludes the paper with policy implications and discusses the limitations and directions for further studies.

4 The CSM's role is important in the Basel II implementation for retail portfolios. Small loans for business purposes with amounts of less than $1,000,000 in the U.S. and managed on a pooled basis are categorized as retail credits.

Chapter 2 LITERATURE REVIEW

The purpose of this chapter is to review the theoretical and empirical literature on default predictors in retail banking. The chapter is divided into six sub-sections. The first two parts discuss the history and concepts of credit scoring. The next two parts cover the economic theories and empirical studies on the subject. The following part summarizes the empirical literature; in particular, it recalls the significance of the many variables that are used to predict credit performance. The final part provides a summary of the determinants of credit default in the empirical literature.

2.1 History of credit scoring

Most academic writers cite a report by Durand (1941), released through the National Bureau of Economic Research (NBER), as the earliest statistical approach to the issue of credit application selection. In the 1940s, so-called credit scorecard systems were implemented for some mail order firms and financial credit suppliers (Lawrence & Solomon, 2002). However, it was not until 1958 that the first commercial rating systems were created by Bill Fair and Earl Isaac for American Investment, a finance corporation located in St. Louis, Missouri (Fishelson-Holstine, 2004). Their first projects proved to be successful, as those scoring systems helped to reduce default cases by up to 20-30 percent while sustaining similar volumes; they could also be used to increase lending volume by 20-30 percent at the same delinquency level.

The use of CSMs became more widespread in the early 1960s, when the credit card business matured and speed in decision-making became necessary (Anderson, 2007). After that, credit scoring was applied to other customer classes. At this point, Myers and Forgy (1963) compared discriminant analysis and regression in rating applications. Later, a bankruptcy prediction model was described by Beaver (1966). These works focused on two aspects: failure prediction and credit quality classification. According to Allen et al. (2004), the most used traditional CSM was the 1968 multiple discriminant analysis for corporations by Altman. More than a decade later (1980), the same author introduced the basic lending process for banks as an integrated system and analyzed how to set the criteria for commercial loan assessment.

The 1960s, according to Ritter (2012), also witnessed dramatically increasing attention to the matter of equality, specifically in the context of the American Civil Rights Movement. In those days, single women, divorcees and widows could not get a mortgage loan without providing a male relative as a co-signer. In addition, minority applicants, particularly African-American borrowers, and elderly customers had limited options for obtaining credit. Under these circumstances, the Equal Credit Opportunity Act (ECOA) was enacted. The 1974 ECOA makes it illegal for creditors to discriminate against applicants on the grounds of "race, color, religion, national origin, sex, marital status, or age (provided the applicant has the capacity to contract)" (ECOA, 15 U.S.C. 1691 et seq., Part 202, Section 202.1). Since then, consumer credit decisions have relied considerably more on credit histories provided by credit reporting agencies, which have used verifiable data to build up their computerized files.

Thanks to the surge of automated statistical credit rating and mortgage scoring as an approach to approving and underwriting loans, housing finance evolved in the 1990s. Automated underwriting, which was previously employed in the credit card business and automobile loans, has become the most important mortgage underwriting approach, especially since 1995 (Straka, 2000). Besides the increasingly wide use of CSMs in mortgage origination, Mester (1997) reported that as many as 97% of banks employed CSMs to evaluate credit card applications, while 70% of them used such tools for their small business financing. Since Altman's traditional CSM, other methodologies for measuring credit risk have emerged. According to Allen et al. (2004), there are four approaches to multivariate CSMs: (1) the linear probability model, (2) the logistic model, (3) the probit model, and (4) the multiple discriminant analysis model. All four models can be successful in picking out variables that have the predictive power to differentiate defaulters from non-defaulters.

2.2 Concepts of credit scoring

A credit decision is a prospective judgement, i.e. the important question is not how borrowers behaved in the past, but how they will act in the future. According to Fishelson-Holstine (2004), past behavior and current position are useful signals of one's behavior pattern and thus indicators of likely future behavior. A credit decision is based on the premise that the way people behave in the future is the same, or almost the same, as in the past. This premise also implies that customers' willingness to repay their debts does not change over time. Figure 2.1 presents the scoring process framework used to assess clients' ability to repay.

Figure 2.1 Process of credit scoring

[Diagram: data on past debtors are used for predicting the unknown outcomes Y1, Y2, ..., Yn of current customers from their characteristics (Xi1, Xi2, Xi3, ...).]

Source: Liu Yang (2001)

In traditional credit granting, according to Anderson (2007), potential borrowers are assessed with the Five Cs: character (the applicant's reputation), capacity (to repay the loan), capital (as a backup), collateral (to secure the loan) and conditions (external factors such as interest rate and loan amount). These assessments rest on the creditor's own experience, taking into account historical information and a view of the applicants' prospects. Anderson explains that this approach is suitable for communities where lenders and borrowers know each other personally. However, it is not efficient in modern times of extended branch networks and customer mobility. A credit decision based on such a judgmental system, even at its best, is still inaccurate. Fishelson-Holstine (2004) describes the creditor's rule-based systems as a series of hurdles. Each applicant has to fulfill all the criteria in order to be approved, whereas each factor is regarded in isolation. Therefore, strengths are unlikely to offset weaknesses.

By contrast, a CSM performs a rigorous analysis of the available data and is based on a thorough knowledge of the relationship between historical behavior and future performance. The result of scoring is a single score which represents a balanced snapshot of a particular applicant's risk. A person may have weaknesses in one area but strengths in another. The links between all of the factors are examined, and every factor is given a weight in relation to the others. Anderson comments that the use of credit scoring has shifted the credit business from relationship lending to transactional lending, and from secured lending to unsecured lending. The replacement of experience by data has reduced the role of human judgement. Though the Five Cs are still applied, credit scoring can capture much of them by extracting the maximum value from available information. This does not mean that human judgement and collateral disappear entirely; they remain, but with a less prominent role.
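The contrast between a rule-based hurdle system and a scoring model can be sketched in a few lines of Python. This is only an illustrative sketch: the factor names, thresholds, weights and cut-off below are hypothetical and are not taken from the thesis or from any real scorecard.

```python
# Illustrative sketch only: factor names, thresholds, weights and the cut-off
# are hypothetical, chosen to show how a score lets strengths offset weaknesses.

# Hurdle rules: every criterion is checked in isolation and all must pass.
HURDLES = {"income": 10.0, "years_employed": 2.0, "collateral_ratio": 1.0}

# Scorecard: each factor contributes points in proportion to a weight.
WEIGHTS = {"income": 2.0, "years_employed": 5.0, "collateral_ratio": 20.0}
CUT_OFF = 60.0


def passes_hurdles(applicant: dict) -> bool:
    """Approve only if every factor clears its own minimum."""
    return all(applicant[name] >= minimum for name, minimum in HURDLES.items())


def score(applicant: dict) -> float:
    """Combine all factors into a single weighted score."""
    return sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def passes_score(applicant: dict) -> bool:
    """Approve if the combined score reaches the cut-off."""
    return score(applicant) >= CUT_OFF


# Short employment history (a weakness) but high income and collateral (strengths).
applicant = {"income": 18.0, "years_employed": 1.0, "collateral_ratio": 1.5}

print(passes_hurdles(applicant))                  # one hurdle fails, so rejected
print(score(applicant), passes_score(applicant))  # strengths offset the weakness
```

With these assumed numbers the applicant fails the employment hurdle and is rejected by the rule-based system, while the weighted score still clears the cut-off because the other factors compensate.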

Under the scoring approach, borrowers are ranked according to the probability that they will be late in their payments or default. Creditors typically set a cut-off score for a certain acceptable level of risk. For instance, a creditor might establish a cut-off score for his or her portfolio so that the minimum odds of repayment are 25 to 1. Applicants whose scores are below the cut-off are rejected, while those rated above it are accepted. The acceptable level of risk can vary between lenders and portfolios and may change from time to time. The cut-off value can also serve as a good basis for pricing loans in accordance with delinquency risk.
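To make the odds-based cut-off concrete, the short Python sketch below converts odds of repayment into a probability of default and applies it as an acceptance rule. The function names and the example applicant probabilities are hypothetical; only the 25-to-1 figure comes from the text.

```python
def default_probability_from_odds(good: float, bad: float = 1.0) -> float:
    """Convert odds of repayment (good : bad) into a probability of default."""
    return bad / (good + bad)


def accept(applicant_pd: float, cutoff_pd: float) -> bool:
    """Accept an applicant whose estimated default probability is at or below the cut-off."""
    return applicant_pd <= cutoff_pd


# Minimum odds of repayment of 25 to 1, as in the example above.
cutoff_pd = default_probability_from_odds(25)
print(round(cutoff_pd, 4))      # 1 / 26, i.e. about 3.85%

# Hypothetical estimated default probabilities for two applicants.
print(accept(0.02, cutoff_pd))  # lower risk than the cut-off
print(accept(0.08, cutoff_pd))  # higher risk than the cut-off
```

Odds of 25 to 1 therefore correspond to tolerating a default probability of at most 1/26, or roughly 3.85 percent, at the margin.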

Related to the CSM, the credit rating method is also used for risk management. However, this approach considers a wider range of factors to classify corporate and institutional credit risks into grades. The following table clarifies the blurred boundary between these two approaches.

Table 2.1 Credit scoring vs Credit rating

Item | Credit scoring | Credit rating: internal rating system | Credit rating: public rating agency

Producers | credit granting institution, vendors of scoring model | [...] | [...]

Objects of evaluation | consumer credit and small business credit within institutions | business loans and institutional loans within institutions | worldwide companies, financial instruments and sovereigns

Users | institution itself | institution itself, supervisor | lenders, investors, regulators

Main [...] | [...] | [...] | number of grades (within common frame across major rating agencies)

Source: Liu Yang (2001)

As for CSM factors, Anderson (2007) reveals that although hundreds of variables may have some predictive value in a typical scoring development, probably only eight to twelve will be retained in the final model. If every variable has only three possible values on average, the possible combinations can run into the tens of thousands. It would hardly be possible for a judgemental decision procedure to weigh and assess that much complexity. Furthermore, a portion of the available data seems to be intercorrelated or to have no reliable connection with future payment behavior. By selecting the proper factors and analyzing the correlations, CSM users benefit by minimizing the influence of poor quality data.
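The size of this combinatorial problem is easy to check. The sketch below, a plain arithmetic illustration with a hypothetical helper name, counts the distinct applicant profiles when each of k variables can take three values; the count grows quickly across the eight-to-twelve range mentioned above.

```python
def profile_count(num_variables: int, values_per_variable: int = 3) -> int:
    """Number of distinct applicant profiles for the given number of variables."""
    return values_per_variable ** num_variables


for k in range(8, 13):
    print(k, profile_count(k))
# 8 6561
# 9 19683
# 10 59049
# 11 177147
# 12 531441
```

Under the three-values assumption, nine or ten variables already give tens of thousands of combinations, and twelve give more than half a million.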

The main feature that distinguishes most CSMs is that they depend on exhaustive statistical analysis of real credit experience to ascertain which factors are to be taken into consideration, as well as the weight that each one should be assigned in making the final credit decision. By employing a consistent dataset, a large number of staff in an organization can reach the same credit decision. Nonetheless, judgemental approaches and CSMs may not always produce the same decisions for the same applicants. In such cases, it is suggested that further examination is needed instead of approving or rejecting the applicants directly (Dinh & Kleimeier, 2007). Last but not least, a CSM is not a panacea; it should be treated as a refinement tool in the credit granting process in order to form the best combination of human best practices and statistical applications (Van Gool, Verbeke, Sercu, & Baesens, 2012).

2.3 Reviews of economic theories

The purpose of borrowing money can be classified into two categories: (1) investment (business purpose), and (2) consumption (non-business purpose). Regarding the latter, Keynes (1936) sets out the consumption-income relation in his well-known book The General Theory of Employment, Interest and Money. Keynesian theory is concerned not only with an individual consumer's consumption but also with the total consumption of all individuals. A rise in one's income raises his or her expenditure level: as income increases, consumption also increases, but not by as much as the increase in income. From this point, the idea of the marginal propensity to consume (MPC) is derived. Keynes' consumption function is also called the absolute-income hypothesis, which postulates that people's spending habits do not change, or are stable over time, when consumer income changes. Keynesian theory identifies the following factors as having an impact on consumption: (1) the individual's real income, (2) past savings, and (3) the interest rate. However, one of the non-income determinants that influences the MPC is consumer credit. With credit available, consumers spend more of their income; conversely, if credit is costly or not easily affordable, less consumer income is spent. It is noted that the MPC is much higher for those who are nearer to their credit limit (Gross & Souleles, 2002).
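The statement that consumption rises with income, but by less than the rise in income, can be illustrated with the linear consumption function commonly used in textbooks, C = a + bY with 0 < b < 1, where b is the MPC. The linear form and the parameter values below are assumptions made for illustration; they are not estimates from the thesis.

```python
# Assumed textbook form C = a + b * Y; the values of a and b are hypothetical.
AUTONOMOUS_CONSUMPTION = 50.0  # a: consumption when income is zero
MPC = 0.8                      # b: marginal propensity to consume, 0 < b < 1


def consumption(income: float) -> float:
    """Linear Keynesian consumption function."""
    return AUTONOMOUS_CONSUMPTION + MPC * income


income_before, income_after = 100.0, 200.0
delta_income = income_after - income_before
delta_consumption = consumption(income_after) - consumption(income_before)

print(delta_income, delta_consumption)   # consumption rises by less than income
print(delta_consumption / delta_income)  # ratio of the two changes is the MPC
```

In this sketch an income increase of 100 raises consumption by about 80, so the ratio of the two changes recovers the assumed MPC of 0.8.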

Concerning consumption habits, Duesenberry (1949) reveals that people usually observe the behavior and consumption patterns of their peers to decide on their own consumption and savings. If a consumer feels an urge for conspicuous consumption as a whole, a higher standard of consumption may be displayed. Naming this phenomenon the demonstration effect, he argues that it promotes unhappiness with present consumption levels, which affects savings rates and thus macroeconomic growth. In line with Duesenberry, Nurkse (1957) adds that society's exposure to new goods or ways of living brings about unhappiness with previously accepted consumption practices. Consumers' preference functions are not independent but interdependent, as "people come into contact with superior goods or superior patterns of consumption, with new articles or new ways of meeting old wants." Hence, he concludes that these people "are apt to feel after a while a certain restlessness and dissatisfaction. Their knowledge is extended, their imagination stimulated; new desires are aroused, the propensity to consume is shifted upward" (p. 59). As a result, credit debt will be involved in consumption if there is an income shortage and consumption is to happen. Jappelli, Pischke, and Souleles (1998) show that households with credit cards are better equipped to consume than those without bankcards. The reality is that Americans have grown comfortable with consumer credit as a way to smooth income over the last few decades (Durkin, 2000).

However, the credit market is characterized by asymmetric information: borrowers know their ability and willingness to repay better than lenders do. If this situation is highly prevalent, creditors have less incentive to lend. Thus, credit will be rationed in markets with information asymmetry (Rothschild & Stiglitz, 1976; Stiglitz & Weiss, 1981). This is one of the features that differentiate credit markets from other markets.5 Credit rationing is said to happen when people would like to borrow more at a quoted interest rate but creditors prevent them from doing so.6 In their studies, those authors show that banks may reject some debtors because of the two components of asymmetric information. The first component, adverse selection, happens when good borrowers decline a loan because they find the interest rate too high, while risky borrowers readily accept the loan at any cost. The high interest rate thus raises the average risk of borrowers, possibly decreasing the lenders' profits. The second component, moral hazard, occurs after the loan has been given. Borrowers might change their behavior, for example by investing in riskier projects, because they are not dealing with their own funds. If such risky investment does not pay off, lenders cannot recover the funds. As rational lenders cannot control all the behaviors of the borrowers directly, they will formulate the contract terms in a manner that induces the borrowers to take actions in the lenders' interest, as well as to attract low-risk borrowers.

5 In most other markets, such a situation will simply lead to a price increase, which makes demand equal supply.

Although the interest rate acts as a screening device for distinguishing good risks from bad risks, it is not the only contract term that is important. According to Stiglitz and Weiss (1981), the collateral amount does affect the borrower in terms of both behavior and distribution. The theoretical literature gives three main reasons why collateral requirements are commonly quoted in loan contracts. First of all, collateral is required because there is no certainty that a borrower's future income will be as expected. If an enforceable claim against one's future wealth and income could be issued, then there would be no motivation for his or her bank to demand some kind of asset as a guarantee (Plaut, 1985). Therefore, the bank will resort to collateral to reduce loan loss in case of default. This reason is intuitive and independent of asymmetric information between the creditor and the debtor. Secondly, collateral may reduce the adverse selection problem, which arises because the borrower holds better information than the bank before its lending decision. Such private information can result in credit rationing since the bank is unable to set the loan price according to its customer's quality (Stiglitz & Weiss, 1981). Acting as a signaling device, collateral can convey the borrower's valuable information to the lender, who will then screen debtors by offering a choice between a low-price secured loan and a high-price unsecured one. A high-quality borrower tends to choose collateral to signal his or her quality and thus secures a lower interest rate for the loan (Bester, 1985). Finally, collateral can help to reduce moral hazard problems once the money has been disbursed, by weakening the borrower's incentive to invest the funds in riskier projects or to make less effort to bring the financed project to success (Boot, Thakor, & Udell, 1991). By using collateral, the bank is able to align its debtor's interest with its own, since the debtor will endure a greater loss in the event of default.

2.4 Reviews of empirical studies

2.4.1 Default predictors in markets for credit cards and instant loans

Many of the previous empirical studies on retail credit, especially the consumer credit market, focus on traditional loans, which differ from credit card and instant loans in the following key respects. With traditional loans, loan amounts are predetermined and payment schedules are fixed, while in the credit card and instant loan market, the actual loan amounts are at the clientele's discretion after a fixed credit line is made available. Traditional loans are often repaid in installments, whereas a credit card holder flexibly pays back his debt through a required monthly repayment set as a certain percentage of the outstanding balance. Finally, contrary to many traditional loans, credit card and instant loan customers may be granted a smaller amount of funds without any strict collateral requirements.

Jacobson and Roszbach (2003) study credit scoring and recommend a probit approach to evaluate portfolio credit risk. In their study, the data are collected not only from approved loans but also from rejected applicants. In this way, they argue that their models do not suffer from the so-called sample-selection bias which commonly exists in the literature. The dataset consists of 13,338 loan applicants at a main lending institution in Sweden from September 1994 to August 1995. At first, 57 variables are observed but only 16 are retained. Most of the variables are dropped because they have little explanatory power or are better explained by other variables. The empirical evidence reveals that age, marital status, change in yearly income and collateral-free credit amount significantly influence default. Surprisingly, loan size shows no significant impact on default, and higher income turns out to be associated with higher default risk.

Agarwal, Chomsisengphet, and Liu (2011) employ another technique to study the role of individual social capital in individual default and bankruptcy outcomes. They use a monthly panel dataset covering at least 170,000 cardholders in the U.S. from January 1997 to June 2000 and conduct the study with a Cox proportional hazard model. Observing all borrowers' default and bankruptcy filing status, they find that the factors affecting default are financial distress factors (e.g. riskiness, income, debt, spending and wealth), economic conditions and the legal environment, as well as socio-demographic characteristics (e.g. age, marital status and homeownership). After controlling for financial distress factors, economic conditions and the legal environment, the results reveal that cardholders are more likely to default if they migrate from their place of birth. Regarding age, the risk rises and then falls: the groups with the smallest bankruptcy risk are the youngest (30 years old or younger) and the oldest (60 or older). Another finding is that borrowers who are married and own a house are respectively 17% and 24% less likely to fall into arrears, and 25% and 32% less inclined to file for bankruptcy.

Unlike the above studies, Autio, Wilska, Kaartinen, and Lähteenmaa (2009) do not apply any regression to study the possibility of default; instead, they analyze the demographics of borrowers. They carry out a comprehensive investigation of the use of instant loans in Finland in the context of the financial crisis. The sample consists of 1,610 respondents among 1,951 young adults who filled out questionnaires covering demographic profiles such as gender, age, income, household structure, occupation and employment status. The interviewees are also asked to reveal which kinds of credit they hold: small loans, student loans, credit cards or mortgages. Based on the collected data, the researchers examine people's attitudes towards borrowing. The results show that young people aged 18 to 23 use small instant loans more than those aged 24 to 29. On the other hand, the latter group makes more use of consumer credit as they have higher occupational status and income. Gender does not appear to affect the number of loans taken, but income, occupational status and household structure do.

2.4.2 Default predictors in markets for automobiles, mortgages and real property construction

As automobiles7 are the most common consumption goods that Americans purchase on credit, Agarwal, Ambrose, and Chomsisengphet (2008) investigate the performance of automobile loans in terms of default risk and prepayment risk.8 The maturities are four and five years, and loan performance is observed from January 1998 to March 2003 at a large financial institution in the Northeastern U.S. They estimate a competing-risk model with a dataset of 20,466 direct personal loans9 for purchases of new and used automobiles. Their main findings are as follows. Default is more likely on a loan for a used car, whereas prepayment is more likely on a loan for a new car. Surprisingly, a customer with a lower credit score has a lower probability of default but a higher likelihood of prepayment. As expected, a rise in the loan-to-value ratio raises the probability of default and decreases the likelihood of prepayment. Income has a positive impact on prepayment, while unemployment has a negative effect on loan performance. Paradoxically, a decrease in the basic rate increases the probability of both default and prepayment. Most interestingly, borrowers financing the most luxurious cars tend to repay their loans in advance, while those financing the most economical cars tend to perform well on their loans.

Concerning mortgages, Peter and Peter (2011) use income and twelve other factors with 3,431 Western Australian households to estimate the likelihood of loan default. They aim to establish the relationship between home owners' default risk and their characteristics with a logistic regression. The statistics imply that on average 93% of the households pay on time. The regression results reveal that income is a highly predictive variable, i.e. a lower income level is one of the main factors that push mortgagors into default. Another notably critical factor is the loan-to-value ratio: the likelihood of default increases with a higher loan-to-value ratio. One more finding is that the probability of default is higher if the head of the family is less educated, younger and divorced in comparison with other households. The age of the head of household is also a good indicator, signaling that mortgage repayment performance tends to be worse for younger households. Other indicators, for instance employment status, migrant status and macroeconomic factor10 show no significance. The developed model proves to be a useful tool that helps private lenders, policy makers and government assess default risk and develop appropriate strategies to minimize it.

As there have been many studies on predicting default in developed countries, Kočenda and Vojtek (2011) carry out research in a European emerging market. They gain access to a dataset from a Czech bank which specializes in granting small and medium-sized personal loans for real estate reconstruction and purchase. The data contain socio-demographic characteristics as well as other information on 3,043 individuals who obtained loans between 1999 and 2006. Their models are estimated with logistic regression and CART11 analysis after the information values of variable categories are calculated. The result is that six out of 21 variables are able to discriminate between good and bad clients. These are also the variables with high information values. It is noticeable that both approaches produce similar efficiency and detect the same financial and socio-demographic factors as the key predictors of loan performance. Clients with a higher amount of own resources show a distinctly lower default probability. Larger loans are recognized as riskier debts. Loans for property renovation tend to be exposed to more risk than those for real estate purchases. As in other studies, highly educated customers find it much easier to repay their debts. Clients with a longer relationship with the bank are less risky than those with a shorter one. Also, married clients appear safer to lend to than those with no spouse. The two authors also manage to build a well-performing model that excludes the most significant factor, the client's own resources. In this way, they reconfirm the predictive power of the socio-demographic variables.


2.4.3 Default predictors in markets for individual loans

Some authors do not narrow down their experiments to a specific loan purpose like the aforementioned literature but incorporate all purposes into the same model. For instance, Özdemir and Boran (2004) employ a binary logistic regression to evaluate the relationship between consumer loans' default risk and borrowers' demographic and financial variables. They use a dataset of 500 customer records, mostly involving individual support loans, home loans and car loans provided by a private bank in Istanbul from 1999 to 2001. Interestingly, they find no notable relationship between default risk and demographic variables such as sex, age and occupation sector. The only demographic variable that affects credit performance is residential status. Meanwhile, financial variables such as interest rate and maturity are able to predict the outcomes. The longer the duration or the higher the interest rate, the greater the risk that customers will not pay back their debts on time. Özdemir and Boran conclude that financial variables have more significant effects on customers' performance than demographic variables do. The finding suggests that bankers should put appropriate focus on financial variables to minimize default risk.

In Vietnam, Vuong, Dao, Nguyen, Tran, and Le (2006) are the first to explore the link between loan performance and borrower characteristics. Sixteen customer features are taken into consideration, most of them socio-demographic variables. The dataset is obtained from Techcombank,12 and covers 1,727 individual customer records including 1,374 good loans and 353 non-performing loans. With logistic regression, their model produces a high predictive accuracy of 99%. The authors then offer a classification system of CSM based on the estimated performance probability. For example, good customers classified into Group A could be put into three sub-groups such as A1, A2 or A3. In the same way, bad customers in Group B could be categorized into B1, B2 or B3. Of the 16 socio-demographic variables, education level, work tenure, and reputation show no predictive power, while media, housing and income-expenditure difference prove to be good predictors of loan performance.

Similar to Vuong et al. (2006), Dinh and Kleimeier (2007) gain access to a much broader database from one of the state-owned commercial banks in Vietnam. The population consists of 56,037 retail loans approved in the 1992-2005 period, covering a wide variety of loan purposes from business loans, mortgages and home loans, and mobile asset financing to general credit (living expenses or consumption without collateral) and credit card facilities. They initially rely on the loan officer's expert knowledge to name 22 pertinent variables. After that, the forward stepwise method helps them choose 16 of those 22 for the model. Again using logistic regression, they develop a flexible method that is built on the standards of transactional lending yet leaves room for relationship lending. They find that region, residential status, marital status, collateral type and education are not effective predictors. The most important variables are time with bank and bank account, followed by gender, number of loans and loan duration. They suggest that their CSM be regularly updated to respond to economic changes.

12 Techcombank stands for Vietnam Technological and Commercial Joint-stock Bank, which was established on September 27th, 1993; its Head Office is in Hanoi.

Another interesting approach is taken by Musto and Souleles (2006), who examine consumer credit from a portfolio view to analyze the cross-sectional distribution of credit and the effect of credit scores. Unlike most other empirical studies on default, these two authors apply asset-pricing theory (CAPM beta) to consumer credit to measure the covariance between individuals' default risk and aggregate default rates. Their argument is that it is not sufficient for lenders to care about default risk alone; the covariance risk13 or default beta should also be taken into account. In their study, Musto and Souleles access a rich and unique source of credit data from Experian, one of the leading U.S. credit bureaus. The panel dataset contains a random sample of 100,000 customers observed monthly between March 1997 and March 2003. There are different categories of granted loans, such as credit cards, automobile loans, mortgages, etc. The major result reveals that there is significant heterogeneity among customers in the default beta. High covariance-risk consumers tend to get low credit scores, i.e. high default likelihood. There is a positive correlation between the sum of money received by borrowers and their credit scores, and a negative correlation between the funds lent to them and the covariance risk. Those exposed to high covariance risk include the young, the single, home renters, lower-income consumers, and residents in regions with lower health-insurance coverage but a higher divorce rate.

2.5 Chapter summary

2.5.1 Empirical literature summary

The following table summarizes and compares the common variables used in previous empirical literature to predict credit default. The table may not reproduce exactly the variables employed by the authors, but it is meant to review the most significant ones. The variables are divided into two groups: (1) Borrower characteristics and (2) Loan characteristics. It is noteworthy that the division and naming of such characteristics vary between authors. For instance, Özdemir and Boran (2004) distinguish Demographic characteristics from Financial characteristics, Kočenda and Vojtek (2011) separate Socio-demographic variables from Bank-client relationship variables, and Van Gool et al. (2012) use Borrower characteristics and Loan characteristics. On the contrary, some authors do not group the variables at all (Jacobson & Roszbach, 2003; Vuong et al., 2006; Dinh & Kleimeier, 2007). In this study, we follow Van Gool et al. (2012) in classifying the variables into borrower and loan characteristics.

13 See Lusardi's (2006) comment on how to calculate the covariance risk.

Table 2.2 Summary of variables

Studies: Jacobson & Roszbach (2003); Özdemir & Boran (2004); Vuong et al. (2006); Dinh & Kleimeier (2007)

Borrower characteristics

Time with bank/Length of relationship: ** ***
Years of employment/Time in present job: N.S N.S

Logit regression; Logit regression

Sample size: 13,338 loan applicants in Sweden; 500 consumer loans in Istanbul; 1,727 retail loans from Techcombank, Vietnam; 56,037 retail loans in Vietnam


Table 2.2 Summary of variables (continued)

Studies: Agarwal, Ambrose & Chomsisengphet (2008); Autio et al. (2009); Kočenda & Vojtek (2011); Peter & Peter (2011); Agarwal, Chomsisengphet

Borrower characteristics

Old loans/Number of other loans
Own resources/Savings account: ***
Time at present address
Time with bank/Length of relationship: *** *
Years of employment/Time in present job: *

Logit regression, CART; Logit regression; Cox proportional hazard model

Sample size: 20,466 direct auto loans in Northeastern U.S.; 1,610 young adults in Finland; 3,043 real estate loans in Czech; 1,303 homebuyers in Western Australia; 170,793 cardholders in U.S.

Notes: √ means variable to be chosen;

N.S means not significant;

*, ** and *** mean significance level at 10%, 5% and 1% respectively;


2.5.2 Problems and limitations of previous studies

In previous empirical literature, a wide range of characteristics is examined, which means that tens of variables are used in the estimation. One major criticism is that model complexity will hinder sound decisions and that lengthy questionnaires will surely be unattractive to loan applicants. Instead of using too many variables, eight to twelve predictors are enough for a reliable model (Anderson, 2007). In addition, many writers rely heavily on a customer's age, gender or marital status; although discrimination on the grounds of such characteristics may have been considered appropriate long ago, it may not be acceptable nowadays.

Most of the studies have employed data from the U.S. or other developed countries. Empirical evidence on credit scoring for developing countries in general is very limited. Accordingly, there is not much evidence for Vietnam, except the work of Vuong et al. (2006) and particularly Dinh and Kleimeier (2007). Even so, their analyses neither take account of interest rate nor examine loan amount, both of which have been shown to be major characteristics in credit rationing and the personal credit market (Rothschild & Stiglitz, 1976; Stiglitz & Weiss, 1981).

Last but not least, most of the previous studies employ categorical data in their econometric models. Such a technique typically requires that continuous variables be categorized and that categories be represented by dummy-coded variables. This may be explained by the availability of the data collected. A serious weakness of this approach, however, is that a large number of categories and variables can result in a model with too many degrees of freedom, a factor that can "lead to serious over-fitting" (Kočenda & Vojtek, 2011, p. 8). To overcome this drawback as well as to take advantage of the rich information in continuous variables, we suggest specifications in which continuous variables are not categorized. This technique is also consistent with Hand and Henley (1997, p. 527), who state that "using continuous data models is becoming more common."
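The degrees-of-freedom concern can be illustrated with a small count. The sketch below is a hypothetical illustration, not the specification used in this study: it assumes that a categorized variable with k bands enters a model as k - 1 dummies, while a variable kept continuous enters with a single coefficient. The function names and the example band counts are assumptions.

```python
def dummy_parameter_count(levels_per_variable):
    """Coefficients needed when each variable is dummy-coded.

    A variable with k categories needs k - 1 dummies (one category
    serves as the reference group).
    """
    return sum(k - 1 for k in levels_per_variable)


def continuous_parameter_count(n_variables):
    """Coefficients needed when each variable is kept continuous."""
    return n_variables


# Hypothetical example: four continuous predictors, each cut into five bands.
bands = [5, 5, 5, 5]
print(dummy_parameter_count(bands))            # slope coefficients if categorized
print(continuous_parameter_count(len(bands)))  # slope coefficients if left continuous
```

In this assumed example the categorized version needs four times as many slope coefficients as the continuous one, which is the kind of growth the over-fitting warning refers to.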


Figure 2.2 Conceptual framework

The above diagram visualizes the default predictors in our study with their proposed relationships. We also take account of other variables such as Age, Gender, Marital status, Housing and Collateral type. However, as those variables are not part of our final CSMs, they are not shown in the framework. Each variable's expected and observed effects are discussed in the following chapters.


Chapter 3 DATA AND RESEARCH METHODOLOGY

This chapter aims to discuss and justify the methods of selecting variables, building CSMs and estimation strategies. We begin with data collection and a description of the relevant variables used in the paper. Then, we present methodologies for credit scoring and the most preferable technique. Finally, a visualization of the estimation strategies concludes the chapter.

3.1 Data collection

The data sample is drawn from 32 branches14 of BIDV from the North to the South, with 1,810 retail borrowers who received loan disbursements and made repayments over the four years from January 2008 to December 2011. At the time when the observation ends in June 2012, some of the loans are paid off, but the majority of them are still outstanding. Data management and analysis are performed with IBM SPSS Statistics 20.0.0 (2011) and Stata 12.0 (2011).

In terms of loan purposes, our data consist of two groups: (1) consumer loans and (2) business loans. The first group includes mortgages and loans for home construction or maintenance, automobile loans, and general credit (unsecured loans for education, living expenses or consumption). Loans on credit cards are not taken into account because the bank did not widely offer this sort of loan until recently, and only the best customers can get access to this service. The second group, in line with Dinh and Kleimeier (2007), consists of loans that are used for financing small and private businesses. Because these borrowers lack reliable up-to-date financial records and there is often a vague boundary between the private and business property of an entrepreneur, business loans can only be evaluated on the basis of the entrepreneur's personal information.

It is worth noting that the analyses in this study are based on individual customers whose loan applications were approved under the bank's standard loan procedure. The rejected applicants are not included in the sample as the bank has not yet built up a detailed profile of them. This may result in the so-called sample selection bias15 and involve a "reject inference" process.16 Nevertheless, potential bias due to the reject inference problem will not pose any critical obstacle, since our top priority is picking out the main default drivers. For this reason, other potential customers are assumed to share the same characteristics as those in the sample. This assumption is based on Banasik, Crook, and Thomas' (2003) finding that there is only a minimal difference between the two groups of customers: the rejected and the approved. Furthermore, in analyzing reject inference, Hand and Henley (1993) conclude that reliable reject inference is impossible. Crook and Banasik (2004, p. 873) reaffirm this finding by stating that reject inference depends on the rejection rate and "where the rejection rate is not so large, that scope appears to be very small indeed."

14 The bank has representatives in all 63 provinces and cities of Vietnam. According to BIDV's Dispatch 124/CV-NHBL2 dated 29/03/2012 on providing information for the development of credit scoring for retail customers, 32 branches are selected for the survey. Table A.1 in the Appendix displays the list of chosen branches.

15 Nonetheless, it is not rare in the literature (see Greene (1998) for a sample selection problem).

16 This refers to the process of attempting to infer the true creditworthiness status of the rejected applicants.

3.2 Variables measurements

Our empirical literature review has drawn on studies in both developed and developing countries. This literature suggests that the effects of the independent variables may vary across countries. This study mainly adopts the variable choice of Dinh and Kleimeier (2007), with an updated dataset and some modification of the explanatory variables. For example, there are several variables which have been shown to have significant impacts on default in previous research but were not treated as predictive variables. These are interest rate, loan size and loan-to-value ratio, which will be investigated in our research. We also take account of the borrower's credit profile, which has been almost neglected in the literature.

3.2.1 Response variable

The dependent or response variable is a dichotomous variable which reflects either the default or the non-default outcome of the loan. The default definition follows the standard of the Bank for International Settlements (2006, p. 100) that an obligor is in default if he or she is "past due more than 90 days on any material credit obligation" to the lender. This definition is also applied in Vietnam's current regulations on classification of debts.17 In line with BIS risk definitions, 22% of the sample loans are considered bad while the remaining 78% are good. The bad-loan ratio is relatively high because of the strict definition of default the bank employs and the fact that the survey requires each branch to report both performing loans and non-performing loans to the head office.
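The 90-day rule above maps directly onto a binary response variable. The following is a minimal sketch, assuming each loan record carries a days-past-due figure; the `Loan` class, its field names and the sample records are hypothetical and are not taken from the BIDV dataset.

```python
from dataclasses import dataclass

# "past due more than 90 days" (Bank for International Settlements, 2006)
DEFAULT_THRESHOLD_DAYS = 90


@dataclass
class Loan:
    loan_id: str        # hypothetical identifier
    days_past_due: int  # hypothetical field: days the obligation is overdue


def is_default(loan: Loan) -> int:
    """Return 1 if the loan is more than 90 days past due, otherwise 0."""
    return 1 if loan.days_past_due > DEFAULT_THRESHOLD_DAYS else 0


def bad_loan_ratio(loans) -> float:
    """Share of loans flagged as default under the 90-day rule."""
    flags = [is_default(loan) for loan in loans]
    return sum(flags) / len(flags)


# Hypothetical records: exactly 90 days overdue does not count as default.
sample = [Loan("A", 0), Loan("B", 90), Loan("C", 91), Loan("D", 200)]
print([is_default(loan) for loan in sample])
print(bad_loan_ratio(sample))
```

The strict inequality is the design choice to note: a loan that is exactly 90 days overdue stays in the non-default class under the quoted wording.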

17 See Clause 3, Article 1, Decision No 18/2007/QD-NHNN dated April 25, 2007 by the Governor of the State Bank of Vietnam on amendment of and addition to a number of articles of the Regulations on classification of debts, and establishment and use of reserves to deal with credit risks in banking operations by credit institutions, issued with Decision 493/2005/QD-NHNN of the State Bank dated April 22, 2005


3.2.2 Explanatory variables

The independent or explanatory variables can be divided into two categories: (1) Borrower characteristics and (2) Loan characteristics. Using the expert knowledge of the loan officer (Hand & Henley, 1997),18 we shortlist 18 characteristics for further investigation. As mentioned, there is no consistent way of classifying explanatory variables, and some authors do no grouping at all; the classification in our study is just for the sake of convenience.
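The information value (IV) criterion cited from Hand and Henley (1997) is usually computed from how good and bad loans are distributed across a variable's categories. The sketch below is a minimal illustration, assuming the common weight-of-evidence form of IV; the function name and the example counts are hypothetical, and the study's own computation may differ.

```python
import math


def information_value(good_counts, bad_counts):
    """Information value of one categorical predictor.

    good_counts[i] and bad_counts[i] are the numbers of good and bad
    loans in category i. For each category, the contribution is
    (share of goods - share of bads) * ln(share of goods / share of bads).
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for good, bad in zip(good_counts, bad_counts):
        if good == 0 or bad == 0:
            # The logarithm is undefined for empty cells; merging sparse
            # categories before calling this function is one workaround.
            raise ValueError("each category needs at least one good and one bad loan")
        share_good = good / total_good
        share_bad = bad / total_bad
        iv += (share_good - share_bad) * math.log(share_good / share_bad)
    return iv


# Hypothetical two-category predictor.
print(information_value([80, 20], [20, 80]))  # goods and bads separate well
print(information_value([50, 50], [50, 50]))  # no separation
```

A predictor whose categories split goods and bads identically contributes nothing, while a larger value indicates stronger separation; any cut-off for "predictive enough" would be a modelling choice rather than part of the formula.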

3.2.2.1 Borrower characteristics

Age represents the age of an applicant in years and can be described as a continuous or categorical variable. It is often assumed that older customers are more risk averse; thus, they will be more likely to repay their loans well. This assumption is empirically confirmed by some researchers (Jacobson & Roszbach, 2003; Vuong et al., 2006; Agarwal et al., 2008; Peter & Peter, 2011). However, it does not always hold (Özdemir & Boran, 2004; Dinh & Kleimeier, 2007; Kočenda & Vojtek, 2011). Similarly, we do not consider an applicant's age a true default predictor.

Gender, besides age, is one of the borrower characteristics often used to distinguish the default risk of male and female applicants. Although there is evidence that women are less likely to miss their loan repayments (Dinh & Kleimeier, 2007), there is also evidence that gender is not a significant variable (Jacobson & Roszbach, 2003; Özdemir & Boran, 2004; Kočenda & Vojtek, 2011). In this study, it is expected that gender shows no predictive power and thus should not be included in the CSM.19

Marital status is used to investigate whether different marital statuses can predict default, as marital status can be seen as a sign of the reliability, responsibility or maturity of the borrower. This is a common variable in the literature. Agarwal et al. (2011) reveal that married borrowers are less likely to default on credit card loans and file for bankruptcy by 24% and 32%, respectively. However, Özdemir and Boran (2004) and Dinh and Kleimeier (2007) do not share these findings: their studies show that marital status has no effect on the loan outcome. In our study, marital status is also expected not to affect default.

18 Hand and Henley (1997) recommend three approaches in selecting variables: (1) expert knowledge; (2) stepwise statistical procedure; and (3) information value (IV). In our study, we employ the first and the third approach. The first is applied to identify the initial set of explanatory variables. The third will then be used to select the most predictive variables from the initial set to include in the econometric models.

19 Note that for this as well as other categorical predictors, we use dummy variables to code the parameters


Education measures the level to which the bank's customers are educated. This variable can be ranked in ascending order: non-high school graduate, high school graduate, and tertiary education or above. There is convincing evidence of a link between higher education and a lower probability of default (Kočenda & Vojtek, 2011; Peter & Peter, 2011). In line with those scholars, we anticipate that borrowers with a better education level have a lower likelihood of default because they can have more stable and higher-paid careers.

Housing describes whether the applicants rent, live with their parents or own their home. According to Agarwal et al. (2011), residential status can indicate a borrower's financial wealth, especially when he or she owns a house. They find that a homeowner can be 24% less inclined to default and 32% less likely to file for bankruptcy. Indeed, housing has long been known as a significant variable (Özdemir & Boran, 2004; Vuong et al., 2006). We also believe that default risk will be lower for homeowners than for those who are not.

Number of dependents indicates how many persons, mostly children, a loan applicant has to support, and can be expressed as a continuous or categorical variable. A larger number of dependents puts a greater strain on the applicant's finances. Like Vuong et al. (2006) and Dinh and Kleimeier (2007), we expect the number of dependents to have a positive effect on default.

Occupation is another commonly incorporated variable, as it is highly correlated with the income the applicant has available for repaying the debt. There are many different kinds of job, and our bank focuses on five categories: unemployed or retired, office staff, skilled worker or manager, self-employed or entrepreneur, and other occupations. In Vietnam, as in many other countries, the type of occupation can be a good proxy for income level and stability. Our expectation is that an applicant in a more skilled occupation will be less likely to default.

Work experience measures how many full years an applicant has been in his or her current occupation and can be a continuous or categorical variable. This variable matters because it can reflect the borrower's satisfaction with the present job. With higher job satisfaction, employment tends to be more stable, and thus the applicant's ability to pay back the debt is more secure. Kočenda and Vojtek (2011) find empirically that borrowers with at least 14 years of employment have a 60% higher chance of repaying well. We also expect the applicant's work experience to have a negative effect on default.


Discretionary incomes or Repayment sources 20 represent an applicant's monthly income left over after living expenditures, in million VND. This is one of the most widely used predictors in CSM, and many authors come to the same conclusion that income has significant predictive power and that those with high income and good wealth are less likely to miss their repayments (ECOA, 1974; Vuong et al., 2006; Agarwal et al., 2011; Peter & Peter, 2011). In line with their findings, we expect discretionary incomes to have a negative effect on default.

Current account indicates whether an applicant holds a checking account. According to Dinh and Kleimeier (2007), only about 6% of Vietnamese had bank accounts in 2006, which means that Vietnamese were still not familiar with bank transactions. Bank account ownership is rarely used in CSM studies of developed countries because holding an account is the norm there; however, it should play an important role in predicting default among individual borrowers in developing countries such as Vietnam. Like these authors, we suppose that applicants who already have bank accounts are less likely to miss their repayments than those who do not.

Length of relationship represents the length of the applicant's relationship with the bank in years. It is assumed that, in relationship lending, the longer an applicant stays with the bank, the lower the risk of default becomes. In fact, this variable has been shown to have significant predictive power (Dinh & Kleimeier, 2007; Agarwal et al., 2008; Kočenda & Vojtek, 2011). We expect this finding to hold in this study as well.

CIC, which is something of a rarity in the literature, represents a client's borrowing history as reported by the credit reference agency. In Vietnam, credit institutions are required to periodically file accurate and up-to-date information on their loan portfolios, as well as on each loan's performance, with the SBV's Credit Information Center.21 If a debtor falls into arrears, the default stays on the credit report for five years in the database of the Credit Information Center, the only such agency.22 This of course affects the debtor's future credit standing; for example, getting a new loan or a credit card facility without good collateral becomes difficult or even impossible. In this study, borrowers are categorized into three groups based on their credit reports: no information available, bad debt history, and good performance or no outstanding loan. We expect those with a history of bad debt to be the most likely to default.

20 When referring to the sources of loan repayment, it would be better to take both incomes and expenditures into account. However, a proxy that is a composite of the two is used instead, for fear of strong correlation between them.

21 Based on Decision No. 51/2007/QĐ-NHNN dated December 31, 2007 by the Governor of the SBV on the issuance of the Regulation on credit information activities. This decision will be replaced by Circular No. 03/2013/TT-NHNN dated January 28, 2013 by the Governor on the credit information activities of the SBV when it comes into effect in 2014.

22 Home page: http://www.cic.org.vn/cicportal/index.php


3.2.2.2 Loan characteristics

Loan amount is the size of credit granted to an applicant. An applicant may have applied for a larger amount but been granted a smaller limit following the bank's evaluation. This variable is expressed in million VND and can be categorical or continuous. Several empirical studies use loan amount as a predictor; however, their findings are ambiguous, so no clear expectation can be formed a priori. While some authors show that loan amount does not have a significant influence on default (Jacobson & Roszbach, 2003; Özdemir & Boran, 2004), others insist that larger loans are riskier (Vuong et al., 2006; Kočenda & Vojtek, 2011). Following the latter group, we suppose that borrowers with larger loans are more likely to fall into arrears.

Loan purpose describes the underlying reasons why an applicant needs financing, and it might indicate the applicant's financial situation. The purpose of a loan often has a direct influence on whether the loan is approved as well as on the terms and conditions that the bank applies, such as loan duration, interest rate and types of collateral. As mentioned in Section 3.1, this variable consists of business loans and non-business loans. One of the reasons why loan purpose is so critical to banks is that identifying it provides information for determining default risk. In Vietnam, loan purpose is one of the two required principles in granting credit.23 Dinh and Kleimeier (2007) study the impact of loan purpose on the loan outcome in Vietnam's retail banking market and reveal that business loans appear to default less than non-business loans do. In line with them, we expect a business purpose to have a negative effect on default, i.e. business loans should be less likely to default.

Loan duration represents the maturity of loans in years. This is a typical feature of the loan and it is often the result of negotiation between the bank and its client. However, in line with Dinh and Kleimeier (2007), what we measure in the Vietnamese situation is the duration of the loan as proposed by the applicant. This variable thus reflects the applicant's risk aversion and self-assessment of repayment ability. There is convincing evidence of a link between longer maturity and higher default risk (Özdemir & Boran, 2004; Dinh & Kleimeier, 2007). We expect the same relationship in our CSMs.

Interest rate is the price of a loan, expressed in percent per annum. As it compensates for the risks associated with future unknowns, it is generally differentiated according to the financing purpose, loan maturity and the applicant's creditworthiness. Such a variable is not the norm in the literature because the interest rate is the result of the bank's own decision after evaluating the risk of the customer. However, in the Vietnamese context, the loan price is decided not only by the bank itself but also by the State Bank in the form of the base interest rate. In consequence, there might not be much differentiation between different kinds of customers. Özdemir and Boran (2004) find that a higher interest rate results in more risk for loan repayment. Similar to their work, we expect interest rate to have a positive effect on default risk.

23 Article 6, Decision No. 1627/2001/QD-NHNN dated December 31, 2001 by the Governor of the SBV on issuing regulations on lending by credit institutions to clients.

Collateral type indicates which kind of collateral supports a bank loan. According to Dinh and Kleimeier (2007), this is a distinctive variable that most industrialized countries do not incorporate into their CSMs. However, in Vietnam's retail banking sector, most loans require collateral as a means of reducing loan losses. Statistics from their work reveal that while most defaults arise in collateralized or business loans, the default rate is low for credit cards, which require almost no collateral. The most preferred collateral is real estate, which secures up to 98% of business loans. In our study, we employ collateral type with three categories: no collateral, mobile asset and real estate. Despite the fact that Dinh and Kleimeier (2007) do not find collateral to be a statistically significant variable, we expect unsecured loans to be riskier than secured loans.

Collateral-to-loan ratio measures the ratio of the collateral value to the loan amount. Intuitively, a higher value of this ratio gives borrowers a stronger incentive to repay the debt, as they do not want to lose their collateral. It is more usual to use this variable in the opposite way, i.e. as the loan-to-collateral ratio (Peter & Peter, 2011), but that ratio is undefined for loans without collateral. Besides, some authors use the variable collateral value (Vuong et al., 2006; Dinh & Kleimeier, 2007); however, it may be strongly correlated with the loan amount, as larger loans require a higher value of collateral. The aforementioned empirical literature gives evidence that a higher collateral value or a smaller loan-to-collateral ratio has a negative effect on default. In this study, we expect the collateral-to-loan ratio to do likewise.
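The reason for preferring the collateral-to-loan ratio over its inverse can be illustrated with a minimal sketch. The function names and the amounts (in million VND) are hypothetical and serve only to show that the first ratio stays well defined for unsecured loans while the second does not.

```python
def collateral_to_loan(collateral_value, loan_amount):
    """Collateral value divided by loan amount; equals 0.0 for an unsecured loan."""
    if loan_amount <= 0:
        raise ValueError("loan amount must be positive")
    return collateral_value / loan_amount

def loan_to_collateral(loan_amount, collateral_value):
    """Inverse ratio; returns None when there is no collateral (division by zero)."""
    if collateral_value == 0:
        return None
    return loan_amount / collateral_value

# Hypothetical loans, amounts in million VND.
secured = collateral_to_loan(collateral_value=800.0, loan_amount=500.0)
unsecured = collateral_to_loan(collateral_value=0.0, loan_amount=50.0)
inverse_unsecured = loan_to_collateral(loan_amount=50.0, collateral_value=0.0)
print(secured, unsecured, inverse_unsecured)
```

Unsecured loans therefore enter the model with a ratio of zero instead of being dropped or assigned an arbitrary value.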

3.3 Research methodology

3.3.1 Descriptive analysis

This section depicts Vietnam's personal bank loans from the beginning of 2008 to the end of 2011. The analysis consists of the calculation and comparison of (1) characteristics initially considered for the CSM; (2) loan purposes; and (3) differences in variables between performing and non-performing loans. The descriptive statistics also provide some insight into various factors relevant to the loan outcome, which can be beneficial to the analyses and estimations of the econometric model discussed in the following section.

3.3.2 Econometric model

3.3.2.1 Methodologies for CSM

According to Thomas (2000), the three main approaches to prediction are: (1) judgmental, (2) statistical, and (3) non-statistical, non-judgmental. However, the question is how to choose the best-fitting approach. Hand and Henley (1997) argue that there is no best approach for all cases. The best-fitting approach depends on the structure of the data, their characteristics and the extent to which the measured variables can be classified. The statistical approach chosen in this thesis is based on past data and comprises methodologies such as discriminant analysis, linear regression, and logistic or probit regression (Allen et al., 2004).

Discriminant analysis is an efficient procedure. However, it works under the assumption that the data are normally distributed and equally dispersed between defaults and non-defaults (Ohlson, 1980; Altman & Sabato, 2007). As there are quite a few dummy variables in our study, which also relies on a disproportionate data sample,24 discriminant analysis cannot be adopted because the assumptions of normality and equal dispersion are violated.

Linear regression is a common regression method for most general purposes. Nevertheless, as it uses least squares estimation, it exhibits worse generalization behavior for credit scoring than logistic regression. Moreover, the former is less robust to deviations from the Gaussian distribution than the latter (Van Gestel et al., 2005). Finally, the main defect of linear regression, as Thomas (2000) notes, is that its predictions range over (–∞, +∞), while a probability estimate makes no sense outside the [0, 1] range.
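The range problem can be seen in a small sketch. The data below are invented (ten hypothetical loans with a single predictor and a 0/1 default flag), and the coefficients passed to the logistic function are illustrative rather than estimated; the point is only that a least squares line leaves the [0, 1] interval while the logistic transform of any linear index cannot.

```python
import math

# Hypothetical data: one predictor x and a default flag y (1 = default).
x = list(range(10))
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Ordinary least squares fit of y on x (closed form for one regressor).
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x
linear_pred = [intercept + slope * a for a in x]

def logistic(z):
    """Map any real-valued linear index into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative (not estimated) coefficients for the logistic index.
logistic_pred = [logistic(-4.5 + 1.0 * a) for a in x]

print(min(linear_pred), max(linear_pred))
print(min(logistic_pred), max(logistic_pred))
```

Here the fitted line predicts a negative "probability" for the smallest value of x and a value above one for the largest, whereas every logistic prediction lies strictly between zero and one.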

Logistic and probit regressions make use of maximum likelihood estimation to produce an outcome probability, expressed as a percentage, that can be directly interpreted or used to make operational decisions. Although logistic and probit regressions usually produce similar outcomes, the logistic technique gives "better results" in predicting default (Altman & Sabato, 2007, p. 335). The main difference between the two techniques lies in the latent model behind them: the probit assumes a standard normal distribution, whereas the logistic does not (Anderson, 2007). Since many explanatory variables in this research are categorical variables, which cannot be assumed to follow a normal distribution, logistic regression is the most preferable of all the aforementioned methodologies.

24 As mentioned in Section 3.2, the sample includes 1,810 individual loans, of which 78% are good loans and the remaining 22% are non-performing loans.

3.3.2.2 Logistic regression

Logistic regression, in line with Burns and Burns (2008), overcomes many restrictive assumptions of linear regression. For example, normality, linearity (other than a linear relationship with the log odds) and equal variance are not assumed, and the error terms need not be normally distributed either. The logistic technique requires the following major assumptions: (1) a dichotomous target variable; (2) absence of multicollinearity; (3) no outliers in the data; and (4) sufficient sample size.
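Two of the assumptions listed above, a dichotomous target and the absence of multicollinearity, lend themselves to a quick screen before estimation. The sketch below is a rough illustration only: it uses pairwise Pearson correlations as a simple stand-in for a fuller multicollinearity diagnostic, the 0.8 threshold is an assumed rule of thumb, and the variable names and values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def is_dichotomous(target):
    """True when the target takes exactly the two values 0 and 1."""
    return set(target) == {0, 1}

def correlated_pairs(columns, threshold=0.8):
    """Return predictor pairs whose absolute correlation reaches the threshold."""
    names = sorted(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                flagged.append((a, b))
    return flagged

# Hypothetical toy data (amounts in million VND).
columns = {
    "loan_amount": [100, 200, 300, 400, 500],
    "collateral_value": [150, 290, 460, 610, 740],
    "dependents": [2, 0, 3, 1, 2],
}
default = [0, 0, 1, 0, 1]

flagged = correlated_pairs(columns)
print(is_dichotomous(default), flagged)
```

In this toy example the screen flags loan amount and collateral value as a highly correlated pair, which is the kind of overlap that motivates using a ratio of the two rather than both levels.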

Mathematically, the logistic equation is expressed as the natural logarithm of the probability that Y=1, referred to as p, divided by the probability that Y=0, referred to as 1–p. In symbols, the regression equation is defined as:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k \qquad (3.1)$$

The meaning of the logistic regression coefficients can also be expressed in terms of probabilities rather than changes in odds. The expected probability that Y=1, here the probability that a borrower will default, can be calculated with the next formula, which is a rearrangement of Equation (3.1):

$$p = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k}} \qquad (3.2)$$

where

p = the probability of default: the larger p, the higher pro...

e = the base of natural logarithms (roughly 2.72),

\beta_0 = the constant of the equation,

\beta_1, \dots, \beta_k = the coefficient of t...
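As a rough numerical illustration of how the probability of default follows from the log odds, the sketch below computes p = e^z / (1 + e^z) for a linear index z and then recovers z from p. The function names, the two predictors and all coefficient values are hypothetical; they are not estimates from this study.

```python
import math

def default_probability(intercept, coefficients, x):
    """Probability that Y=1 given a constant, slope coefficients and predictor values."""
    z = intercept + sum(b * v for b, v in zip(coefficients, x))
    # 1 / (1 + e^(-z)) is algebraically identical to e^z / (1 + e^z).
    return 1.0 / (1.0 + math.exp(-z))

def log_odds(p):
    """Natural logarithm of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))

# Hypothetical model: constant -1.5, a dummy predictor with coefficient 0.8
# and a continuous predictor with coefficient -0.05.
p = default_probability(-1.5, [0.8, -0.05], [1, 10])

# A one-unit increase in a predictor multiplies the odds by e^coefficient.
odds_ratio_dummy = math.exp(0.8)

print(round(p, 4), round(log_odds(p), 4), round(odds_ratio_dummy, 4))
```

Applying the log odds to the computed probability returns the original linear index, which is the sense in which the probability formula is a rearrangement of the log odds equation.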

Date posted: 29/11/2018, 23:53
