INTRODUCTION MACHINE LEARNING MODELS TO PREDICT SHAREHOLDER RETURNS IN THE AIRLINE INDUSTRY
VIETNAM NATIONAL UNIVERSITY, HANOI
ID: 21070075 Class: BDA2021A
Hanoi, …………2024
TEAM LEADER INFORMATION
- Phone no /Email: 21070075@vnu.edu.vn
II Academic Results (from the first year to now)
Academic year Overall score Academic rating
III Other achievements:
TABLE OF CONTENTS
1 Introduction
1.1 Background of the Study
1.1.1 Context
1.1.2 Total Shareholder Return (TSR)
1.1.3 Featured metrics
1.1.4 Common metrics
1.2 Research Problem
1.3 Research Objectives and Scope
1.4 Research Methods
1.5 Research Structure
2 Literature Review
3 Data & Research Methodology
3.1 Overview
3.2 Data Collection
3.3 Data Analysis
3.3.1 Correlation Analysis
3.3.2 Data Processing
3.3.3 Prediction Model
3.3.4 Performance Evaluation Metrics
4 Results
4.1 Descriptive Analysis
4.2 Seasonality Analysis
4.3 Machine Learning for Prediction Models
4.3.1 Random Forest Regression
4.3.2 AdaBoost
4.3.3 Long Short-Term Memory
4.3.4 XGBoost
4.3.5 CatBoost
5 Conclusion and Recommendations
5.1 Conclusion
5.2 Recommendations for investors
5.3 Limitation of study & recommendations for further research
6 References
LIST OF TABLES
Table 1 Results from machine learning models for TSR
LIST OF FIGURES
Figure 1 A part of the final dataset used for building prediction models
Figure 2 Correlation heatmap
Figure 3 Summary of descriptive statistics of all variables
Figure 4 Dickey-Fuller test result
Figure 5 R-squared results for the 2 datasets when running the LSTM model
Figure 6 Result summary for Random Forest Regression model
Figure 7 Actual versus Predicted Result for Random Forest Regression model
Figure 8 Summary result for AdaBoost model
Figure 9 Actual versus Predicted Result for AdaBoost model
INTRODUCTION MACHINE LEARNING MODELS TO PREDICT SHAREHOLDER RETURNS IN THE AIRLINE INDUSTRY
1 Project Code
CN.NC.SV.23_32
2 Member List:
3 Advisor(s):
Dr Le Duc Thinh
4 Abstract (300 words or less):
This research project addresses a gap in the literature on predicting shareholder returns in the airline industry by examining the efficacy of various machine learning models for this task. Additionally, it seeks to identify key financial metrics, particularly financial ratios, that significantly influence stock performance, specifically the total shareholder return (TSR) of airline companies.
5 Keywords (3 – 5 words)
Airline industry, Machine Learning models, Total Shareholder Returns
SUMMARY REPORT IN STUDENT RESEARCH,
1 Introduction
1.1.Background of the Study
1.1.1 Context
In the contemporary landscape of abundant data and the imperative for data-driven decision-making, machine learning models offer promising avenues for enhancing decision support and predictive analytics. The aviation sector, with its complex dynamics and high stakes, stands to benefit significantly from the application of these models.
Studies like the 2013 executive compensation research sponsored by the State Board of Administration (SBA) shed light on the alignment of performance metrics with shareholder value. However, optimal metric selection varies across industries, necessitating tailored approaches for the aviation sector.
Despite the industry's complexity, existing literature has predominantly focused on employing machine learning models for tasks like ticket price prediction and flight delay analysis. However, there exists a notable gap in research concerning the prediction of shareholder returns in the airline industry using these advanced modeling techniques.
1.1.2 Total Shareholder Return (TSR)
Total Shareholder Return (TSR) is a fundamental metric for assessing the performance of an investment over a specified period. TSR encompasses both capital appreciation, represented by changes in stock price, and dividends distributed to shareholders. By considering both elements, TSR offers a holistic perspective on the overall return generated from holding a company's stock.
In the context of this study, TSR serves as the target variable for evaluating the effectiveness of machine learning models in predicting shareholder returns within the airline industry. Through the analysis of TSR data obtained from financial sources such as Yahoo Finance, this research aims to elucidate the predictive capabilities of these models and their implications for investors.
TSR plays a pivotal role for investors because it reflects the combined impact of stock price fluctuations and dividend yields. A positive TSR indicates value creation for shareholders, signifying an increase in investment returns. Conversely, a negative TSR indicates a decline in shareholder value, highlighting potential areas of concern or inefficiencies within the airline industry.
Utilizing TSR data sourced from platforms like Yahoo Finance gives researchers access to the comprehensive financial information needed for robust analysis. By examining trends in TSR over time, this study seeks to enhance understanding of the factors influencing shareholder returns in the dynamic landscape of the airline industry.
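The definition above can be illustrated with a minimal Python sketch. It assumes the standard single-period form, TSR = (ending price − starting price + dividends per share) / starting price. The function name and the sample figures are hypothetical and are not taken from the study's dataset.

```python
def total_shareholder_return(price_start, price_end, dividends_per_share=0.0):
    """Single-period TSR as a decimal fraction (0.09 means 9%).

    Assumes the standard definition: capital appreciation plus dividends
    paid during the period, divided by the starting share price.
    """
    if price_start <= 0:
        raise ValueError("price_start must be positive")
    capital_gain = price_end - price_start
    return (capital_gain + dividends_per_share) / price_start


# Hypothetical quarter: the share price moves from 50.00 to 54.00
# and 0.50 per share is paid out in dividends.
example_tsr = total_shareholder_return(50.0, 54.0, 0.5)
print(f"TSR = {example_tsr:.2%}")
```

A falling price with no dividend gives a negative value, which matches the interpretation of a negative TSR as a decline in shareholder value.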
1.1.3 Featured metrics
This study focuses on several key metrics commonly used in the airline industry, retrieved from airlines' quarterly financial reports (Airline Glossary – Southwest Airlines, n.d.). These metrics provide valuable insights into the operational and financial performance of airlines and are crucial for predicting shareholder returns.
• Passenger Revenue per Available Seat Mile (PRASM):
PRASM measures the passenger revenue generated per seat, whether empty or full, flown over a distance of one mile. It serves as a fundamental indicator of the airline's ability to generate revenue from its passenger operations. In this report, PRASM is expressed in cents per mile.
• Revenue per Available Seat Mile (RASM):
RASM represents the total operating revenue earned per seat, whether empty or full, flown over a distance of one mile. It provides a comprehensive view of the airline's revenue-generating capabilities across all operations. Calculated as Total Operating Revenues divided by Available Seat Miles, RASM is expressed in cents per mile in this report.
• Operating Cost per Available Seat Mile (CASM):
CASM measures the average cost incurred by the airline for flying an aircraft seat, whether empty or full, over a distance of one mile. It reflects the airline's operational efficiency and cost management practices. CASM is expressed in cents per mile in this report.
• Load Factor:
Load factor indicates the percentage of a plane filled with paying passengers, representing the airline's capacity utilization. A higher load factor generally implies better capacity utilization and operational efficiency. In this report, load factor is expressed as a percentage (%) and recorded to two decimal places.
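The four metrics above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the study's own processing code: the field names and figures are hypothetical, CASM is taken as total operating expenses divided by available seat miles, and load factor is taken as revenue passenger miles divided by available seat miles, which is the usual industry convention.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyOperations:
    """Hypothetical container for one airline-quarter (amounts in USD)."""
    passenger_revenue: float
    total_operating_revenue: float
    total_operating_expenses: float
    available_seat_miles: float      # ASM: seats flown x miles flown
    revenue_passenger_miles: float   # RPM: paying passengers x miles flown


def unit_metrics(q):
    """Return PRASM, RASM, CASM (cents per mile) and load factor (%)."""
    asm = q.available_seat_miles
    if asm <= 0:
        raise ValueError("available_seat_miles must be positive")
    cents_per_dollar = 100.0
    return {
        "PRASM": cents_per_dollar * q.passenger_revenue / asm,
        "RASM": cents_per_dollar * q.total_operating_revenue / asm,
        "CASM": cents_per_dollar * q.total_operating_expenses / asm,
        # Assumed definition: share of seat capacity actually sold.
        "load_factor_pct": round(100.0 * q.revenue_passenger_miles / asm, 2),
    }


# Hypothetical quarter, not taken from any airline's report.
sample = QuarterlyOperations(
    passenger_revenue=9.0e9,
    total_operating_revenue=1.0e10,
    total_operating_expenses=9.5e9,
    available_seat_miles=8.0e10,
    revenue_passenger_miles=6.8e10,
)
metrics = unit_metrics(sample)
print(metrics)
```

Because all three per-seat-mile metrics share the same denominator, the gap between RASM and CASM gives a quick read of operating profit per available seat mile.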
1.1.4 Common metrics
This study explores several common metrics essential for evaluating the financial performance and operational efficiency of airlines, obtained from their financial reports. These metrics, along with their calculation methods, offer valuable insights into the profitability, liquidity, and leverage of airline companies.
• Gross Profit Margin:
Gross profit margin serves as a key profitability metric, showing the percentage of revenue remaining after deducting the cost of goods sold from total operating revenue. It offers analysts and investors a clear view of a company's financial health. In this report, gross profit margin is calculated as Total Operating Revenue minus Total Operating Expenses, divided by Total Operating Revenue (Bellucco, 2023).
$$\text{Gross profit margin} = \frac{\text{Total operating revenue} - \text{Total operating expenses}}{\text{Total operating revenue}}$$
• Quick Ratio:
The quick ratio assesses a company's ability to quickly convert its liquid assets into cash to meet short-term financial obligations. It excludes inventory from current assets to provide a conservative measure of liquidity. In this report, the quick ratio is calculated as Quick Assets divided by Current Liabilities (Beaver, 2022).
$$\text{Quick ratio} = \frac{\text{Quick assets}}{\text{Current liabilities}}$$
where Quick Assets is derived from Current Assets by subtracting Inventory and Prepaid Expenses.
• Debt-to-Equity Ratio:
The debt-to-equity ratio (D/E ratio) indicates the proportion of debt relative to shareholders' equity, reflecting a company's leverage. It is calculated by dividing Total Liabilities by Shareholders' Equity. In this report, Total Liabilities is computed as Total Current Liabilities plus Total Noncurrent Liabilities (Debt To Equity Ratio - Definition, Formula & How to Calculate DE Ratio?, n.d.).
$$\text{D/E ratio} = \frac{\text{Total liabilities}}{\text{Shareholders' equity}}$$
where total liabilities is calculated using the following formula:
$$\text{Total liabilities} = \text{Total current liabilities} + \text{Total noncurrent liabilities}$$
• Return on Assets (ROA):
ROA measures a company's efficiency in utilizing its assets to generate profits, providing insight into its operational effectiveness. It is calculated as Net Income divided by Total Assets. ROA facilitates comparisons across companies within the same sector or industry (Eichler, 2022).
$$\text{ROA} = \frac{\text{Net income}}{\text{Total assets}}$$
• Earnings per Share (EPS):
EPS reflects a company's profitability by dividing its profit by the number of outstanding shares of its common stock. It is widely used to estimate corporate value and indicates the amount of money earned per share of stock. In this report, basic EPS is used rather than diluted EPS (Fernando & Kindness, 2022).
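The ratio definitions in this subsection can be collected into a short Python sketch. The function names and the sample balances are hypothetical. Gross profit margin follows this report's own formula (operating revenue less operating expenses, over operating revenue), and the EPS helper is a simplification that divides net income by shares outstanding without adjusting for preferred dividends.

```python
def gross_profit_margin(total_operating_revenue, total_operating_expenses):
    """Margin as defined in this report: (revenue - expenses) / revenue."""
    return (total_operating_revenue - total_operating_expenses) / total_operating_revenue


def quick_ratio(current_assets, inventory, prepaid_expenses, current_liabilities):
    """Quick assets (current assets less inventory and prepaid expenses)
    divided by current liabilities."""
    quick_assets = current_assets - inventory - prepaid_expenses
    return quick_assets / current_liabilities


def debt_to_equity(total_liabilities, shareholders_equity):
    """Total liabilities divided by shareholders' equity."""
    return total_liabilities / shareholders_equity


def return_on_assets(net_income, total_assets):
    """Net income divided by total assets."""
    return net_income / total_assets


def basic_eps(net_income, shares_outstanding):
    """Simplified basic EPS; ignores preferred dividends (an assumption)."""
    return net_income / shares_outstanding


# Hypothetical figures in millions of USD (shares in millions).
print(gross_profit_margin(10_000, 9_500))   # share of revenue left after expenses
print(quick_ratio(3_000, 400, 100, 2_500))  # liquid assets per unit of short-term debt
print(debt_to_equity(12_000, 4_000))        # leverage
print(return_on_assets(300, 20_000))        # profit per unit of assets
print(basic_eps(300, 600))                  # earnings per share
```

Keeping each ratio as its own function makes it straightforward to apply the same definitions to every airline-quarter in a panel dataset.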
1.2.Research Problem
While machine learning models have been extensively explored in the airline industry for various applications such as predicting ticket prices and flight delays, there is a noticeable gap in the literature concerning the prediction of shareholder returns using these models.
Financial ratios serve as valuable tools for analyzing a firm's performance over time, providing insights into outcomes of decisions made by the firm and external conditions affecting it. However, there is a lack of research focusing on the efficiency of machine learning models in predicting shareholder returns in the airline industry, as well as identifying crucial financial indicators affecting stock performance, particularly Total Shareholder Returns (TSR) and Earnings per Share (EPS).
This research project aims to fill this gap by examining the effectiveness of several machine learning models in predicting shareholder returns in the airline industry. By identifying significant financial ratios that impact TSR, the study aims to assist shareholders in making informed decisions regarding airline stock investments.
1.3.Research Objectives and Scope
This study pursues two main objectives:
(1) Identify the essential metrics in the financial statements of the airline industry.
(2) Explore different machine learning algorithms that can produce reliable prediction models for TSR in the airline industry, and identify the metrics that have the greatest effect on TSR.
Sector scope: The determinants for forecasting Total Shareholder Returns (TSR) encompass a range of metrics selected from airlines' financial reports, as presented above. These metrics include both featured and common metrics: Passenger Revenue per Available Seat Mile (PRASM), Revenue per Available Seat Mile (RASM), Operating Cost per Available Seat Mile (CASM), Load Factor, Gross Profit Margin, Quick Ratio, Debt-to-Equity Ratio, Return on Assets (ROA), and Earnings per Share (EPS). Together they provide insights into the financial health, operational efficiency, and profitability of airline companies, supporting a comprehensive analysis for forecasting TSR.
Types of airlines: The 9 largest airline companies in the world according to Business Insider and Forbes’ latest annual listing of the top airline companies.
Time frame: Data are collected quarterly from 2013 to 2023 (11 years).
1.4.Research Methods
Qualitative: We study key financial ratios for the airline industry and the machine learning techniques used to process panel data.
Quantitative: We collect and analyze data from several airline companies around the world (mostly in the U.S.). We obtain 11 years of quarterly data (2013-2023) on the chosen key metrics and the companies' stock prices. From there, we use various machine learning techniques to predict the value of TSR.
1.5.Research Structure
• Chapter 1: Introduction
• Chapter 2: Literature Review
• Chapter 3: Research Methodology
• Chapter 4: Data Analysis, Presentation, and Interpretation
• Chapter 5: Summary, Conclusion, and Recommendation
2 Literature Review
According to the International Air Transport Association (2021), the economic performance of the airline industry post-COVID-19 presents a nuanced landscape for predicting shareholder returns. While air cargo has rebounded above pre-crisis levels, passenger demand, as indicated by Revenue Passenger Kilometers (RPKs), remains subdued at 40% of pre-pandemic levels in 2021. Vaccination efforts offer hope for a gradual recovery, with forecasts predicting a return to 61% of 2019 levels by 2022. However, substantial net losses of $52 billion in 2021, expected to reduce to $12 billion in 2022, underscore ongoing challenges. Understanding these economic dynamics is crucial for developing accurate machine learning models to forecast shareholder returns in the airline industry.
The Efficient Market Hypothesis (EMH), posited by Fama and Samuelson in the 1960s, suggests that securities are efficiently priced, leaving no room for arbitrage opportunities. However, the Arbitrage Pricing Theory (APT) contradicts this, indicating that undervalued stocks may exist. Pontiff and Schall (1998) demonstrated the predictive power of financial ratios, suggesting market inefficiencies. In the airline industry, metrics like the price-to-earnings (P/E) ratio and dividend yield (DY) are commonly analyzed, but other ratios such as inventory turnover and debt ratio may also influence stock returns. This study aims to assess the predictive ability of financial ratios, challenging the semi-strong form of the EMH. Previous research, like that of Fama and French (1988) and Lewellen (2004), has shown varying degrees of significance for ratios like DY and P/E in forecasting stock returns.
According to Barack Wamkaya Wanjawa and Lawrence Muchemi (2014), Artificial Neural Networks (ANN) offer a promising approach for predicting stock prices, potentially surpassing traditional methods like technical or fundamental analysis. These models leverage parallel computation and learning algorithms to analyze historical data and extract patterns for future price predictions. Studies have shown that ANN models can achieve high prediction accuracy with low Mean Absolute Percentage Error (MAPE) and Root Mean Square Error (RMSE), indicating their potential applicability in real-world stock markets.
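For readers unfamiliar with the two error measures mentioned above, the following Python sketch shows how MAPE and RMSE are commonly computed. It uses the textbook definitions, and the price series is hypothetical and unrelated to the cited study.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared error."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


# Hypothetical actual and predicted prices.
actual_prices = [100.0, 200.0, 400.0]
predicted_prices = [110.0, 190.0, 400.0]
print(f"RMSE = {rmse(actual_prices, predicted_prices):.3f}")
print(f"MAPE = {mape(actual_prices, predicted_prices):.2f}%")
```

RMSE is expressed in the units of the target and penalizes large misses more heavily, while MAPE is scale-free, which makes it easier to compare across stocks with different price levels.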
Autoregressive Moving Average (ARMA) modeling, as discussed in "Autoregressive Moving Average Modeling in the Financial Sector" by Peihao Li et al. (2015), has emerged as a valuable tool for forecasting stock prices in the financial sector. By incorporating factors like stock-trading volume and exchange rates, ARMA models offer a flexible approach that could be applied to predicting shareholder returns in the airline industry. These models provide insights into complex dynamics and have shown promising estimation performance, enhancing decision-making processes for investors.
The study "Predicting Future Earnings Changes Using Machine Learning and Detailed Financial Data" by Xi Chen, Yang Ha (Tony) Cho, Yiwei Dou, and Baruch Lev (February 2022) demonstrates the significant potential of machine learning algorithms applied to detailed financial data for predicting future earnings changes. By employing ensemble learning methods such as random forests and stochastic gradient boosting, the models developed exhibit robust out-of-sample predictive power, with the area under the Receiver Operating Characteristic (ROC) curve (AUC) ranging from 67.52 to 68.66 percent. This performance surpasses that of conventional models utilizing logistic regressions and small sets of accounting variables, as well as professional analysts' forecasts. The superiority of machine learning models is attributed to their ability to capture nonlinear predictor interactions missed by traditional regressions and their utilization of more comprehensive financial data. Overall, the findings underscore the value of leveraging machine learning techniques and detailed financial data for forecasting future earnings changes.
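To make the AUC figures above easier to interpret, the sketch below computes AUC directly from its probabilistic meaning: the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. This is a generic illustration in Python, not the cited authors' procedure, and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. an earnings increase), 0 otherwise.
    scores: model scores, where higher means "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores.
sample_labels = [1, 0, 1, 0, 1]
sample_scores = [0.9, 0.4, 0.6, 0.7, 0.8]
print(f"AUC = {roc_auc(sample_labels, sample_scores):.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values around 0.68 indicate a moderate improvement over chance.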
Various studies have demonstrated the effectiveness of LSTM and BI-LSTM models in forecasting stock prices accurately by capturing the nonlinear patterns present in large datasets. Specifically, the research by Istiake Sunny et al. (2020) emphasizes the potential of LSTM and BI-LSTM models in predicting stock trends with high accuracy through proper hyper-parameter tuning. This comparison between LSTM and BI-LSTM models provides insights into selecting the most suitable model for accurately predicting shareholder returns in the airline industry, considering factors such as model complexity and computational efficiency.
The study by Halteh et al. (2024) pioneers the application of machine learning techniques to predict financial distress in the aviation industry. The findings highlight Random Forests (RFs) and Stochastic Gradient Boosting (SGB) models as the most accurate in predicting financial distress, emphasizing the importance of company-specific factors such as debt-to-equity ratio, return on invested capital, and debt ratio in determining financial health (Halteh et al., 2024). By incorporating machine learning-driven early warning systems and dynamic risk assessment into policymaking and crisis management strategies, the study underscores the importance of data-driven approaches in ensuring industry sustainability.
The prediction of shareholder returns in the airline industry relies heavily on understanding the factors influencing airline profitability. Choi, O'Connor, and Truong (2019) highlight the significance of revenue and cost structures in determining operating margins for airlines, particularly in the context of intense competition and fuel price volatility. Given the industry's narrow profit margins, internal and external factors such as fuel prices, labor costs, passenger and cargo businesses, and ancillary services significantly impact airline profitability (Miranda, 2015).
Traditionally, airline profitability analysis has focused on factors like technical efficiency, total factor productivity, and cost competitiveness (Yu, 2016). However, predicting airline profitability remains challenging due to the dynamic nature of the industry and the multitude of factors influencing financial performance. While previous studies have examined various aspects of airline operations and financial performance, the prediction of profitability with precision has remained elusive (Choi, O'Connor, & Truong, 2019).
The liberalization of the airline industry in the 1980s and 1990s ushered in a new era of competition and business models, significantly impacting airline profitability (Scotti & Volta, 2017). With increased competition and structural vulnerability to external shocks, such as economic downturns or fuel price fluctuations, airline profitability became a major concern within the industry (Brugnoli et al., 2015). Consequently, understanding the complex relationship between operating revenue, expenses, and profitability became imperative for airlines to navigate these challenges effectively (Scotti & Volta, 2017).
Choi, O'Connor, and Truong (2019) address the gap in predicting airline profitability by developing decision tree and logistic regression models to forecast potential profit and loss for major U.S. airlines. Their study underscores the importance of transportation-related revenue and expenses (TRR and TRE) as significant predictors of profitability. Furthermore, the study highlights the superiority of decision tree models over regression models in predicting profitability, emphasizing the need for robust predictive analytics tools in the airline industry (Choi, O'Connor, & Truong, 2019).
3 Data & Research Methodology
3.1.Overview
This study used quantitative methods, analysing the annual data of the 9 largest airline companies in the world according to Business Insider and Forbes’ latest annual listing of the