Machine learning models to predict shareholder returns in the airline industry
INTRODUCTION
The necessity of the topic
The aviation industry is crucial for global connectivity, fostering economic growth and improving societal mobility. Despite its importance, airlines struggle to provide consistent shareholder returns, as highlighted in Tony Tyler's IATA report, "Profitability and the Air Transport Value Chain." This report underscores the disparity between the rapid growth of air travel and cargo and the airlines' modest profitability, revealing the persistent difficulty of delivering shareholder returns in the aviation sector.
In the current data-driven landscape, machine learning presents significant opportunities for enhancing decision-making and predictive analytics. The aviation industry, with its complex operations and inherent risks, stands to gain considerably from these models.
A 2013 study by the State Board of Administration highlights the critical need to align performance metrics with shareholder value, emphasizing that these metrics must be tailored to meet the specific requirements of various industries, such as aviation.
While machine learning has been extensively studied for ticket pricing and flight delay prediction, there remains a significant gap in research on its application to predicting shareholder returns in the aviation sector. This gap presents an opportunity to develop targeted, industry-specific machine learning solutions.
Total Shareholder Return (TSR) is a crucial metric for assessing investment performance in the airline industry, encompassing both capital appreciation from stock price changes and dividends paid to shareholders. This comprehensive measure provides insight into the overall returns from owning a company's stock. Given the airline sector's volatility and the substantial influence of external factors such as fuel prices and economic conditions on stock performance, TSR is particularly important for investors in this industry.
This study uses Total Shareholder Return (TSR) as the target metric for evaluating how well machine learning models predict shareholder returns within the airline industry. Using TSR data from Yahoo Finance, the research investigates the predictive capabilities of these models and their implications for investment strategies. Relying on an established financial data source helps ensure that the analysis rests on accurate and current information, which is crucial for building effective predictive models.
TSR reflects the combined impact of stock price changes and dividend yields. A positive TSR signifies value creation and enhanced investment returns, while a negative TSR indicates a decline in shareholder value, potentially signalling problems within the airline sector. Analyzing TSR allows investors to understand both the short-term and long-term performance of their investments and to make more informed decisions.
TSR data from platforms like Yahoo Finance provides researchers with essential financial information for in-depth analysis. This study seeks to improve the understanding of factors affecting shareholder returns within the airline industry by employing predictive modeling techniques. By integrating machine learning methods such as regression analysis, decision trees, and neural networks, the research aims to evaluate the drivers of TSR and thereby improve the accuracy and usefulness of financial forecasts in the airline sector.
The objective of the topic
This thesis examines the impact of specific financial ratios and relevant metrics on Total Shareholder Return (TSR) within the airline industry. By gathering annual financial data from leading global airlines, the study calculates these metrics and analyzes their correlation with TSR. This data forms the basis for a multiple regression model that evaluates the influence of each metric on TSR.
Research outcomes
The study analyzes data from nine prominent U.S. airlines, focusing on Total Shareholder Return (TSR) and a combination of airline-specific metrics alongside conventional corporate finance metrics.
This research uses correlation calculations and a multiple linear regression model to identify both positive and negative linear relationships between various metrics and TSR. The findings aim to provide insight into how these relationships affect stock returns.
Practical contributions
This research provides actionable insights for airline companies, particularly in Vietnam, to improve their financial and operational strategy disclosure. By highlighting the impact of key metrics on Total Shareholder Return (TSR), these insights aim to enhance investment attraction and enable airlines to effectively manage and report their financial performance, ultimately maximizing shareholder value.
LITERATURE REVIEW AND RESEARCH
Literature review
The performance of the airline industry is significantly shaped by various financial and operational metrics, as evidenced by multiple sources. A study by Farient Advisors LLC, commissioned by the State Board of Administration (SBA), demonstrates the connection between executive compensation and shareholder value, highlighting the influence of earnings growth, returns, and revenue growth on stock prices. This supports economic theories indicating that well-aligned performance metrics lead to improved shareholder returns.
Industry experts Gary Leff and Ted Reed note the market's strong emphasis on unit revenue metrics such as PRASM and RASM, despite a noticeable gap between these metrics and actual stock performance, as observed by industry executives. This indicates a complex relationship between financial metrics and market perceptions that ultimately affects stock prices.
The adoption of machine learning models, such as Artificial Neural Networks (ANNs), Random Forest, and Support Vector Regression (SVR), offers significant advantages in navigating the complexities of financial data. These models can uncover nonlinear relationships that traditional economic models often miss. By efficiently processing large datasets, they identify patterns that human analysts might overlook, which can improve the accuracy of forecasts for shareholder returns.
The integration of machine learning models in financial analysis is supported by key economic theories, including the Efficient Market Hypothesis (EMH) and the Arbitrage Pricing Theory (APT). EMH suggests that stock prices reflect all available information, making it challenging to achieve above-average returns without insights from machine learning (Fama, 1965). In contrast, APT offers a multifactor approach to analyzing stock returns, which aligns with machine learning's ability to evaluate multiple variables simultaneously (Ross, 1976). These theories provide a framework for the use of machine learning in financial analysis.
In summary, the theoretical foundation for employing machine learning to forecast shareholder returns in the airline industry combines economic theory with insights from industry research and expert opinion. By using advanced analytics and thorough data analysis, stakeholders can better understand the elements influencing stock performance in the airline sector, supporting more informed decisions and sustainable investment strategies.
TSR is defined as the total return a stock provides to an investor, encompassing both capital gains and dividends received. The formula for calculating TSR is:
For this study, the stock prices used are the adjusted closing prices, which already account for dividends. These prices were sourced from Yahoo Finance.
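The definition above is commonly written as follows. This is a sketch under assumed notation, not a formula taken from the source: $P_0$ and $P_1$ denote the share price at the start and end of the period, $D$ the dividends paid per share during the period, and $P^{\text{adj}}$ the adjusted closing price.

```latex
\[
\mathrm{TSR} = \frac{(P_1 - P_0) + D}{P_0}
\qquad\text{and, with dividend-adjusted prices,}\qquad
\mathrm{TSR} \approx \frac{P^{\text{adj}}_1}{P^{\text{adj}}_0} - 1 .
\]
```

Because adjusted closing prices already fold dividends into the price series, the second form needs no separate dividend term.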
Available seat miles (ASM) or available seat kilometers (ASK) are crucial metrics in the airline industry, measuring the total passenger capacity offered by an airline. This metric counts the total number of seat miles or kilometers available for sale across all flights within a specific timeframe, regardless of seat occupancy. ASM/ASK therefore represents the overall supply of passenger-carrying capacity provided by an airline.
Revenue Passenger Miles (RPM) or Revenue Passenger Kilometers (RPK) are essential metrics for assessing airline passenger traffic, as they quantify the total distance flown by paying passengers. This measurement reflects the effectiveness of an airline's capacity utilization by indicating the volume of passenger traffic converted into revenue.
1. Load Factor (%): This indicator shows the percentage of Available Seat Miles (ASMs) that are filled with paying passengers, known as Revenue Passenger Miles (RPMs). It evaluates how much of the airline's capacity is actively utilized by passengers, whether on a single flight or system-wide.
2. Revenue per ASM (RASM): Also known as "unit revenue," this metric is calculated by dividing the airline's total revenue by its total available seat miles. It measures how effectively an airline generates revenue per unit of capacity.
3. Passenger Revenue per ASM (PRASM): Often termed passenger "unit revenue," this is calculated by dividing the revenue from passenger ticket sales by the available seat miles. Typically expressed in cents per mile, it assesses the revenue efficiency of passenger services.
4. Cost per ASM (CASM): This measure indicates the cost per unit of capacity, calculated by dividing all of an airline's operating expenses by the total available seat miles. Excluding fuel or specific transport costs can provide a clearer view of operational efficiencies.
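The four capacity-based metrics above can be sketched as simple ratios. This is a minimal illustration, assuming inputs in consistent units; the function names and the sample figures are hypothetical and not taken from the study's dataset.

```python
def load_factor(rpm: float, asm: float) -> float:
    """Percentage of available seat miles filled by paying passengers."""
    return 100.0 * rpm / asm


def rasm(total_revenue: float, asm: float) -> float:
    """Total revenue per available seat mile (unit revenue)."""
    return total_revenue / asm


def prasm(passenger_revenue: float, asm: float) -> float:
    """Passenger ticket revenue per available seat mile."""
    return passenger_revenue / asm


def casm(operating_expenses: float, asm: float) -> float:
    """Operating expenses per available seat mile (unit cost)."""
    return operating_expenses / asm


# Hypothetical figures for illustration only (currency units and seat miles).
asm_total, rpm_total = 1000.0, 800.0
print(load_factor(rpm_total, asm_total))   # percent of capacity used
print(rasm(150.0, asm_total), prasm(120.0, asm_total), casm(140.0, asm_total))
```

Multiplying RASM, PRASM, or CASM by 100 converts a dollar figure into the cents-per-mile form in which these metrics are usually quoted.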
This study examines key metrics for assessing the financial performance and operational efficiency of airlines, derived from their financial reports. These metrics and their calculation methods provide insight into the profitability, liquidity, and leverage of airline companies.
5. Gross margin: This metric illustrates the profitability of an airline by showing the percentage of revenue that exceeds the cost of goods sold (COGS). It is determined by subtracting COGS from net sales and then dividing by net sales. A higher gross margin indicates more efficient control over production costs relative to revenue.
6. Return on assets (ROA): This measure assesses how efficiently an airline uses its assets to produce earnings. It is computed by dividing net income by total assets. A higher ROA suggests greater efficiency in converting investments into profits.
7. Quick Ratio: This liquidity measure assesses an airline's capacity to cover its short-term obligations with its most liquid assets, excluding inventory. It is computed by dividing liquid assets by current liabilities. A higher value indicates stronger short-term financial health.
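The three financial ratios above follow directly from their verbal definitions. The sketch below assumes all inputs share one currency and reporting period; the function names and sample values are hypothetical.

```python
def gross_margin(net_sales: float, cogs: float) -> float:
    """(Net sales - COGS) / net sales, as a fraction of revenue."""
    return (net_sales - cogs) / net_sales


def return_on_assets(net_income: float, total_assets: float) -> float:
    """Net income divided by total assets."""
    return net_income / total_assets


def quick_ratio(liquid_assets: float, current_liabilities: float) -> float:
    """Most liquid assets (inventory excluded) divided by current liabilities."""
    return liquid_assets / current_liabilities


# Hypothetical figures for illustration only.
print(gross_margin(1000.0, 900.0))
print(return_on_assets(33.0, 1000.0))
print(quick_ratio(50.0, 100.0))
```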
Research problem, methodology and scope
Despite the extensive exploration of machine learning models in the airline industry for predicting ticket prices and flight delays, there remains a significant gap in the academic literature concerning their application to forecasting shareholder returns. This gap matters because such models have the potential to improve investment strategies and financial planning.
Financial ratios are essential analytical tools that reveal a company's performance trends, highlighting the impact of management decisions and external economic factors. However, there is limited research on using machine learning models with these ratios to predict shareholder returns, specifically regarding key financial metrics such as Total Shareholder Return (TSR) and Earnings per Share (EPS) within the airline industry.
This research project evaluates the effectiveness of various machine learning models in predicting shareholder returns in the airline sector. The study aims to identify key financial ratios that significantly influence TSR, providing insight into the factors that enhance shareholder value in airline stocks. By highlighting these ratios, the research intends to equip shareholders with data-driven insights that support more informed investment decisions in the airline industry.
This project aims to develop a strong predictive model for the airline market by integrating historical financial data with machine learning techniques. The goal is to generate actionable forecasts that can inform investment portfolios.
This study aims to address the research gap in financial analytics within the airline industry and to improve the accuracy of shareholder return predictions. By providing model-driven predictions, it seeks to support investors in optimizing their investment strategies.
• Qualitative: We study key financial ratios for the airline industry and the machine learning techniques used to process panel data.
• Quantitative: We collect and analyze data from selected airline companies in the U.S. We obtain 17 years of yearly data (2007-2023) on the chosen key metrics and the companies' stock prices. From there, we use various machine learning techniques to predict the value of TSR.
The analysis of correlations between Total Shareholder Return (TSR) and key airline metrics is performed using original currency values from financial statements to avoid distortions from exchange rate fluctuations. For constructing the multiple regression model, all financial metrics, including currency values, are converted to USD, using the exchange rates indicated in the financial statements or those relevant at the fiscal year-end. This method ensures consistency across the dataset, enabling accurate comparative analysis.
This study pursued the following two main objectives:
(1) Identify the essential metrics in the financial statements of the airline industry.
(2) Investigate various machine learning algorithms that can develop accurate prediction models for TSR values in the airline industry, focusing on the key metrics that significantly influence TSR outcomes.
To forecast Total Shareholder Return (TSR) for airline companies, the analysis uses various metrics drawn from the airlines' financial reports. Key indicators include Passenger Revenue per Available Seat Mile (PRASM), Revenue per Available Seat Mile (RASM), Operating Cost per Available Seat Mile (CASM), Load Factor, Gross Profit Margin, Quick Ratio, Debt-to-Equity Ratio, Return on Assets (ROA), and Earnings per Share (EPS). These metrics are essential for assessing financial health, operational efficiency, and profitability, and they serve as candidate predictors of TSR.
Types of airlines: 9 largest airline companies in the world according to Business Insider and Forbes’ latest annual listing of the top airline companies
Time frame: Data are collected yearly from 2007 to 2023 (17 years).
Research Challenges and Resources
The study used a dataset spanning 17 years (2007-2023), derived from the annual reports of nine major airlines and their stock prices obtained from Yahoo Finance. This data forms the foundation for analyzing the effectiveness of machine learning models in predicting Total Shareholder Return (TSR).
Despite extensive data collection, challenges arose because some airlines lacked comprehensive data over the 17-year period, which may affect the depth of the analysis. Furthermore, certain airlines did not provide the information needed to calculate key metrics, hindering the creation of a uniform dataset for comparison. These gaps call for careful adaptation of the study's methodology to maintain the accuracy and reliability of the findings.
MAIN RESULTS
Overview
This study employed quantitative methods to analyze the annual data of the world's nine largest airline companies, as identified by Business Insider and Forbes. Data spanning 17 years, from 2007 to 2023, was gathered from SEC filings, with stock prices sourced from Yahoo Finance. Using Python, the researcher examined various prediction models for Total Shareholder Return (TSR), focusing on its correlation with other financial metrics reported by the airlines.
Data Collection
This research gathered data from the nine largest airline companies globally, as ranked by the latest annual listings from Business Insider and Forbes. The information was sourced from a comprehensive list of airline stocks available on the NYSE.
Airlines were selected according to two main criteria:
• The airline company published its SEC filings from 2007 to 2023.
• The airline company is listed on a U.S. stock exchange, with all necessary data for calculating Total Shareholder Return (TSR) available on Yahoo Finance.
The dataset of airline companies' financial metrics is compiled through a process of data collection and preprocessing. Initially, SEC filings are examined to extract annual financial data from 2013 to 2023. For missing metrics, alternative data sources are used to calculate the necessary figures with established formulas. The data is organized in Google spreadsheets, with each company having its own sheet sorted by year. This data is then imported into pandas data frames for cleaning, filling missing values, and ensuring consistency. After preprocessing, the cleaned data is exported to individual xlsx or CSV files for easy access and manipulation. Finally, these files are consolidated into a single xlsx file for collinearity checks and model fitting.
The study refines the dataset by excluding US_SkyWest, US_SunCountry, and US_Frontier Airlines for failing to meet criteria essential for data integrity and relevance to the research objectives. The dataset initially featured 12 major US-listed airline companies. US_SkyWest is removed due to insufficient SEC filings, lacking key financial metrics such as PRASM, RASM, and CASM, which could not be sourced from alternative data. Similarly, US_SunCountry is excluded because its SEC filings only cover 2018, not meeting the required data range from 2013 to present.
In 2023, the significant lack of comprehensive data makes it impractical to extrapolate or interpolate missing values without risking the reliability of the dataset. US_Frontier Airlines has been excluded due to insufficient coverage of key financial metrics, particularly the absence of PRASM, RASM, and CASM data since 2017. Despite attempts to use relevant proxies to fill these gaps, the inability to accurately assess financial performance metrics prevents its inclusion in the final dataset. This elimination process, based on data completeness and relevance, is intended to make the dataset suitable for robust analysis.
Airline companies, particularly regional carriers operating under major brands, often lack essential airline-specific financial metrics such as PRASM (Passenger Revenue per Available Seat Mile), CASM (Cost per Available Seat Mile), and RASM (Revenue per Available Seat Mile). This discrepancy arises from differences in financial reporting structures compared to larger, standalone airlines, leading to these metrics being less emphasized or even inapplicable given the carriers' operational nature.
Discrepancies in reporting standards and disclosure requirements between regional and major airlines can result in variations in financial data availability. Limited resources, including personnel and technology, may also hinder these companies' ability to compile comprehensive financial metrics. Additionally, the aviation industry's complexity, characterized by diverse operational models and contractual arrangements, poses challenges in accurately calculating and reporting financial indicators. Overall, operational nuances, resource constraints, and industry complexities collectively explain the lack of essential airline-specific financial data.
The name of each airline company in the dataset is saved along with its home country, following the list mentioned above. The final list is as follows:
All data is stored in a spreadsheet file, as shown below.
Figure 3.1 A part of the final dataset used for building prediction models.
Data Analysis
The correlation matrix reveals that Total Shareholder Return (TSR) has very weak correlations with other financial metrics. However, it also shows strong correlations among several key metrics, particularly with PRASM.
Passenger Revenue per Available Seat Mile (PRASM), Revenue per Available Seat Mile (RASM), Cost per Available Seat Mile (CASM), and Load Factor exhibit strong correlation values between 0.56 and 0.78. This degree of multicollinearity may affect the performance and reliability of machine learning models that incorporate these metrics.
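A pairwise correlation check of the kind described above can be sketched in plain Python. This is an illustrative sketch only: the column values are made up, the 0.5 cut-off is an assumed threshold rather than one stated in the study, and the study itself worked with pandas data frames rather than this hand-rolled helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def collinear_pairs(columns, threshold):
    """Return column pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical values for illustration only (not from the study's dataset).
columns = {
    "PRASM": [10.0, 11.0, 12.0, 13.0],
    "RASM": [12.0, 13.5, 14.0, 16.0],
    "TSR": [0.1, -0.2, 0.3, -0.1],
}
print(collinear_pairs(columns, threshold=0.5))
```

On this toy data only the PRASM/RASM pair is flagged, mirroring the pattern reported above: the operating metrics move together while TSR is only weakly related to them.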
3.3.2.1 Overview of handling missing values
Effectively managing missing values is essential in financial data analysis, particularly in time series data, as gaps can compromise the reliability of predictive models. Disruptions such as holidays, documentation errors, and lost records highlight the need for robust strategies to address missing data.
When dealing with significant amounts of missing data, one can either eliminate entire variables or fill in missing entries using methods like zeros, means, or medians. However, replacing missing values with zeros or averages may not be the best approach and can negatively affect the dataset's overall integrity.
Advanced statistical models provide more sophisticated solutions for data imputation, and selecting the right method requires understanding the dataset's attributes. For datasets with low entropy, characterized by minimal error and low variability, global techniques such as svdPCA and bPCA are recommended. In contrast, local techniques like llsPCA and K-Nearest Neighbor (KNN) are more effective for high-entropy datasets, which exhibit significant variance and contain a higher degree of information and noise.
The choice of imputation method is significantly affected by the percentage of missing data in a dataset. According to Jones and Silver (2019), datasets with less than 20% missing data are often deemed sufficient for analysis, as minor amounts of missing information may not substantially affect the overall results, thus reducing the need for extensive imputation.
In this analysis, the investigator employs imputation strategies such as mean, mode, and zero filling to address missing data. Specifically, the mean or mode is used for columns with a significant number of missing entries, while zero values are applied to columns where zero is considered a valid placeholder in similar scientific studies.
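The three filling strategies named above (mean, mode, zero) can be sketched as follows. This is a minimal stand-in using the standard library, with `None` marking a missing entry; the helper name `impute` and the sample columns are hypothetical, and the study's own pandas-based code is not shown in the source.

```python
from statistics import mean, mode


def impute(values, strategy):
    """Fill None entries using the mean, the mode, or zero."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "mode":
        fill = mode(observed)
    elif strategy == "zero":
        fill = 0.0
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


# Hypothetical columns for illustration only.
print(impute([1.0, None, 3.0], "mean"))
print(impute([2, 2, None, 5], "mode"))
print(impute([None, 4.0], "zero"))
```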
Collinearity, or multicollinearity, refers to high correlation or linear dependence between two or more independent variables in a statistical model. This phenomenon can severely obstruct the assessment of each predictor's individual effect on the dependent variable, especially in regression and predictive modeling contexts.
To address multicollinearity among variables, the authors used a heatmap for analysis. Their findings led to the exclusion of specific variables prior to the feature selection stage.
The metrics RASM (Revenue per Available Seat Mile), CASM (Cost per Available Seat Mile), and Load Factor were excluded from the analysis because they are derived from sub-metrics similar to those of PRASM (Passenger Revenue per Available Seat Mile), which include Available Seat Miles and Passenger Revenue. Furthermore, RASM and CASM can be calculated using the Load Factor, resulting in redundancy among these metrics.
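After excluding these metrics, the analysis focuses on six key metrics (PRASM, Gross Profit Margin, Quick Ratio, Debt-to-Equity Ratio, Return on Assets, and Earnings per Share) alongside Total Shareholder Return (TSR). This simplification reduces multicollinearity and is intended to improve the model's predictive accuracy.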
This study aims to develop a strong predictive model for Total Shareholder Return (TSR), treating TSR as the dependent variable while utilizing nine metrics derived from SEC filings as independent variables.
The study uses six machine learning models to forecast TSR, aiming to evaluate and compare their predictive accuracy. The models and their corresponding abbreviations are detailed below.
Each model will be rigorously assessed to determine its accuracy in predicting TSR, offering essential insights that will improve decision-making through the empirical findings derived from these predictive analyses.
This study employs four metrics to assess model performance. Two primary metrics are used to evaluate machine learning models: R-squared (measuring model fit) and MSE (mean squared error).
R-squared (R²), or the coefficient of determination, reflects the proportion of variance in the dependent variable that the independent variables explain within a model. A higher R-squared value indicates that the model captures a substantial portion of the variability in the response variable, suggesting a strong model fit. However, R-squared alone may not fully represent model efficacy, as it does not directly consider error magnitude.
Results
The table below presents essential statistical measures for the dataset, highlighting the number of observations, mean, standard deviation, minimum, and maximum values. Notably, all variables in the dataset have the same number of observations.
The dataset summary, illustrated in Figure 3.3, includes 149 entries and 12 columns, with each row reflecting data from one airline company for a specific year. Key attributes include company name, year, and financial metrics, all of which show no missing values. Data types are well-defined, with int64 for the Year and float64 for metrics such as Total Shareholder Return (TSR), Passenger Revenue per Available Seat Mile (PRASM), and others. However, the Cost per Available Seat Mile (CASM) column is stored as an object data type, which requires further review for consistency. This table describes the dataset's structure and supports the analysis of airline industry performance.
Figure 3.4 The descriptive analysis table
The descriptive analysis table reveals key financial metrics for the airline industry over the sample years. The average Total Shareholder Return (TSR) is 8.2%, while Passenger Revenue per Available Seat Mile (PRASM) averages 11.65 cents. The Gross Profit Margin stands at 9.9%, indicating narrow profit margins. A concerning Quick Ratio average of -0.36 suggests limited liquid assets to meet immediate liabilities, with notable variability indicating potential outliers. The average Debt-to-Equity (D/E) ratio of -0.44 reflects significant reliance on debt financing. Return on Assets (ROA) averages 3.3%, showing moderate profitability, and Earnings per Share (EPS) averages 1.04, with considerable year-to-year variability. This overview of financial metrics supports further analysis and decision-making within the airline sector.
3.4.2 Machine Learning for Prediction Models
Table 3.1 Results from machine learning models for TSR
Model R-squared MSE (Mean Squared Error)
Random Forest is a robust ensemble learning technique in machine learning, known for its effectiveness in both classification and regression tasks. It operates by creating numerous decision trees during training and combining their results to predict outcomes, using the mode for classification and the average for regression. By randomly selecting subsets of training data and features for each tree, Random Forest promotes model diversity, which is essential for reducing the risk of overfitting, a common challenge in machine learning models.
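The mechanism described above (bootstrap samples of the rows, a random feature per tree, and averaging for regression) can be illustrated with a deliberately small sketch. It uses depth-1 trees ("stumps") rather than full decision trees, so it is a simplified stand-in and not the Random Forest implementation used in the study; all names and the toy data are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y, feature):
    """Depth-1 regression tree on one feature: choose the threshold with the lowest squared error."""
    values = sorted({row[feature] for row in X})
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        if not left or not right:
            continue
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((t - l_mean) ** 2 for t in left) + sum((t - r_mean) ** 2 for t in right)
        if best is None or sse < best[0]:
            best = (sse, thr, l_mean, r_mean)
    if best is None:  # feature is constant in this sample: predict the sample mean
        m = mean(y)
        return (feature, float("inf"), m, m)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=50, seed=0):
    """Train each stump on a bootstrap sample and one randomly chosen feature."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        feature = rng.randrange(n_features)          # random feature for this tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Average the individual tree predictions (the regression rule)."""
    return mean(left if row[f] <= thr else right for f, thr, left, right in forest)


# Hypothetical toy data: feature 0 drives the target, feature 1 is uninformative.
X = [[float(i), 1.0] for i in range(8)]
y = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
forest = fit_forest(X, y)
low, high = predict(forest, [0.0, 1.0]), predict(forest, [7.0, 1.0])
print(low, high)
```

Trees that happen to draw the uninformative feature simply predict their sample mean, so the averaged prediction is pulled toward the middle; this is the diversity-for-stability trade the ensemble makes.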
Random Forest is relatively resilient against overfitting, which can make it more reliable than models such as multiple linear regression and support vector machines. This resilience is important for prediction accuracy on new, unseen data. Random Forest also handles nonlinear data well, making it suitable for problems involving multiple variables with varying importance and interactions. Its computational efficiency and lower parameter requirements compared to models like neural networks streamline the setup process, reducing user input and accelerating model development.
Random Forest is well suited to predicting Total Shareholder Return (TSR) in the airline industry because it generalizes well to new data, which matters given the complex datasets and multiple factors influencing TSR. Its ability to handle nonlinear relationships and variable interactions enables analysis of operational, economic, and market influences on TSR. The ensemble approach improves both accuracy and stability, which are critical for TSR forecasts in the volatile airline market. Additionally, the minimal preprocessing and parameter tuning required make Random Forest a practical and efficient choice for TSR prediction in the airline sector.
Figure 3.5 Result summary for Random Forest Regression model
R-squared (R²): Known as the coefficient of determination, R-squared measures the amount of variance in the dependent variable (Total Shareholder Return, TSR) that can be explained by the independent variables (PRASM, Gross Profit Margin, Quick Ratio, Debt-to-Equity Ratio, Return on Assets, Earnings per Share) within the Random Forest model. An R-squared value close to 1 implies that a substantial portion of the variance in TSR is captured by the independent variables, indicating an excellent model fit. For instance, an R-squared of 0.988 suggests that 98.8% of TSR variability is explained by the model's selected variables.
Mean Absolute Error (MAE) assesses the average absolute difference between predicted and actual TSR values, offering insight into the model's prediction accuracy. A lower MAE indicates greater accuracy, suggesting that the model's forecasts align closely with actual figures. With an MAE of 0.01938, the model's predictions deviate from the actual TSR values by roughly 0.01938 units on average.
Mean Squared Error (MSE) measures the average of the squared differences between predicted and actual TSR values, emphasizing larger discrepancies and making it sensitive to outliers. A lower MSE indicates better model performance, reflecting a closer fit between predicted and actual outcomes. For instance, an MSE of 0.00115 signifies that the average squared difference between the model's predictions and the actual TSR values is 0.00115 units.
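The three evaluation metrics described above can be computed directly from paired actual and predicted values. The sketch below uses made-up numbers purely to show the arithmetic; it does not reproduce the study's reported figures.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average squared prediction error."""
    return sum((a - p) ** 2 for a, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical actual and predicted values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
print(mae(actual, predicted), mse(actual, predicted), r_squared(actual, predicted))
```

Because MSE squares each error, the two 0.5 misses here contribute 0.25 each, whereas MAE counts them at face value; this is why MSE reacts more strongly to outliers.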
Figure 3.6 Actual versus Predicted Result for Random Forest model
The Random Forest regression model demonstrates high accuracy in predicting Total Shareholder Return (TSR) based on key independent variables such as PRASM, Gross Profit Margin, Quick Ratio, Debt-to-Equity Ratio, Return on Assets, and Earnings per Share. With an R-squared value of 0.988, the model accounts for approximately 98.8% of the variability in TSR, indicating a close fit to the dataset. The Mean Absolute Error (MAE) of 0.01938 shows that predictions closely match actual TSR values, while the Mean Squared Error (MSE) of 0.00115 indicates small average squared differences between forecasted and actual values. Together, these metrics suggest that the Random Forest model captures the relationship between the independent variables and TSR with very low error.
AdaBoost is an ensemble learning algorithm that improves predictive accuracy iteratively by focusing on error correction. It trains a sequence of weak learners (models that perform only slightly better than random guessing) on successively reweighted versions of the training data. With each iteration, AdaBoost increases the weights of misclassified samples (or, in regression, of poorly predicted samples), homing in on the more difficult data points. This method compensates for the weaknesses of individual learners and strengthens the overall ensemble's capacity to address complex scenarios.
AdaBoost's effectiveness stems from this adaptive learning process, which shifts the training focus toward the samples that earlier learners handled poorly, leading to ongoing refinement. Each learner within the ensemble offers a distinct perspective, and their weighted combination yields a model that is more accurate than any single learner. This integration of diverse learners enables AdaBoost to reduce error margins and improve predictive performance across multiple domains.
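The reweighting step described above can be sketched as follows. This is a minimal illustration of one round of the classic discrete (classification) form of AdaBoost with labels in {-1, +1}; the function name and toy data are hypothetical, and the regression variant used for TSR prediction updates weights from continuous errors rather than from misclassifications.

```python
import math
from typing import List, Tuple


def adaboost_round(
    weights: List[float], y_true: List[int], y_pred: List[int]
) -> Tuple[float, List[float]]:
    """One reweighting step of discrete AdaBoost for labels in {-1, +1}.

    Returns the weak learner's vote weight (alpha) and the normalized
    sample weights to be used when training the next weak learner.
    """
    # Weighted error: total weight of the samples the weak learner got wrong.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < error < 0.5:
        raise ValueError("weak learner must beat random guessing without being perfect")

    # Learners with lower weighted error receive a larger say in the ensemble.
    alpha = 0.5 * math.log((1.0 - error) / error)

    # Misclassified samples (t * p == -1) are scaled up, correct ones scaled down.
    updated = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Toy example (hypothetical): five samples, the last one is misclassified.
initial_weights = [0.2] * 5
labels = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]

alpha, new_weights = adaboost_round(initial_weights, labels, predictions)
print("alpha:", alpha)
print("new weights:", new_weights)
```

After the update, the misclassified sample carries a much larger share of the total weight, so the next weak learner is pushed to get it right.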
In the dynamic airline industry, where fluctuating fuel prices, changing market demands, and varying operational efficiencies significantly influence Total Shareholder Return (TSR), AdaBoost is a useful tool. Its ability to combine many weak learners enables it to capture the complex relationships among these factors. Additionally, AdaBoost's focus on correcting earlier errors keeps the model responsive to subtle patterns in the data, providing flexibility when market conditions and operational challenges change.
AdaBoost can work with a diverse range of information once it is expressed as model features, including quantitative financial metrics and qualitative performance indicators. This ability to combine varied inputs allows for a thorough examination of the factors affecting Total Shareholder Return (TSR). By drawing on AdaBoost's adaptive learning, airlines can better manage industry complexities and make informed strategic decisions that enhance shareholder value.
The choice of AdaBoost for forecasting TSR in the airline industry is supported by its algorithmic capabilities, which handle the sector's complexities and adapt to changing circumstances. This positions AdaBoost not merely as a predictive tool, but as a strategic asset for airline operations and financial planning.
Figure 3.8 Summary result for AdaBoost model
R-squared (R²): The AdaBoost regression model exhibits an R-squared value of
CONCLUSION, IMPLICATION AND RECOMMENDATION
Conclusion
This study evaluates the effectiveness of different machine learning models in predicting Total Shareholder Return (TSR) for the top nine airlines globally, using data from SEC filings between 2007 and 2023. The research aims to identify the models that best forecast TSR by comparing metrics such as R-squared and Mean Absolute Error (MAE). Results reveal that the ensemble methods (Random Forest Regression, AdaBoost, and XGBoost) and the Artificial Neural Network (ANN) significantly surpassed the other modeling approaches.
The Random Forest Regression model performed strongly, with an R-squared value of 0.988, indicating that it explained approximately 98.8% of the variance in TSR. The AdaBoost model achieved a similar result, with an R-squared of 0.9901. Among the three ensemble models, XGBoost achieved the highest R-squared, 0.994.
The Artificial Neural Network (ANN) model achieved the highest R-squared of the four models, 0.9974. With a Mean Absolute Error (MAE) of 0.0121 and a Mean Squared Error (MSE) of 0.0003, the ANN model delivers precise predictions with minimal error.
Advanced machine learning techniques significantly enhance the accuracy of financial forecasts in the airline industry. By employing these models, stakeholders can obtain valuable insights into the factors influencing shareholder returns, which supports more informed and strategic investment decisions.
The study highlights the advantages of using machine learning models such as Random Forest, AdaBoost, XGBoost, and Artificial Neural Networks (ANN) for forecasting TSR, emphasizing their effectiveness in handling the complex, non-linear data relationships that are essential to financial analysis in the airline industry. This research paves the way for future studies aimed at enhancing these models and expanding their application across different sectors.
Future research should prioritize the expansion of datasets and the optimization of model parameters to improve generalizability and accuracy. These advancements are expected to yield more effective tools for financial decision-making, thereby strengthening the robustness of financial planning and strategy.
Recommendations for investors
Investors in the airline industry should prioritize understanding Total Shareholder Return (TSR), given the sector's notable recovery and growth post-pandemic. To manage the inherent volatility, diversifying portfolios across international, domestic, and cargo airlines is recommended. Additionally, attention to risk management metrics, such as provisions for loan and lease losses, is essential: these practices may lower expected shareholder value, but they reflect the prudent risk management that the capital-intensive nature of the airline industry requires.
Effective management of short-term non-core assets, such as federal funds and commercial paper, is crucial for enhancing shareholder value through consistent returns. Investors must stay informed about broader economic indicators and industry-specific trends, including fluctuations in fuel costs, labor relations, and regulatory changes, to anticipate shifts in profitability. Analyzing how airlines have adapted their operational strategies post-pandemic (for example, by expanding cargo services or altering passenger offerings) can offer valuable insights into which companies are positioned for sustainable growth.
A comprehensive analysis of long-term financial health is crucial for investors in the airline industry, focusing on debt levels, operational efficiency, and profitability metrics. Additionally, staying updated on geopolitical events is vital, as they can significantly influence airline operations and profitability. By adhering to these guidelines, investors can better assess potential returns relative to the risks in the airline sector, leading to more informed and strategic investment choices.
Limitation of study & recommendations for further research
The current models face a considerable risk of overfitting because of the limited number of airline companies, the short timeframe, and the small training and testing datasets, despite attempts at hyperparameter tuning and adjustments. To improve prediction accuracy and enhance the models' generalization capabilities, it is essential to expand the dataset by including data from a broader range of airline companies.
Machine learning models generally improve with larger and more diverse datasets, which enhance their ability to learn and recognize underlying patterns. Incorporating data from various companies enables a model to capture a wider array of industry dynamics, ultimately boosting its accuracy in predicting outcomes across different market conditions.
Expanding the dataset is crucial for minimizing overfitting risks and improving model transparency and reproducibility. A larger dataset enables thorough validation and testing, which supports continuous refinement and optimization to enhance predictive performance. Therefore, gathering extensive data from diverse airline companies is vital for developing reliable and effective predictive models in the aviation sector.
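One way to probe the overfitting concern with a small number of airlines is to hold out every observation of one airline at a time, train on the rest, and compare in-sample and out-of-sample error. The sketch below only generates such splits. It is an assumption offered for illustration, not the validation scheme used in this thesis, and the function name and airline labels are hypothetical.

```python
from typing import Iterator, List, Tuple


def leave_one_group_out(groups: List[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices) for each distinct group.

    Each split keeps all rows of one group (here, one airline) out of training,
    so the test error reflects performance on a company the model has not seen.
    """
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train, test


# Hypothetical airline label for each row of a dataset (illustration only).
airline_of_row = ["A", "A", "B", "B", "C", "C"]

splits = list(leave_one_group_out(airline_of_row))
for held_out, train_idx, test_idx in splits:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

A large gap between training error and held-out error across these splits would be consistent with the overfitting risk described above.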
Recommendations for future research
To improve forecasting in the airline industry, future research should adopt more comprehensive methodologies by integrating diverse data sources, including low-cost and regional carriers, potentially through partnerships with international aviation organizations. Additionally, conducting longitudinal studies over several decades could yield valuable insights into cyclical trends and the effects of significant economic changes, facilitated by a centralized database that consistently gathers real-time data.
The integration of advanced analytical techniques, including deep learning and reinforcement learning, can enhance the precision of predictions of market dynamics and customer behavior. By combining traditional econometric models with modern machine learning approaches, hybrid models can harness the advantages of both methodologies. Conducting comparative studies across other transportation sectors, such as maritime and rail, can reveal distinct and shared predictors of financial performance. Furthermore, incorporating alternative data sources such as satellite imagery and real-time weather data can lead to more effective predictions of operational disruptions.
Analyzing the effects of international aviation regulations and environmental policies is crucial for understanding their influence on profitability and sustainability. This involves assessing the financial implications of regulatory changes to support policymakers and airline leaders. Additionally, employing natural language processing to analyze consumer sentiment from reviews and social media can yield valuable insights into customer satisfaction and its potential impact on financial performance.
Incorporating geopolitical risk assessments into forecasting models is essential for airlines operating in unstable regions or reliant on international routes influenced by political changes. Furthermore, evaluating the financial impacts of technological advancements in aviation, such as automated booking systems and advanced air traffic management, will inform future investments in emerging technologies such as electric aircraft and biofuels. These strategies will enhance the understanding of the complexities within the airline industry and support the development of robust models for strategic planning and forecasting.
SOCIALIST REPUBLIC OF VIETNAM Independence – Freedom - Happiness
EXPLANATORY REPORT ON CHANGES/ADDITIONS
BASED ON THE DECISION OF GRADUATION THESIS COMMITTEE
FOR UNDERGRADUATE PROGRAMS WITH DEGREE AWARDED BY
Student’s full name: Vũ Ngô Bảo Châu
Graduation thesis topic: Machine learning models to predict Shareholder Returns in the airline industry
According to VNU-IS’s decision no …… QĐ/TQT, dated … / … / ……, a Graduation Thesis Committee has been established for Bachelor programs at Vietnam National University, Hanoi. The thesis has been defended and revised in the sections specified below.
No. / Change/Addition Suggestions by the Committee / Detailed Changes/Additions / Page
1. Suggestion: The student should carefully check for grammar errors and strictly follow the style guidelines of the school.
   Change: Checked and corrected grammatical errors and revised the entire graduation project to follow the style guidelines of the school.
2. Suggestion: The student should perform hyperparameter tuning when trying to optimize her models.
   Change: Changed the parameters of the ANN and Decision Tree models to optimize them and improve prediction results.
Appendix 6. Explanatory report after the thesis defense