Analyze and forecast sales for several kiosks at Noi Bai and Tan Son Nhat airports using a quantitative approach
VIETNAM NATIONAL UNIVERSITY, HANOI
INTERNATIONAL SCHOOL
GRADUATION PROJECT
ANALYZE AND FORECAST SALES FOR SEVERAL KIOSKS AT NOI BAI
AND TAN SON NHAT AIRPORTS USING A QUANTITATIVE
APPROACH
Ngo Mai Anh
Hanoi - Year 2024
MAJOR: Business Data Analytics
Acknowledgement
I would like to express my deepest gratitude to Ph.D. Tran Duc Quynh for his invaluable guidance and unwavering support throughout my thesis journey. His patience, motivation, enthusiasm, and immense knowledge have been essential to my research and writing process. His dedicated mentorship has played a pivotal role in shaping every phase of my work.
I would also like to extend my heartfelt appreciation to the individuals and organizations who have provided insights and resources that have contributed to my thesis on "Analyzing and Forecasting Sales Performance for Airport Retail Stores at Nội Bài and Tân Sơn Nhất Using Quantitative Approaches." Their contributions have been instrumental in enriching the depth and scope of my research, enabling me to develop a more comprehensive analysis.
Finally, I want to express my sincere thanks to my friends and family who have supported me throughout this journey. Whether through direct assistance or indirect encouragement, your unwavering support has been a cornerstone of my progress. I am especially grateful to my parents for their unconditional love, sacrifices, and belief in my abilities, which have always inspired me to strive for excellence.
Thank you sincerely.
Letter of Declaration
I hereby declare that the Graduation Project, titled "Analyzing and Forecasting Sales Performance for Airport Retail Stores at Nội Bài and Tân Sơn Nhất Using Quantitative Approaches," is the result of my own research and has not been previously published. Throughout the project, I adhered strictly to research ethics. All findings and data analysis presented are the outcome of my personal research efforts, and all references are properly cited as per guidelines.
I take full responsibility for the accuracy of the numbers, data, and all other contents included in this graduation project. This report faithfully represents my work over the research period. Under the guidance of Ph.D. Tran Duc Quynh, I diligently and honestly completed my assigned tasks. I confirm that this report upholds all copyrights and contains no fraudulent or misleading information.
If any instance of academic misconduct is identified in this report, I accept full responsibility and am prepared to face the consequences as determined by the department, faculty, and school.
3 Scope and Relevance
4 Significance of the Study
Chapter 2: Literature reviews
1 Overview of sales forecasting methods
2 Application of Visualization in Sales Analysis
3 Factors Affecting Airport Kiosk Sales
Effectiveness Calculation and Adjustment for Seasonality
2 Predictive Model Outcomes
2.1 Model Implementation for Sales Prediction
2.1.1 XGBoost
2.1.3 Weighted Moving Average (WMA)
2.1.4 Exponential Smoothing (ES)
2.1.5 ARIMA
2.2 Forecasting revenue for the next 6 months using the ARIMA model
Chapter 5: Practical Business Strategies & Plan
LIST OF TABLES
Table 3.1 Data fields in the data frame of the revenue dataset
Table 3.2 Data fields in the data frame of airport passenger volume
Table 4.1 Model Performance Results of Noi Bai
Table 4.2 Model Performance Results of Tan Son Nhat
Table 4.3 Forecasting revenue for the next 6 months for Noi Bai Airport
Table 4.4 Forecasting revenue for the next 6 months of Tan Son Nhat
Table 5.1 Practical marketing & sales strategies
LIST OF FIGURES
Figure 4.5 Customer purchasing trends by location
Figure 4.6 Forecasting revenue for the next 6 months for Noi Bai Airport
Figure 4.7 Forecasting revenue for the next 6 months of Tan Son Nhat
This study employs a quantitative approach to analyze and forecast sales for kiosks at Nội Bài (Hanoi) and Tân Sơn Nhất (Ho Chi Minh City) airports, using advanced statistical models and predictive analytics. The research focuses on examining historical sales data from 2021 to 2024 to identify seasonal trends, purchasing behaviors, and key drivers of revenue performance in the airport retail environment. Seasonal peaks, particularly during the Tết holiday and summer travel periods, were identified as major contributors to revenue, emphasizing the impact of high passenger volumes and demand fluctuations on sales.
The analysis evaluates multiple forecasting models, including ARIMA, Weighted Moving Average (WMA), and Exponential Smoothing, to determine their suitability for predicting future sales. Among these, ARIMA emerged as the most accurate, effectively capturing seasonal patterns and trends. Forecasts for the first half of 2025 predict stable revenue with continued seasonal spikes.
This research highlights the importance of data-driven strategies to address the unique challenges of airport retail, such as limited customer time and high variability in demand. Practical recommendations include dynamic merchandising, inventory adjustments tailored to seasonal demand, and location-specific marketing initiatives. These findings contribute to the growing literature on retail analytics and offer actionable insights for kiosk operators to enhance profitability and operational efficiency.
By bridging the gap between raw data and actionable insights, this study provides a framework for leveraging predictive analytics to optimize sales performance in high-traffic airport environments, particularly in emerging markets like Vietnam.
Chapter 1: Introduction
1 Problem statement
In the competitive landscape of airport retail, kiosks face unique challenges. Unlike conventional retail spaces, airport kiosks operate within a confined customer pool influenced by travel schedules, spending habits, and airport-specific factors such as flight volume and passenger demographics. Globally, the airport retail market was valued at $18.5 billion in 2020 and is projected to reach $40.6 billion by 2027, with a compound annual growth rate (CAGR) of 12.1% (Grand View Research, 2021). This underscores the growing importance of understanding sales trends and consumer behavior to tap into this lucrative market effectively.
Airports are not merely transit hubs but also key retail zones, with a significant portion of revenue derived from non-aeronautical sources such as shopping and dining. Studies indicate that non-aeronautical revenue can constitute 40-60% of an airport's total income, making the role of kiosks and other retail outlets critical (ACI, 2020). Effective strategies for airport retailing hinge on a comprehensive understanding of consumer behaviors, shopping preferences, and purchasing patterns, as well as external factors like flight delays and passenger demographics.
Despite their critical role, the predictive capabilities of airport kiosks remain under-researched, particularly in emerging markets like Vietnam. Airports such as Noi Bai (Hanoi) and Tan Son Nhat (Ho Chi Minh City) serve as key commercial hubs, attracting both domestic and international travelers. However, the lack of data-driven strategies tailored to these specific retail environments limits their potential to maximize revenue and profitability.
To address these challenges, this study employs a quantitative approach to analyze historical sales data, identify trends, and develop robust forecasting models. By leveraging advanced visualization techniques and predictive analytics, this research aims to bridge the gap between raw data and actionable insights. It focuses on enabling kiosk operators to make informed decisions regarding inventory management, marketing campaigns, and pricing strategies, ensuring alignment with customer preferences and behavior.
2 Objectives
The overarching objective of this study is to provide a comprehensive analysis of sales performance and develop accurate forecasting models for airport kiosks. The specific objectives are:
1 To analyze historical sales data from kiosks at Nội Bài and Tân Sơn Nhất airports, identifying patterns, trends, and anomalies
2 To visualize key insights derived from the data, enabling stakeholders to understand sales dynamics effectively
3 To develop and evaluate forecasting models using quantitative methods, ensuring high predictive accuracy
4 To offer actionable recommendations for improving operational efficiency and revenue growth
3 Scope and relevance
This research focuses exclusively on kiosks operating within Nội Bài and Tân Sơn Nhất airports, utilizing datasets spanning multiple months to ensure comprehensive analysis. The study emphasizes quantitative methods, combining descriptive analytics with advanced forecasting techniques. While external factors such as marketing campaigns and broader economic trends are acknowledged, the analysis prioritizes internal sales data for clarity and specificity.
The findings of this research are expected to contribute significantly to the literature on airport retail analytics. They will also provide practical insights for kiosk operators, airport authorities, and other stakeholders in enhancing their retail strategies. Moreover, the emphasis on visualization ensures that complex data is accessible to non-technical audiences, bridging the gap between analytics and practical implementation.
4 Significance of the study
Airport retail has grown into a multi-billion-dollar industry, with global sales expected to reach $40 billion by 2030, driven by the rising number of air travelers and expanding airport infrastructures (Statista, 2023). In Vietnam, Nội Bài and Tân Sơn Nhất airports serve as key gateways, collectively handling over 70 million passengers annually (Vietnam Airports Corporation, 2023). However, the unpredictable nature of customer flow and spending behavior in these settings poses significant challenges for kiosk operators.
Through advanced data analysis and forecasting, this research aims to:
- Enhance understanding of sales dynamics in high-traffic airport
Chapter 2: Literature reviews
1 Overview of sales forecasting methods
Sales forecasting plays a pivotal role in strategic planning and operational efficiency, providing businesses with data-driven insights to anticipate future trends and allocate resources effectively. The methodologies for sales forecasting have evolved significantly, encompassing traditional statistical approaches and advanced machine learning techniques.
1.1 Time series models
Time series models, such as Moving Average (MA), Weighted Moving Average (WMA), Exponential Smoothing, and ARIMA (Autoregressive Integrated Moving Average), have long been foundational in forecasting. These methods are particularly suited for identifying and modeling patterns over time, such as seasonality and trends (Box & Jenkins, 1976). For instance, ARIMA models are frequently employed in retail forecasting due to their ability to capture autoregressive and moving average components, making them effective for short-term predictions (Hyndman & Athanasopoulos, 2021).
1.2 Comparative effectiveness
While traditional models excel in simplicity and interpretability, machine learning (ML) methods offer superior flexibility and accuracy in complex scenarios. Combining these approaches, such as integrating ARIMA with ML algorithms, has been proposed as a hybrid methodology to leverage the strengths of both techniques (Zhang, 2003). For instance, hybrid ARIMA-ANN models have been successfully used in retail contexts to predict seasonal demand more effectively than either approach alone.
2 Application of visualization in sales analysis
Visualization is an indispensable tool in sales analysis, transforming raw data into interpretable insights that support decision-making. The use of charts, graphs, and interactive dashboards allows stakeholders to identify trends, correlations, and outliers with ease.
2.1 Role of visualization in data interpretation
Effective visualization bridges the gap between complex datasets and actionable insights. Techniques such as time-series plots, heatmaps, and bar charts enable the identification of seasonality, peak sales periods, and product performance trends (Few, 2006). Advanced visualization tools, such as Tableau and Power BI, have further revolutionized data analysis by allowing dynamic exploration and real-time updates (Heer et al., 2010).
2.2 Challenges in visualization
While visualization is powerful, it requires careful design to avoid misrepresentation. Poorly constructed graphs or excessive detail can lead to misinterpretation, undermining decision-making. Thus, selecting the appropriate visualization techniques for specific datasets and audiences is crucial (Munzner, 2014).
3 Factors affecting airport kiosk sales
Airport retail environments are unique, shaped by a combination of external and internal factors that influence consumer behavior and sales performance. These factors can be broadly categorized into customer demographics, product offerings, and environmental characteristics.
3.1 Customer demographics
Airports attract a diverse mix of travelers, including business professionals, families, and tourists. Each group exhibits distinct spending behaviors, with business travelers often prioritizing convenience and premium products, while tourists may focus on souvenirs and local specialties (Freathy & O’Connell, 2012). Demographic factors such as nationality, age, and income levels significantly affect purchasing decisions.
3.2 Product offerings
The product mix at airport kiosks also plays a critical role in driving sales. Studies suggest that high-margin items, such as snacks, electronics, and duty-free goods, are particularly popular in airport settings (Torres et al., 2015). Seasonal promotions and limited-time offers further boost sales, creating a sense of urgency among travelers with limited time.
3.3 Environmental characteristics
Operational hours, foot traffic, and kiosk placement within the airport are critical determinants of sales performance. Locations near boarding gates, lounges, or high-traffic areas tend to generate higher sales (Gillen & Morrison, 2015). Additionally, external factors such as flight delays, economic conditions, and local events can influence spending patterns.
3.4 Unique challenges of airport kiosks
Unlike traditional retail spaces, airport kiosks must adapt to a captive audience with constrained time and space. This creates operational challenges, including inventory management and pricing strategies tailored to fluctuating passenger volumes (Graham, 2013). As a result, data-driven approaches that consider these dynamics are essential for optimizing sales.
Chapter 3: Methodology
1 Data Description
1.1 Overview of the data
1.1.1 Revenue data
● Structure and Volume
- Each monthly revenue file contains transactional records from sales operations at specific locations (e.g., Noi Bai, Tan Son Nhat).
- The average number of rows per file is approximately 1,200, representing individual product sales and their corresponding metrics such as quantity sold (SlgBan), revenue in USD (Total_USD), and revenue in VND (Total_VND).
- For each location, there are a total of 54 monthly files spanning a 4-year period (2021–2024). This results in approximately 153,600 rows of data per location before filtering and cleaning.
● Key Attributes
Table 3.1 Data fields in the data frame of the revenue dataset
1.1.2 Passenger volume data
● Structure and volume
- Passenger data is stored in large CSV files, with each record representing a single flight.
- The file for Noi Bai airport contains 156,090 rows, representing flight-level data across multiple months.
- The file for Tan Son Nhat airport contains 238,626 rows, representing flight-level data across multiple months.
- Each row includes details such as the flight date, route, passenger type (domestic or international), and total passenger count (TotalPax).
1.2 Data processing
The data processing phase played a crucial role in ensuring that the dataset was structured, clean, and suitable for further analysis. This process involved multiple steps, including consolidating revenue data, passenger data, and invoice records, as well as integrating these datasets into a single, comprehensive format. Each step required careful handling of missing values, formatting inconsistencies, and data alignment issues to ensure accuracy and reliability.
The revenue data was initially stored in multiple Excel files, with each file containing transactional records corresponding to a single month. To facilitate a unified analysis, it was necessary to consolidate all these files into a single dataset. This was accomplished by systematically reading each file using the Pandas library in Python, ensuring that only relevant columns were extracted while maintaining data consistency across all files. Additionally, data types were explicitly converted to standard formats to ensure numerical operations could be performed seamlessly. A new column indicating the month associated with each transaction was then appended to the dataset, allowing for precise time-series analysis.
Once the individual datasets were prepared, they were merged into a single DataFrame. This was achieved by appending each monthly dataset to a list and subsequently concatenating them using the Pandas library. Following this, a rigorous data validation process was conducted, during which rows with missing or zero revenue values were removed to prevent inaccuracies in later analyses. Additionally, a consistency check was performed to detect and rectify any extreme outliers that could distort the findings.
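The consolidation and validation steps described above can be sketched roughly as follows. This is a minimal illustration rather than the study's actual script: the two in-memory frames stand in for monthly Excel files (which would normally be loaded with pd.read_excel), the product-code column MaHang is an assumed name, and the outlier check is omitted because the source does not specify the rule that was used.

```python
import pandas as pd

# Hypothetical stand-ins for two monthly revenue files; in practice each
# frame would be loaded from its Excel file with pd.read_excel.
monthly_frames = {
    "2021-01": pd.DataFrame({
        "MaHang": ["A01", "A02", "A03"],      # assumed product-code column
        "SlgBan": [10, 5, 2],                 # quantity sold
        "Total_USD": ["120.5", "0", "n/a"],   # raw values arrive as text
    }),
    "2021-02": pd.DataFrame({
        "MaHang": ["A01", "A02"],
        "SlgBan": [7, 3],
        "Total_USD": ["84.0", "45.5"],
    }),
}

frames = []
for month, df in monthly_frames.items():
    df = df.copy()
    # Convert revenue to a numeric type; unparseable values become NaN.
    df["Total_USD"] = pd.to_numeric(df["Total_USD"], errors="coerce")
    # Tag every transaction with the month of its source file.
    df["Month"] = month
    frames.append(df)

# Concatenate the monthly frames into a single DataFrame.
revenue = pd.concat(frames, ignore_index=True)

# Validation: drop rows whose revenue is missing or zero.
revenue = revenue.dropna(subset=["Total_USD"])
revenue = revenue[revenue["Total_USD"] != 0].reset_index(drop=True)
```

Collecting the frames in a list and calling pd.concat once avoids repeatedly copying a growing DataFrame inside the loop.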
After consolidating the revenue data, the next step involved processing the passenger volume data, which was originally stored in CSV format. This dataset contained flight-level records detailing the number of passengers traveling through Noi Bai and Tan Son Nhat airports. The first step in processing this dataset involved reading the CSV files and converting the flight date column into a standardized datetime format. This conversion enabled efficient time-based operations, such as aggregating data at a monthly level. Once the date format was standardized, the dataset was grouped by month to ensure consistency with the revenue dataset. The total passenger count for each month was computed separately for domestic and international passengers, allowing for a more granular analysis of passenger trends.
The processing of invoice data followed a similar methodology. Invoice records were read from Excel files and underwent standardization to ensure consistency across all datasets. The invoice date column was transformed into datetime format, and a new column indicating the corresponding month was created to facilitate data merging. Subsequently, the total number of invoices per month was calculated to provide an indicator of transaction volume trends.
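The monthly aggregation of passenger records can be sketched as below. The sample rows and the column names FlightDate and PaxType are assumptions made for illustration; only TotalPax is named in the data description. The invoice counts would follow the same pattern, with a count per month in place of the sum.

```python
import pandas as pd

# Hypothetical flight-level records; FlightDate and PaxType are assumed names.
flights = pd.DataFrame({
    "FlightDate": ["2021-01-03", "2021-01-15", "2021-01-20", "2021-02-02"],
    "PaxType": ["domestic", "international", "domestic", "domestic"],
    "TotalPax": [150, 200, 120, 180],
})

# Standardize the date column, then derive a monthly key such as "2021-01".
flights["FlightDate"] = pd.to_datetime(flights["FlightDate"])
flights["Month"] = flights["FlightDate"].dt.to_period("M").astype(str)

# Total passengers per month, with one column per passenger type.
monthly_pax = (
    flights.groupby(["Month", "PaxType"])["TotalPax"]
    .sum()
    .unstack(fill_value=0)
    .reset_index()
)
```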
In addition to consolidating revenue, passenger, and invoice data, an essential aspect of the analysis was identifying the top 20 best-selling products based on revenue. This step involved grouping transaction data by product code and name, followed by computing the total revenue generated by each product. To ensure accuracy, the revenue column was converted to a numeric format, and any non-numeric or invalid values were coerced to null values and removed. The top 20 products were then extracted using the Pandas nlargest function, which enabled a precise ranking based on total revenue contributions.
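A small sketch of this ranking step is shown below, under stated assumptions: the column names MaHang and TenHang and the sample rows are hypothetical, and top_n is set to 2 only so the toy data produces a visible cut-off (the study uses 20).

```python
import pandas as pd

# Hypothetical transactions; MaHang (code) and TenHang (name) are assumed names.
sales = pd.DataFrame({
    "MaHang": ["A01", "A02", "A01", "A03", "A02", "A04"],
    "TenHang": ["Coffee", "Tea", "Coffee", "Lacquer box", "Tea", "Magnet"],
    "Total_USD": ["30", "12.5", "20", "bad", "7.5", "5"],
})

# Coerce revenue to numeric; invalid entries become NaN and are dropped.
sales["Total_USD"] = pd.to_numeric(sales["Total_USD"], errors="coerce")
sales = sales.dropna(subset=["Total_USD"])

# Total revenue per product (code and name together identify a product).
product_revenue = sales.groupby(
    ["MaHang", "TenHang"], as_index=False
)["Total_USD"].sum()

top_n = 2  # the study uses 20
top_products = product_revenue.nlargest(top_n, "Total_USD")
```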
Finally, the integration of all datasets was performed to create a single, comprehensive dataset encompassing all relevant variables. To achieve this, the time frames of all datasets were aligned to ensure consistency, followed by merging them based on the common month column. The resulting dataset included key variables such as total revenue, passenger volume, and invoice count, allowing for a holistic analysis of sales trends. A final validation check was conducted to detect any inconsistencies or missing values, ensuring the dataset was ready for visualization and predictive modeling.
This refined dataset served as the foundation for subsequent analyses, enabling the extraction of meaningful insights and facilitating the development of robust forecasting models. The meticulous data processing approach ensured that the dataset was both reliable and suitable for supporting data-driven decision-making in the context of airport retail sales forecasting.
1.2.5 Integrating Data
The final dataset includes the following columns:
Bill_Count Total number of invoices for each month
Table 3.2 Data fields in the data frame of airport passenger volume
Data integration was performed using the pd.merge() function to join datasets based on the Month column.
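A minimal sketch of this join is given below. The three small frames and their values are invented for illustration, and the inner join is an assumption: the source states only that pd.merge() was used on the Month column, not which join type was chosen.

```python
import pandas as pd

# Hypothetical monthly aggregates; the values are illustrative only.
monthly_revenue = pd.DataFrame({
    "Month": ["2021-01", "2021-02", "2021-03"],
    "Total_USD": [1000.0, 1500.0, 1200.0],
})
monthly_pax = pd.DataFrame({
    "Month": ["2021-01", "2021-02", "2021-03"],
    "TotalPax": [5000, 7000, 6000],
})
monthly_bills = pd.DataFrame({
    "Month": ["2021-01", "2021-02"],  # March is missing on purpose
    "Bill_Count": [400, 550],
})

# Assumed inner join: keep only months present in every dataset,
# which also aligns the time frames of the three sources.
merged = pd.merge(monthly_revenue, monthly_pax, on="Month", how="inner")
merged = pd.merge(merged, monthly_bills, on="Month", how="inner")

# Final validation: count any remaining missing values.
missing_values = int(merged.isna().sum().sum())
```

With an inner join, a month that is absent from any one source drops out of the result, so comparing the row count with the expected number of months is a quick alignment check.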
2 Analysis framework
The analysis framework was designed to extract actionable insights from the data through advanced visualization techniques and statistical analysis. All visualizations were created using Python and Tableau, a data visualization tool that supports interactive and dynamic dashboards.
2.1 Exploratory Data Analysis (EDA)
EDA was conducted to uncover key trends, patterns, and relationships within the dataset The steps included:
- Descriptive Statistics: Metrics such as mean, median, standard deviation, and range were calculated for revenue and passenger traffic. These statistics offered a foundational understanding of the data distribution and highlighted significant variations across time periods.
- Correlation Analysis: Relationships between variables, such as passenger traffic and revenue, were quantified using Pearson’s correlation coefficient. This analysis revealed a strong positive correlation, emphasizing the critical role of customer volume in driving sales.
- Seasonality and Trend Analysis: Using Python, time series decomposition was performed to separate the revenue data into trend, seasonal, and residual components. This analysis provided insights into recurring patterns (e.g., Tet and summer peaks) and anomalies in the data.
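To make the correlation step concrete, the following dependency-free sketch computes Pearson's correlation coefficient from its definition. The function name pearson_r and the sample figures are hypothetical; the sample is perfectly linear on purpose so that the coefficient is easy to check by hand.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Illustrative monthly figures (not from the study's data).
passengers = [100, 200, 300, 400]
revenue = [10, 20, 30, 40]

r = pearson_r(passengers, revenue)  # perfectly linear sample, so r is 1.0
```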
2.2 Insights from visualizations
Using Python for visualizations provided critical insights into sales trends:
Peak Sales Periods: Significant spikes occurred during the Tết holidays (January–February) and summer months (June–August), driven by increased passenger traffic. Time-series plots from matplotlib highlighted these seasonal trends effectively.
Location-specific trends: Scatterplots showed:
- Noi Bai: Stronger sales of cultural products.
- Tan Son Nhat: A broader demand for premium items.
These trends were linked to passenger density and spending behavior.
2.3 Visualization tools and outputs
Python Visualizations:
Supplemented Tableau with tailored visualizations, including:
- Heatmaps for correlation analysis
- Time series plots for decomposed revenue components (trend, seasonal, residual)
- Dual-axis plots to compare promotional spending and sales performance
3 Predictive modeling
To achieve the objectives of this study, various predictive modeling techniques were employed, combining statistical and smoothing-based approaches to forecast kiosk sales. The choice of models was guided by the characteristics of the data and the specific forecasting needs.
Moving Average (MA)
The Moving Average model is one of the simplest time series forecasting techniques, widely used to smooth fluctuations in historical data and identify underlying trends (Makridakis et al., 1998). This method calculates the average of data points over a fixed window, sliding across the series. In this study, a 12-month moving average was applied to smooth sales data and generate short-term forecasts. Its simplicity and interpretability made it a valuable baseline for comparison.
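As an illustration of the idea, the sketch below forecasts the next month as the mean of the last 12 observations. The function name, the sample series, and the choice to use the trailing mean directly as the one-step forecast are assumptions; the source does not detail how forecasts were derived from the smoothed series.

```python
def moving_average_forecast(series, window=12):
    """One-step forecast: the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window

# Illustrative monthly sales (13 months), not taken from the study.
sales = [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220]

forecast = moving_average_forecast(sales, window=12)  # mean of 110..220
```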
Weighted Moving Average (WMA)
Weighted Moving Average builds on the basic MA by assigning more weight to recent observations, making it more responsive to recent changes in the data (Hyndman & Athanasopoulos, 2018). Linear weights were applied in this study to emphasize more recent months in forecasting. This technique offered improved accuracy over MA, particularly in datasets with evolving sales patterns, while retaining the simplicity of moving averages.
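A linear weighting scheme can be sketched as follows. The weights 1, 2, ..., n (oldest to newest) are one common reading of "linear weights" and are an assumption here, as are the function name, the window length of 3, and the sample values.

```python
def weighted_moving_average(series, window=3):
    """One-step forecast with linear weights: 1 for the oldest value in the
    window, up to `window` for the most recent one."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    weights = range(1, window + 1)
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)

# Illustrative values: (1*100 + 2*120 + 3*150) / 6
wma = weighted_moving_average([100, 120, 150], window=3)
```

Because the newest month carries the largest weight, the result sits closer to 150 than the unweighted mean of the same three values does.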
ARIMA (Autoregressive Integrated Moving Average)
ARIMA is a statistical model that accounts for trends, seasonality, and autocorrelation in time series data (Box & Jenkins, 1976). It integrates autoregression (AR), differencing (I), and moving average (MA) components to model the underlying structure of the data. For this study, ARIMA was applied to monthly sales data, with differencing used to achieve stationarity. The model parameters (p, d, q) were optimized using the Akaike Information Criterion (AIC). ARIMA was particularly effective for generating forecasts in datasets with strong seasonal and trend characteristics, making it a valuable tool for operational planning and inventory management.
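For reference, the general ARIMA(p, d, q) model and the AIC can be written in standard textbook notation as below. The symbols are not taken from the source and are introduced here only for illustration: B is the backshift operator (B y_t = y_{t-1}), y_t is monthly revenue, ε_t is white noise, k is the number of estimated parameters, and L̂ is the maximized likelihood.

```latex
% ARIMA(p, d, q): AR polynomial applied to the d-times differenced series
% equals a constant plus the MA polynomial applied to the noise term.
\left(1 - \sum_{i=1}^{p} \phi_i B^{i}\right)(1 - B)^{d}\, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j B^{j}\right)\varepsilon_t

% Akaike Information Criterion: lower values indicate a better trade-off
% between goodness of fit and model complexity.
\mathrm{AIC} = 2k - 2\ln\hat{L}
```

Selecting (p, d, q) by AIC then amounts to fitting several candidate orders and keeping the one with the lowest AIC value.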
XGBoost (Extreme Gradient Boosting)
XGBoost is a powerful machine learning algorithm that builds an ensemble of decision trees to capture complex relationships in structured data (Chen & Guestrin, 2016). It uses gradient boosting, where each tree corrects the errors of the previous ones, enabling accurate predictions. For this study, XGBoost was applied to monthly sales data, leveraging historical sales and seasonal indicators. Hyperparameters, such as learning rate and tree depth, were optimized using grid search. Although XGBoost is effective at modeling non-linear patterns, its performance in this study was limited due to the absence of diverse explanatory variables, making simpler models more reliable for forecasting in this context.
Model Selection and Evaluation
The models were evaluated using metrics such as Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE), which measure forecast accuracy in terms of absolute and relative errors, respectively. Additionally, R² (coefficient of determination) was used to assess the proportion of variance in revenue explained by each model. The ARIMA and Holt-Winters methods demonstrated strong performance in forecasting time series data with clear trends and seasonality, while XGBoost and Random Forest provided superior accuracy in capturing complex relationships and interactions within the dataset. The complementary strengths of these models ensured reliable forecasts and actionable insights for optimizing kiosk operations.
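The three evaluation metrics can be sketched from their standard definitions as follows. The function names and the sample values are hypothetical; MAPE is expressed in percent and is undefined when any actual value is zero.

```python
from math import sqrt

def rmse(actual, predicted):
    """Root Mean Square Error, in the same units as the data."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Illustrative values only: every forecast is off by exactly 10.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]

scores = {
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "R2": r_squared(actual, predicted),
}
```

Here the same absolute error of 10 weighs more heavily in MAPE for the smaller months, which is why RMSE and MAPE are reported side by side.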
Chapter 4: Results
1 Visualizations
1.1 Monthly revenue trends (2021–2024)
Figure 4.1 Monthly revenue trends (2021–2024)
- NB, though steady, exhibits smaller fluctuations and a comparatively slower growth trajectory
Impact:
- TSN's steep revenue spikes might be attributed to Hồ Chí Minh City being the largest market and the economic hub of Vietnam, where shopping and travel demand is significantly higher than in Hanoi.
1.3 Revenue per Pax vs Revenue per Bill
Figure 4.3 Revenue per Pax vs Revenue per Bill
Figure 4.4 Revenue distribution
Figure 4.5 Customer purchasing trends by location
1.5.1 Common Trends
Types of products: