
Data Warehouse and Stock Analysis for Bank in Vietnam


DOCUMENT INFORMATION

Basic information

Title: Data Warehouse and Stock Analysis for Bank in Vietnam
Author: Dao Xuan Huong
Supervisor: Dr. Nguyen Quang Thuan
School: Vietnam National University, Hanoi International School
Major: Management Information Systems
Type: Graduation project
Year of publication: 2025
City: Hanoi
Format:
Pages: 70
File size: 1.42 MB


Structure

  • CHAPTER 1: INTRODUCTION
    • I. Rationale
      • 1. The Importance of the Research Topic
      • 2. Practical Urgency of the Topic
      • 3. Scientific and Practical Significance
      • 4. Personal Motivation
    • II. Research Objectives
    • III. Research questions
      • 1. How to build a Data Warehouse (DW) architecture suitable for the banking and securities sectors?
      • 2. How to integrate data from multiple sources in the financial sector?
    • IV. Object and Scope of the Study
    • V. Research methods
      • 1. Qualitative Methods
      • 2. Quantitative Methods
      • 3. Data Collection and Processing
    • VI. Structures
  • CHAPTER 2: LITERATURE REVIEW
    • I. Theoretical Background
      • 1. Definition of Data Warehouse
      • 2. Key Concepts Related to Data Warehousing
      • 3. Importance of Data Warehouses in Financial Sectors
    • II. Literature Review
      • 1. Previous Research on Data Warehouse Implementation
      • 2. Survey of Past and Current Research on Data Warehousing in Vietnam
      • 3. Solutions and Best Practices in Financial Data Warehousing
      • 4. Gaps in Existing Literature
    • III. Research Gap
  • CHAPTER 3: METHODOLOGY
    • I. Research Analysis Methods
      • 1. Data Collection and Integration (Ingestion)
      • 2. ETL (Extract, Transform, Load)
      • 3. Data Modeling and Cube Building (Build Cube)
      • 4. Data Query and Analysis
      • 5. System Performance Evaluation
      • 6. Supporting Tools
    • II. Data Collection Methods
    • III. Analysis Methods
      • 1. Descriptive Analysis
      • 2. Diagnostic Analysis
  • CHAPTER 4: FINDINGS
    • I. Data Description
      • 1. OLTP Diagram
      • 2. OLAP Diagram
    • II. Analysis and Discussion
      • 1. Building OLAP
      • 2. Designing Apache Airflow Workflows
      • 3. ETL (Extract, Transform, Load)
      • 4. Developing OLAP Cubes
    • III. Evaluations
      • 1. Key Findings
      • 2. Causes and Impacts
      • 3. Overall Evaluation
    • I. Conclusion
    • II. Limitations of the Study
    • III. Proposals and Recommendations

Content


INTRODUCTION

Rationale

1. The Importance of the Research Topic

In Vietnam's rapidly evolving digital landscape, the banking and securities sectors are growing quickly and process millions of financial transactions daily. This surge in activity generates vast amounts of data that demand scalable storage, along with efficient management and utilization strategies.

However, financial institutions and securities companies in Vietnam face significant challenges:

• Fragmented data: Data is stored across various systems such as transaction systems, CRM, and accounting, leading to difficulties in integration and information retrieval

• Complex data analysis: Traditional tools are insufficient for analyzing large and diverse datasets, reducing the efficiency of strategic decision-making

• Fierce competition: Financial institutions need to optimize their data utilization to enhance operational efficiency, reduce costs, and improve service quality to stay competitive

Developing an integrated, modern Data Warehouse has therefore become an urgent requirement. It can address these challenges while treating data as a strategic asset.

2. Practical Urgency of the Topic

Many banks and securities companies in Vietnam still use disparate storage and processing systems that lack integration and scalability. This results in:

• Low efficiency in reporting and analysis: Reporting processes are often time-consuming, inaccurate, and untimely

• Difficulties in forecasting and decision-making: The absence of centralized data and effective analysis reduces the ability to forecast market trends and make strategic decisions

Furthermore, modern technologies such as Big Data, AI, and Machine Learning present significant opportunities for extracting value from data.

A Data Warehouse is not just a storage solution but also a platform on which organizations can apply these technologies to optimize operations and enhance competitiveness.

3. Scientific and Practical Significance

• This study contributes technical solutions for building Data Warehouses, particularly in the financial and banking sectors, where precision, security, and performance are critical

• It provides a reference model and methodology for building Data Warehouses in the financial sector, from architectural design to practical implementation

• It helps banks and securities companies in Vietnam integrate data from multiple sources, creating a centralized and reliable data platform

• It supports managers and leaders in making quick and accurate decisions based on comprehensive data analysis and reports

• It enhances operational efficiency by reducing data processing time and improving the accuracy of information, thereby improving customer service quality

4. Personal Motivation

With a strong passion for data and technology, especially within the financial and banking industries, I believe that building a Data Warehouse is a compelling and valuable endeavor. It plays a crucial role in enhancing business decision-making, improving data analytics, and driving digital transformation, ultimately delivering significant benefits to businesses and society alike.

This research is an opportunity for me to apply the knowledge I have learned to practical scenarios while exploring and learning more about modern technologies in data management and utilization.

Research Objectives

This study aims to develop a comprehensive Data Warehouse system that integrates data from diverse sources, enabling efficient data storage, management, and analysis. Designed to support banks and securities organizations in Vietnam, the system improves data organization while optimizing analysis and decision-making processes. It addresses key requirements such as accuracy, security, and scalability, providing a reliable platform for financial institutions to use data-driven insights effectively.

Design a Data Warehouse architecture suitable for the characteristics of financial and banking data in Vietnam:

• Analyze the business requirements and data characteristics of banks and securities organizations, including transaction data, customer information, assets, and financial reports

The architecture of the Data Warehouse system is designed to optimize data management and query performance through a well-structured framework. Central to this architecture is the Star Schema model, which streamlines data retrieval and improves reporting efficiency. The system is organized into distinct data layers: the staging layer for temporary data storage, the integration layer for consolidating data from various sources, and the presentation layer for delivering insights to end users. For the implementation, PostgreSQL is recommended because of its robustness, scalability, and support for complex queries.

• Ensure the system is scalable to handle large volumes of data in the future

• Develop a security mechanism to ensure data safety and integrity, in compliance with financial sector regulations in Vietnam

Integrate and process data from existing securities and banking systems:

• Survey and assess input data sources:
  o Securities transaction data from stock exchanges (HSX, HNX)

Financial data from banking systems, including credit management, customer account management, and interbank transaction platforms, provides essential input for financial analysis. Additionally, unstructured data such as emails, customer notes, and reports from customer service departments offers qualitative information about customer interactions and service quality. Combining structured and unstructured data improves decision-making, fraud detection, and customer relationship management within the banking sector.

Building the ETL (Extract, Transform, Load) Process

Objective: Collect data from various sources, including:

• Relational Databases (RDBMS): PostgreSQL, etc

• APIs: Systems for securities and banking transactions that provide data via APIs

• File Data: CSV, Excel files, or log file formats exported from systems

• Use a data extraction library such as Vnstock3, or custom scripts, to extract data from the sources

• Extract raw data without applying any processing, to preserve the original state of the data

• Store this raw data in a staging area, typically a relational database system (PostgreSQL) or internal file storage (Excel, CSV)

Objective: Perform data cleaning, transformation, and integration directly within the Data Warehouse, leveraging the performance of modern data processing tools.

Implementation:

• Remove Duplicates:
  o Identify and eliminate duplicate records in the data
  o Use SQL queries or tools like dbt to handle this processing

• Standardize Data Formats:
  o Convert date and time formats into a unified standard
  o Encode data according to industry standards (e.g., financial sector codes, ISO standards)

To ensure data quality, logical errors and missing data must be handled effectively. Apply contextual rules to correct errors and fill gaps, such as assigning default values to null fields or removing invalid records. This approach maintains data integrity and improves the accuracy of the analytics.

• Integrate Data:
  o Perform joins between multiple data sources to create analysis tables
  o Design and build dimensional analysis tables (e.g., Star Schema or Snowflake Schema)
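The cleaning rules above can be illustrated with a minimal sketch. It uses only the Python standard library rather than dbt or SQL, and the sample rows, the field names (`txn_id`, `trade_date`, `symbol`, `volume`), the accepted date formats, and the default value of 0 are all assumptions made for illustration, not details taken from the project.

```python
from datetime import datetime

# Hypothetical raw rows as they might sit in a staging table.
RAW_ROWS = [
    {"txn_id": "T1", "trade_date": "2024-03-01", "symbol": "aaa", "volume": "1000"},
    {"txn_id": "T1", "trade_date": "2024-03-01", "symbol": "aaa", "volume": "1000"},  # duplicate
    {"txn_id": "T2", "trade_date": "01/03/2024", "symbol": "BBB", "volume": None},    # missing volume
    {"txn_id": "T3", "trade_date": "not a date", "symbol": "CCC", "volume": "500"},   # invalid date
]

# Assumed input formats; a real source system would dictate this list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows, default_volume=0):
    """Deduplicate by txn_id, unify dates, and fill missing volumes."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["txn_id"] in seen:      # remove duplicates by business key
            continue
        iso_date = parse_date(row["trade_date"])
        if iso_date is None:           # drop records whose date cannot be parsed
            continue
        seen.add(row["txn_id"])
        volume = row["volume"]
        cleaned.append({
            "txn_id": row["txn_id"],
            "trade_date": iso_date,
            "symbol": row["symbol"].upper(),
            "volume": int(volume) if volume is not None else default_volume,
        })
    return cleaned


CLEANED = clean(RAW_ROWS)
```

Whether a missing value should receive a default or cause the record to be rejected is a business rule; the sketch picks one option per field only to show both behaviours.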

Objective: Transfer raw data from the staging area to the Data Warehouse without performing any transformations

• Use a workflow orchestration tool such as Apache Airflow to schedule and run the load jobs

• Load the data into raw tables (often referred to as staging tables) in the Data Warehouse

• Ensure data loading is performed sequentially or in batches to reduce the load on source systems

Develop reports and dashboards for data analysis and decision-making support:

Develop a comprehensive reporting system by designing detailed reports that support business management, including daily transaction reports covering transaction volume, transaction value, and success rate, as well as summary reports that provide leaders with an overall view of business activities for better decision-making.
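As a rough illustration of the daily transaction report described above, the sketch below computes transaction volume, transaction value, and success rate per day. It uses an in-memory SQLite database as a stand-in for the warehouse; the `transactions` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (txn_date TEXT, amount REAL, status TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2024-03-01", 100.0, "SUCCESS"),
        ("2024-03-01", 250.0, "FAILED"),
        ("2024-03-01", 150.0, "SUCCESS"),
        ("2024-03-02", 400.0, "SUCCESS"),
    ],
)

# One row per day: number of transactions, total amount, share that succeeded.
# All attempts are counted here; excluding failed ones is a business decision.
DAILY_REPORT_SQL = """
SELECT txn_date,
       COUNT(*)    AS txn_volume,
       SUM(amount) AS txn_value,
       AVG(CASE WHEN status = 'SUCCESS' THEN 1.0 ELSE 0.0 END) AS success_rate
FROM transactions
GROUP BY txn_date
ORDER BY txn_date
"""

daily_report = conn.execute(DAILY_REPORT_SQL).fetchall()
for row in daily_report:
    print(row)
```

The same query shape carries over to PostgreSQL; a dashboard tool would typically read from a view or table built on a query like this one.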

• Build interactive dashboards: Use a data visualization tool (Power BI) to create interactive dashboards that monitor:
  o Real-time business performance
  o Securities trading trends and market conditions
  o Key performance indicators (KPIs) such as liquidity ratios, risk levels, and investment efficiency

Data from the Data Warehouse can feed forecasting models built with Machine Learning, which aim to predict market trends and customer behavior. These insights support informed decision-making and strategic planning. Data analysis also supports tailored business solutions, such as optimizing investment portfolios, refining credit strategies, and allocating resources more efficiently, which in turn improves operational performance.

Research questions

1. How to build a Data Warehouse (DW) architecture suitable for the banking and securities sectors?

• Which architecture models (e.g., Star Schema, Snowflake Schema) are suitable for integrating diverse financial data?

• How should the system be designed to handle large volumes, complex data, and real-time transactions?

2. How to integrate data from multiple sources in the financial sector?

• How can data from various sources (relational databases, APIs, data files) be unified into a common Data Warehouse?

• What challenges arise when integrating structured, semi-structured, and unstructured data?

• Which data integration methods (ETL, ELT) and tools (Apache NiFi, Talend, dbt) are the most effective?

• How to ensure data quality (eliminating duplicates, handling missing data, standardizing formats)?

Object and Scope of the Study

This research examines financial data systems, including securities and banking transaction systems, focusing on data collection, processing, and storage methods. It explores modern Data Warehouse architectures and data integration technologies to address the evolving needs of digital finance. Targeting commercial banks and securities firms in Vietnam, the study analyzes their transaction data systems and the integration challenges encountered from 2020 to the present. The rapid growth of digital finance underscores the increasing demand for advanced Data Warehouse solutions to support efficient data management and decision-making.

Research methods

• Collect and analyze documents and reports related to the construction and application of Data Warehouses in the financial sector

• Sources include research reports from financial institutions, scientific papers, textbooks, and articles from international academic journals

Data Collection from Real Systems:

• Collect data from actual transaction systems of commercial banks and securities companies in Vietnam

• Data types include transaction data, customer information, financial reports, and supporting datasets

• Utilize tools and platforms such as SQL Server, Oracle, or Snowflake to build and test the Data Warehouse system

• Perform quantitative analyses to evaluate the system's effectiveness based on:
  o Data processing performance
  o Accuracy of integrated data
  o System availability and flexibility

Data is collected from diverse sources to ensure comprehensive insights, including bank transaction systems that provide credit, account management, and financial transaction data; stock exchanges offering detailed stock trading information such as stock prices and trading volumes; and official documents like reports from financial organizations These varied data sources enable accurate analysis and better decision-making in financial markets.

• Apply the ETL (Extract, Transform, Load) process to extract, process, and load data:
  o Extract: Collect data from sources using custom scripts
  o Transform: Clean and standardize data, including:
    ▪ Standardizing date-time formats and industry codes
    ▪ Handling missing or logically inconsistent data
  o Load: Transfer raw data into the data warehouse (staging area)

Supporting Tools:

• Use data integration tools like Apache Airflow or dbt to automate and optimize the data processing workflow

• Analyze data using programming languages such as Python, combined with specialized libraries like Pandas and NumPy.

Structures

Chapter 1: Overview of the Research Topic

• General introduction to Data Warehouses and their applications in the financial sector

• Problem statement and rationale of the study

Chapter 2: Theoretical Background and Literature Review

• Key concepts of Data Warehouse, ETL, OLAP, and Big Data

• Overview of research and solutions for implementing Data Warehouses in the banking and securities sectors

Chapter 3: Design and Development of the Data Warehouse

• Architectural model of the system

• Data integration process from various sources

• Tools and technologies used in the implementation

Chapter 4: Application and Performance Evaluation

• Experimental implementation using real-world data

• Analysis of system performance in supporting decision-making

• Proposals for improvements and future development directions

LITERATURE REVIEW

Theoretical Background

1. Definition of Data Warehouse

A Data Warehouse is a centralized repository that consolidates data from various sources, enabling businesses to efficiently store, manage, and analyze large volumes of data. Unlike traditional databases optimized for transactional processing, Data Warehouses are designed to support analytical tasks such as reporting, trend analysis, and strategic decision-making. They are subject-oriented, focusing on specific business areas like finance or sales, and are integrated, time-variant, and non-volatile, providing a unified view of historical data that remains stable over time.

2. Key Concepts Related to Data Warehousing

The ETL (Extract, Transform, Load) process extracts data from multiple sources, transforms it into a consistent format, and loads it into a data warehouse. The ELT (Extract, Load, Transform) approach instead loads raw data directly into the warehouse and performs the transformations inside it. ELT is gaining popularity in modern financial applications because it uses the computational power of contemporary data warehouses for data processing.

OLAP (Online Analytical Processing) is a core technology in Data Warehouses that supports multidimensional analysis, allowing users to view data from different perspectives (e.g., time, geography, or product categories). Financial institutions use OLAP tools to analyze key metrics such as transaction volumes, market trends, or customer behavior, which supports faster and more accurate decision-making.
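To make the idea of viewing one measure from different perspectives concrete, here is a small sketch of an OLAP-style roll-up in plain Python. The fact rows, the dimension names, and the helper `roll_up` are hypothetical and stand in for what an OLAP engine does at scale.

```python
from collections import defaultdict

# Hypothetical fact rows: (month, branch, product, transaction_amount).
FACTS = [
    ("2024-01", "Hanoi", "Savings", 120.0),
    ("2024-01", "Hanoi", "Loans", 300.0),
    ("2024-01", "Da Nang", "Savings", 80.0),
    ("2024-02", "Hanoi", "Savings", 150.0),
]
DIMENSIONS = ("month", "branch", "product")


def roll_up(facts, by):
    """Sum the measure over every dimension that is not listed in `by`."""
    idx = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


# The same facts seen along the time dimension, then by branch and product.
by_month = roll_up(FACTS, by=("month",))
by_branch_product = roll_up(FACTS, by=("branch", "product"))
```

Each call answers a different question from the same data, which is the essence of slicing a cube along chosen dimensions.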

Star Schema in Data Warehousing:

The star schema is a popular database design in data warehousing, known for its simplicity and efficiency in supporting analytical queries. It features a central fact table that holds quantitative data such as sales revenue and transaction volumes, connected to multiple dimension tables that provide descriptive context such as time, location, product, and customer details. These dimension tables are denormalized to enable rapid data retrieval and typically contain far fewer rows than the fact table.

The main advantage of the star schema is that it simplifies query writing, so analysts can extract insights efficiently. Its denormalized structure reduces the need for complex joins, which improves query performance. Its intuitive star-like layout also makes it easy to understand and implement, which explains its popularity in data warehousing and business intelligence solutions.

In the financial sector, the star schema is well suited to analyzing metrics such as stock performance, portfolio returns, and transaction histories. A central fact table stores transaction amounts and timestamps, and dimension tables hold contextual data on time, stock information, customer profiles, and geographical regions. With this structure, financial analysts can identify trends and patterns across the various dimensions.
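The following sketch shows what such a star schema might look like, using an in-memory SQLite database so it runs without a server. The table names (`fact_trade`, `dim_date`, `dim_stock`), the columns, the stock symbols, and the figures are illustrative assumptions, not the schema used in this project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive context, one row per date or per stock.
CREATE TABLE dim_date  (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, quarter INTEGER);
CREATE TABLE dim_stock (stock_key INTEGER PRIMARY KEY, symbol TEXT, exchange TEXT);

-- Fact table: numeric measures plus foreign keys to each dimension.
CREATE TABLE fact_trade (
    date_key  INTEGER REFERENCES dim_date(date_key),
    stock_key INTEGER REFERENCES dim_stock(stock_key),
    volume    INTEGER,
    amount    REAL
);

INSERT INTO dim_date  VALUES (20240301, '2024-03-01', 2024, 1), (20240401, '2024-04-01', 2024, 2);
INSERT INTO dim_stock VALUES (1, 'AAA', 'HSX'), (2, 'BBB', 'HNX');
INSERT INTO fact_trade VALUES (20240301, 1, 1000, 90.0), (20240301, 2, 500, 6.0), (20240401, 1, 2000, 185.0);
""")

# A typical star-schema query: one join per dimension, then aggregate.
rows = conn.execute("""
    SELECT d.quarter, s.exchange, SUM(f.volume) AS total_volume, SUM(f.amount) AS total_amount
    FROM fact_trade f
    JOIN dim_date  d ON d.date_key  = f.date_key
    JOIN dim_stock s ON s.stock_key = f.stock_key
    GROUP BY d.quarter, s.exchange
    ORDER BY d.quarter, s.exchange
""").fetchall()
```

Every analytical query follows the same pattern of joining the fact table to the dimensions it needs, which is why the design keeps query writing simple.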

Slowly Changing Dimensions (SCD) is a key data warehousing concept for managing updates to dimensional data over time. Unlike fact data, which arrives continuously, dimensional data such as customer addresses or product classifications changes slowly. Implementing SCD ensures these changes are tracked accurately, preserving historical data integrity for more precise analysis.

SCD handling can be categorized into three main types. Type 1, also known as the overwrite method, updates existing data with new values and discards historical information, which makes it simple to implement but unsuitable for tracking changes over time. Type 2 adds a new row for each change, with a timestamp or flag that identifies the currently active record.

Type 2 therefore offers a complete history of changes and allows detailed time-based analysis, at the cost of additional storage and complexity. Type 3 instead adds a new column to store the previous value, capturing limited history in a simple and efficient way. Because Type 3 tracks only a single change, it suits scenarios with minimal historical tracking needs. The choice of method depends on the balance between complexity, storage requirements, and the level of historical detail the analysis needs.

In financial data warehousing, SCD is essential for accurately tracking updates to customer profiles, product classifications, and account details. Type 2 SCD allows organizations to monitor changes such as customer address updates or modifications to a bank's branch hierarchy while preserving historical data for trend analysis. Combining SCD with a star schema improves data consistency and analytical efficiency and gives insight into both historical and current organizational performance.
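A Type 2 update can be sketched in a few lines. The example below keeps the dimension as a list of dictionaries; the column names (`valid_from`, `valid_to`, `is_current`), the far-future end date, and the helper `apply_scd2` are assumptions chosen for illustration rather than a prescribed layout.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed marker for "still current"


def apply_scd2(dimension, customer_id, new_address, effective):
    """Close the current row for customer_id and append a new current row.

    `dimension` is a list of dicts and is modified in place.
    """
    for row in dimension:
        if row["customer_id"] == customer_id and row["is_current"]:
            if row["address"] == new_address:
                return  # nothing changed, so history stays as it is
            row["valid_to"] = effective
            row["is_current"] = False
    dimension.append({
        "customer_id": customer_id,
        "address": new_address,
        "valid_from": effective,
        "valid_to": OPEN_END,
        "is_current": True,
    })


dim_customer = []
apply_scd2(dim_customer, "C001", "Hanoi", date(2023, 1, 1))
apply_scd2(dim_customer, "C001", "Da Nang", date(2024, 6, 1))
apply_scd2(dim_customer, "C001", "Da Nang", date(2024, 7, 1))  # unchanged value: no new row
```

After these calls the dimension holds two rows for the customer: the closed Hanoi row and the current Da Nang row, so a fact dated in 2023 can still be joined to the address that was valid at that time.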

Big Data in Financial Analysis:

The rise of Big Data has changed data warehousing by making it possible to process unstructured data such as social media feeds and emails alongside traditional structured data. Integrating technologies such as Hadoop and Spark allows data warehouses to handle large datasets more efficiently. This integration yields richer insight into market dynamics and customer needs, which improves decision-making and business intelligence. Big Data capabilities are therefore important for organizations that need comprehensive, near real-time analytics.

3. Importance of Data Warehouses in Financial Sectors

Data Warehouses are essential in the financial sector for meeting regulatory requirements such as AML (anti-money laundering) compliance and risk management reporting. They enable banks and securities firms to make better decisions through integrated analytics, providing timely insights into market trends, portfolio performance, and customer preferences. Data Warehouses also improve operational efficiency by streamlining data management processes and reducing dependence on fragmented systems.

Literature Review

1. Previous Research on Data Warehouse Implementation

Research on data warehouse implementation for financial institutions emphasizes the importance of aligning system design with business objectives. For example, Smith and Johnson (2020) highlight how dimensional modeling significantly boosts query performance in banking systems. Additionally, a case study by Patel et al. underscores the value of tailored data warehouse strategies in improving data analytics and decision-making processes within financial organizations.

In 2019, Al et al. showcased the successful deployment of a cloud-based Data Warehouse at a multinational bank, leading to substantial cost savings and enhanced data accessibility. However, the implementation also highlighted ongoing challenges such as data quality issues, integration complexities, and scalability concerns, especially in rapidly expanding markets.

2. Survey of Past and Current Research on Data Warehousing in Vietnam

Vietnam is increasingly focusing on Data Warehousing, especially within the banking and securities industries. While recent research shows progress, notable challenges persist in scaling solutions, implementing real-time analytics, and integrating diverse data sources. Addressing these gaps is crucial for advancing data management and analytics capabilities in Vietnam's financial sector.

Research on Data Warehousing (DW) in Vietnam has seen steady development, particularly within the banking and financial sectors. One key study by Nguyen et al. (2018) explored the adoption of ETL (Extract, Transform, Load) pipelines in Vietnamese banks, highlighting progress in modernizing data management systems. However, significant challenges persist, especially in integrating legacy systems with modern DW technologies. Overcoming these integration issues remains crucial for financial institutions to benefit fully from modern data warehousing solutions.

Tran & Pham (2020) highlighted the vital role of Big Data in Vietnam's stock exchanges, emphasizing the urgent need for real-time data processing solutions to manage the growing volume and complexity of transactions. Their research pointed out that existing data warehousing frameworks are inadequate for handling dynamic data streams, which hampers efficient market analysis and informed decision-making.

Hoang (2022) highlighted the regulatory challenges Vietnam faces in implementing Data Warehouse (DW) systems, especially within the financial sector. The study emphasized that aligning Vietnam's domestic financial regulations with international DW standards presents significant obstacles, leading to compliance issues that hinder organizations from fully adopting modern data warehousing practices.

Current Research Trends in Vietnam

Recent Data Warehousing research in Vietnam highlights its role in supporting the country's expanding digital economy, particularly within the digital banking and fintech sectors. Emphasis is placed on how Data Warehouses enable scalable and efficient data management to accommodate the surge in online customer transactions. With the rapid growth of digital banking, robust DW systems are essential for processing large volumes of transactional data in real time, ensuring seamless financial operations and a better customer experience.

Research shows that DW plays a crucial role in integrating fintech solutions, enabling technologies such as AI-driven credit scoring and fraud detection. As Vietnam's fintech industry continues to innovate, effective data management and analysis have become important differentiators for startups and traditional financial institutions alike. A DW allows financial firms to process large amounts of financial data efficiently, supporting better decision-making and stronger security measures in fintech operations.

Many Vietnamese financial institutions are adopting cloud-based data warehousing solutions such as Snowflake and AWS Redshift, moving away from traditional on-premises systems. These cloud platforms let organizations scale their data infrastructure more efficiently and cost-effectively, reducing the expense of maintaining physical data centers. This flexibility and cost saving matter to financial institutions that are modernizing their data management in response to rising digitalization.

Research Gaps and Future Directions

Despite recent advances, significant gaps remain in current data warehouse research, particularly in integrating real-time analytics for banking and stock exchange systems in Vietnam. Many data warehouse solutions still lack the robust real-time analysis capabilities needed for timely decision-making in fast-paced financial markets.

The role of Artificial Intelligence (AI) and Machine Learning (ML) in optimizing Data Warehouse (DW) query performance is an emerging area of research. AI and ML have significant potential to improve the efficiency of DW systems by enhancing data processing and analysis. Understanding how these technologies can be integrated into DW systems is crucial for optimizing their performance and ensuring their scalability.

The impact of cloud migration on the security of financial data in Vietnam requires further investigation to address emerging risks. While cloud solutions provide benefits such as scalability and cost savings, they also introduce challenges related to data security and regulatory compliance. Research in this area can help financial institutions mitigate the risks of data breaches and loss of control over sensitive information, and ensure adherence to Vietnam's evolving regulatory landscape.

Vietnam's ongoing investment in Data Warehousing reflects its rapid digital transformation, especially within the banking and fintech sectors. To maximize the benefits, comprehensive, context-specific research is needed that addresses the particular challenges faced by Vietnamese institutions, including regulatory constraints, infrastructure limitations, and the integration of emerging technologies such as AI and cloud solutions.

3. Solutions and Best Practices in Financial Data Warehousing

Best practices for data warehousing in finance focus on scalability, security, and real-time processing. Cloud-based solutions such as AWS Redshift and Snowflake provide flexible, cost-efficient options for financial institutions. Techniques such as dimensional modeling with a Star Schema and the use of data marts improve analytical performance and data organization. Successful implementations emphasize robust ETL/ELT pipelines and incorporate advanced analytics tools to improve predictive modeling and risk assessment.

4. Gaps in Existing Literature

Existing studies offer valuable insights into the technical aspects of Data Warehousing but often overlook the particular challenges faced by emerging markets like Vietnam. Critical issues such as limited infrastructure, diverse data formats, and regulatory constraints are rarely addressed comprehensively. Many studies also neglect the integration of unstructured data sources, such as emails and customer feedback, into traditional Data Warehouse systems. There is likewise little focus on balancing real-time analytics with batch processing to meet the dynamic needs of financial institutions in these regions.

Research Gap

The Need for Contextual Solutions

Most existing research on Data Warehousing focuses on developed economies with advanced technological infrastructure, leaving a knowledge gap in adapting these solutions to emerging markets like Vietnam. The Vietnamese financial sector faces particular challenges, including diverse data formats, limited access to cutting-edge tools, and constantly evolving regulatory frameworks. Addressing these needs is essential for implementing Data Warehousing solutions tailored to Vietnam's context.

Scalability and Real-Time Analytics Challenges

Traditionally, data warehouses rely on batch processing, but the financial sector now requires real-time analytics to respond quickly to market fluctuations. Designing a scalable system that efficiently manages both historical and streaming data without sacrificing performance is a key challenge.

Integration of Diverse Data Sources

Financial institutions handle diverse data types, including structured transaction records as well as unstructured customer interactions and market reports. However, current research falls short of offering effective solutions for integrating these varied data formats into a unified analytical framework. Such integration is essential for comprehensive decision-making and deeper insight into financial operations.

METHODOLOGY

Research Analysis Methods

The data analysis process is divided into key phases to ensure that data from diverse sources is efficiently processed, stored, and analyzed.

1. Data Collection and Integration (Ingestion)

Data is collected automatically by API crawlers built on the Vnstock3 library, and from other sources including Google Sheets, CSV files, and database systems such as PostgreSQL and MariaDB operated by banks and securities firms. This integration process supports efficient and reliable data aggregation for financial analysis and decision-making.

Ingestion is orchestrated by Apache Airflow, which maintains continuous connection, data retrieval, and smooth operation through scheduled tasks (jobs).

The Extract phase retrieves raw data from APIs, financial systems, and manual input files; this data often contains formatting errors and inconsistencies. During the Transform phase, the data is cleaned, deduplicated, and standardized before being mapped into Dimension and Fact tables to fit the Data Warehouse architecture. In the Load phase, data is organized into a multi-layer structure: the Bronze layer stores raw, unprocessed data; the Silver layer contains cleaned and normalized data; and the Gold layer holds aggregated, analysis-ready information. MinIO (an object storage system) and Iceberg (a Lakehouse table format) are used to manage these data layers.
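The three layers described above can be illustrated with a minimal plain-Python sketch. It is not the thesis's actual pipeline (which relies on MinIO, Iceberg, and dbt); the sample rows, the cleaning rules, and the function names `to_silver` and `to_gold` are assumptions made only for illustration. The field names follow the stock data columns described later in this chapter.

```python
from collections import defaultdict

# Hypothetical raw rows as a crawler might deliver them (Bronze layer).
bronze = [
    {"ma_ck": "VCB", "time": "2024-01-02", "close": "90.5", "volume": "1000"},
    {"ma_ck": "VCB", "time": "2024-01-02", "close": "90.5", "volume": "1000"},  # duplicate row
    {"ma_ck": "bid ", "time": "2024-01-02", "close": "45.0", "volume": "2000"},  # untidy symbol
    {"ma_ck": "BID", "time": "2024-01-03", "close": "", "volume": "500"},        # missing price
    {"ma_ck": "VCB", "time": "2024-01-03", "close": "91.5", "volume": "3000"},
]


def to_silver(rows):
    """Clean, standardize, and deduplicate Bronze rows (assumed rules)."""
    seen, silver = set(), []
    for row in rows:
        if not row["close"]:  # drop rows with a missing closing price
            continue
        clean = {
            "ma_ck": row["ma_ck"].strip().upper(),
            "time": row["time"],
            "close": float(row["close"]),
            "volume": int(row["volume"]),
        }
        key = (clean["ma_ck"], clean["time"])
        if key in seen:  # keep only the first row per stock and day
            continue
        seen.add(key)
        silver.append(clean)
    return silver


def to_gold(rows):
    """Aggregate Silver rows into per-stock, analysis-ready totals."""
    totals = defaultdict(lambda: {"days": 0, "total_volume": 0, "close_sum": 0.0})
    for row in rows:
        agg = totals[row["ma_ck"]]
        agg["days"] += 1
        agg["total_volume"] += row["volume"]
        agg["close_sum"] += row["close"]
    return {
        symbol: {
            "days": agg["days"],
            "total_volume": agg["total_volume"],
            "avg_close": agg["close_sum"] / agg["days"],
        }
        for symbol, agg in totals.items()
    }


silver = to_silver(bronze)
gold = to_gold(silver)
```

Keeping the Bronze rows untouched and deriving Silver and Gold from them means a cleaning rule can be changed and the downstream layers rebuilt without re-crawling the sources.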

3. Data Modeling and Cube Building (Build Cube)

Data Cubes are generated after data processing and storage to improve query efficiency and analytical performance. Integrated with Apache Airflow, dbt (data build tool) automates the creation of these Cubes and streamlines data workflows. Designed to support complex queries, the Cubes enable OLAP (Online Analytical Processing) capabilities and feed data visualization and analysis tools.

The Trino Query Engine connects directly to the data layers in the Lakehouse (Bronze, Silver, Gold), supporting complex queries on large datasets with rapid response times. Data from the Cubes is synchronized with visualization tools: Apache Superset for dashboards, charts, and reports, and Redash for analytical reporting and performance tracking.

Pipeline efficiency is optimized by monitoring key metrics such as processing time at each stage (ETL, cube building, and querying), while error rates are tracked through detailed logs and analyses of automated task failures in Apache Airflow. Query performance is evaluated with tools like Trino to ensure fast response times and effective handling of large datasets. Data accuracy is maintained by cross-referencing data across the Bronze, Silver, and Gold storage layers to ensure consistency and correctness throughout the data pipeline.

Data from reports and dashboards is thoroughly checked for logical errors or discrepancies.

Supporting tools such as Apache Airflow automate ETL workflows and streamline data processing, while dbt builds the data transformation pipelines and manages Data Cubes for efficient analysis. MinIO and Iceberg organize and store data in tiered layers for optimized access and scalability. Additionally, Apache Superset and Redash enable the creation of analytical reports and visualizations. Together, these tools improve the system's performance and keep data reliably available for decision-making.

Data Collection Methods

Building a data warehouse system for the Vietnamese banking stock market requires efficient data collection from reliable stock information sources. The key methods are an API crawler that automates data retrieval and Vnstock3 for market data aggregation. These approaches provide accurate, up-to-date data, which is essential for analyzing stock performance and supporting investment decisions in the Vietnamese banking sector.

An API crawler is an automated tool that gathers data from multiple websites and data sources. To obtain information on banking stocks, we use APIs provided by financial platforms such as Vietstock, FiinPro, and VnDirect, as well as by stock exchanges. This approach supports accurate, timely data collection for financial analysis and decision-making.

APIs provide access to essential financial data, including transaction records, stock prices, company profiles, and detailed financial reports. Using APIs enables fast and accurate data retrieval and significantly reduces the errors associated with manual data collection. Incorporating these APIs into financial analysis applications therefore improves both data accuracy and efficiency.

• The data collection process will include sending GET requests to the API endpoints, then processing and storing the retrieved data in the data system.
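The request-and-parse step can be sketched as follows. This is only an illustration under stated assumptions: the endpoint `https://api.example.com/v1/prices`, its query parameters, and the JSON shape `{"data": [...]}` are hypothetical and do not describe any of the platforms named above. The network call is isolated in `http_get` so the parsing logic can be exercised offline with a stand-in.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://api.example.com/v1/prices"  # hypothetical endpoint


def build_url(symbol, start, end):
    """Compose the GET URL for one stock symbol and date range."""
    query = urlencode({"symbol": symbol, "from": start, "to": end})
    return f"{BASE_URL}?{query}"


def http_get(url):
    """Send a GET request and return the response body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def fetch_prices(symbol, start, end, get=http_get):
    """Fetch and parse price records; `get` can be replaced for testing."""
    payload = json.loads(get(build_url(symbol, start, end)))
    return [
        {"ma_ck": symbol, "time": record["time"], "close": float(record["close"])}
        for record in payload["data"]  # assumed response shape
    ]


def fake_get(url):
    """Offline stand-in that returns a canned response instead of calling the network."""
    return json.dumps({"data": [{"time": "2024-01-02", "close": "90.5"}]})


rows = fetch_prices("VCB", "2024-01-01", "2024-01-31", get=fake_get)
```

The parsed rows could then be written to the raw storage layer before any cleaning takes place.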

Vnstock3 is a tool for collecting stock data in Vietnam. It offers APIs that let users query and gather detailed information on stocks, market indices, and other financial data, which makes it a useful resource for investors and traders who need up-to-date Vietnamese stock market data.

• The data collection method using Vnstock3 involves utilizing pre-developed APIs to connect to Vnstock3's data system and retrieve the necessary information.

Data from Vnstock3 will be cleaned, standardized, and integrated into the data warehouse system to build a comprehensive and accurate data repository. This centralized data infrastructure will support analysis and informed decision-making.

Analysis Methods

Implementing a data warehouse system for the Vietnamese banking stock market relies heavily on data analysis techniques to extract valuable insights. These insights support informed investment decisions and effective risk management strategies. This section outlines the key analytical methods used within the stock data warehouse.

Descriptive analysis helps summarize and describe the characteristics of stock data, such as stock prices, trading volumes, and other financial indicators.

Basic statistical methods such as calculating averages, variance, and standard deviation will be used to summarize the features of the data.

Descriptive analysis provides an overview of the market and helps identify general trends in the banking stock market.
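As a small illustration of these summary statistics, the sketch below uses Python's standard `statistics` module on a hypothetical series of closing prices. The numbers are invented for the example, and the sample (n - 1) form of variance and standard deviation is an assumption, since the text does not say which form is used.

```python
import statistics

# Hypothetical daily closing prices for one bank stock.
closes = [90.0, 92.0, 91.0, 95.0, 92.0]

mean_close = statistics.mean(closes)     # arithmetic average
var_close = statistics.variance(closes)  # sample variance (divides by n - 1)
std_close = statistics.stdev(closes)     # sample standard deviation

print(f"mean={mean_close:.2f} variance={var_close:.2f} stdev={std_close:.2f}")
```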

Diagnostic analysis identifies the root causes of events or phenomena in the data. For instance, if a bank's stock price declines sharply, this method helps pinpoint contributing factors such as shifts in business strategy, macroeconomic influences, or negative information about the bank. Understanding these underlying issues supports informed decision-making and strategic planning.

Diagnostic analysis uses correlation and regression techniques to determine the relationships between factors and stock price fluctuations.
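A minimal sketch of both techniques is given below, assuming the factor of interest is the daily return of a market index and the response is the daily return of a bank stock. The data are hypothetical, and the Pearson coefficient and one-variable least-squares fit are written out by hand so the arithmetic is visible; the thesis does not specify which estimators it uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ols(xs, ys):
    """Least-squares line y = slope * x + intercept for one explanatory factor."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x


# Hypothetical daily returns: market index (factor) and one bank stock (response).
index_returns = [0.01, -0.02, 0.015, 0.0, -0.005]
stock_returns = [0.012, -0.025, 0.02, 0.001, -0.008]

r = pearson(index_returns, stock_returns)
slope, intercept = ols(index_returns, stock_returns)
```

A coefficient near 1 together with a slope above 1 would suggest that the stock tends to move with the index but more strongly; it does not by itself establish a cause.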

FINDINGS

Data Description

This section describes the data collected from various sources on the Vietnamese stock market, including stock prices, trading volumes, investments, exchange performance, and market summaries. These data tables provide a clear picture of the market landscape and serve as a foundation for analyzing bank activities, stock performance, and exchange trends. Understanding the structure and role of each data set helps investors choose suitable analytical tools and develop more accurate investment strategies for the Vietnamese stock market.

The BANK entity represents financial institutions that participate in stock transactions and investments. Its essential attributes are BankID, which uniquely identifies each bank; BankName, which stores the name of the institution; and SectorID, which categorizes the bank within a specific industry sector. The create_at and write_at fields track the timestamps of record creation and updates, supporting accurate record management and data integrity.

The STOCK entity contains detailed information about individual stocks available in the market, each uniquely identified by StockID. Each stock has a StockSymbol and a StockName for clear identification. The entity also includes SectorID to specify the industry sector and MarketID to indicate the market where the stock is listed. Additionally, the create_at and write_at timestamps track record creation and updates, ensuring data accuracy and traceability.

The EXCHANGE entity represents stock exchanges where trading occurs. Its key attributes are ExchangeID, a unique identifier; MarketName, the exchange's name; Country, its location; and Currency, the transaction currency. Additionally, the create_at and write_at attributes track record creation and updates.

The STOCK_TRANSACTIONS entity records detailed information about individual stock trades, including a unique TransactionID and the associated StockID, BankID, and ExchangeID. It specifies the TradeType as either "buy" or "sell" and tracks the Quantity of stocks traded, the Price per unit, and the TotalValue, calculated by multiplying Quantity by Price. The create_at timestamp documents the exact date and time of each transaction, providing a complete stock trading history for financial analysis.

The INVESTMENT_TRANSACTIONS entity captures the investment activities conducted by banks across various stock exchanges, with each transaction uniquely identified by the primary key, InvestmentID. It links to specific banks and stock exchanges through the BankID and ExchangeID fields. The financial metrics recorded are Volume, representing the investment amount; InvestmentValue, denoting the initial capital invested; MarketValue, reflecting the current valuation; and ProfitLoss, showing the profit or loss based on market fluctuations. Additionally, the create_at field documents the timestamp of each investment transaction.

A bank can conduct multiple stock and investment transactions, as indicated by the BankID foreign key in the STOCK_TRANSACTIONS and INVESTMENT_TRANSACTIONS tables. Each stock can be involved in many transactions, with StockID linking it to these records. These relationships tie banks, stocks, and financial activities together and make banking transactions and investment activities easy to trace.

The STOCK_TRANSACTIONS table records all stock trades, each of which occurs on a specific exchange identified by ExchangeID. This key links the STOCK_TRANSACTIONS and EXCHANGE tables directly. Investments are likewise associated with particular exchanges through the ExchangeID foreign key in the INVESTMENT_TRANSACTIONS table.
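The entities and foreign keys described above can be sketched with an in-memory SQLite database. Table and column names follow the text; the column types, the CHECK constraint on TradeType, the link from STOCK.MarketID to EXCHANGE.ExchangeID, the helper `record_trade`, and the sample rows are assumptions made for illustration. INVESTMENT_TRANSACTIONS and the write_at columns are omitted for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE BANK (
    BankID INTEGER PRIMARY KEY, BankName TEXT NOT NULL, SectorID INTEGER);
CREATE TABLE EXCHANGE (
    ExchangeID INTEGER PRIMARY KEY, MarketName TEXT, Country TEXT, Currency TEXT);
CREATE TABLE STOCK (
    StockID INTEGER PRIMARY KEY, StockSymbol TEXT, StockName TEXT,
    SectorID INTEGER, MarketID INTEGER REFERENCES EXCHANGE(ExchangeID));
CREATE TABLE STOCK_TRANSACTIONS (
    TransactionID INTEGER PRIMARY KEY,
    StockID INTEGER NOT NULL REFERENCES STOCK(StockID),
    BankID INTEGER NOT NULL REFERENCES BANK(BankID),
    ExchangeID INTEGER NOT NULL REFERENCES EXCHANGE(ExchangeID),
    TradeType TEXT CHECK (TradeType IN ('buy', 'sell')),
    Quantity INTEGER, Price REAL, TotalValue REAL,
    create_at TEXT DEFAULT CURRENT_TIMESTAMP);
""")

# Illustrative sample rows.
conn.execute("INSERT INTO BANK VALUES (1, 'Vietcombank', 10)")
conn.execute("INSERT INTO EXCHANGE VALUES (1, 'HOSE', 'Vietnam', 'VND')")
conn.execute("INSERT INTO STOCK VALUES (1, 'VCB', 'Vietcombank', 10, 1)")


def record_trade(stock_id, bank_id, exchange_id, trade_type, quantity, price):
    """Insert one trade; TotalValue is derived as Quantity * Price."""
    conn.execute(
        "INSERT INTO STOCK_TRANSACTIONS "
        "(StockID, BankID, ExchangeID, TradeType, Quantity, Price, TotalValue) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        (stock_id, bank_id, exchange_id, trade_type, quantity, price, quantity * price),
    )


record_trade(1, 1, 1, "buy", 100, 90.5)
record_trade(1, 1, 1, "sell", 40, 91.0)

# One bank, many transactions: count and total value per bank.
per_bank = conn.execute("""
    SELECT b.BankName, COUNT(*), SUM(t.TotalValue)
    FROM STOCK_TRANSACTIONS t JOIN BANK b ON b.BankID = t.BankID
    GROUP BY b.BankName
""").fetchall()
```

With foreign keys enabled, an attempt to record a trade for a BankID that does not exist in BANK is rejected with `sqlite3.IntegrityError`, which is the behavior the relationships above are meant to guarantee.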

This table holds data on individual stock trades across multiple stock exchanges and is the basis for analyzing stock price trends and trading volume fluctuations. It lets investors and analysts assess how market factors influence stock valuations and identify patterns in stock performance across exchanges.

The timestamp ("time") is essential for time-series analysis, allowing stock fluctuations to be tracked over periods. The price fields "close," "open," "high," and "low" help assess stock price volatility and identify periods of significant price movement. Trading volume ("volume") indicates liquidity and investor interest, providing insight into market activity. The stock code ("ma_ck") is used to classify and retrieve specific stock information efficiently. Finally, "exchange" identifies the trading platform where the stock is listed, enabling performance analysis across different exchanges.
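To make the use of these fields concrete, the sketch below derives two simple measures from hypothetical daily bars: the intraday range as a percentage of the opening price, and the close-to-close return. Both measures and the sample values are assumptions chosen for illustration; the text does not state which volatility measure the thesis applies.

```python
# Hypothetical daily bars using the field names described above.
bars = [
    {"time": "2024-01-02", "open": 100.0, "high": 104.0, "low": 98.0, "close": 100.0, "volume": 1000},
    {"time": "2024-01-03", "open": 100.0, "high": 112.0, "low": 99.0, "close": 110.0, "volume": 3000},
]


def daily_range_pct(bar):
    """Intraday price range as a percentage of the opening price."""
    return (bar["high"] - bar["low"]) / bar["open"] * 100


def close_to_close_returns(series):
    """Simple return between each pair of consecutive closing prices."""
    return [curr["close"] / prev["close"] - 1 for prev, curr in zip(series, series[1:])]


ranges = [daily_range_pct(bar) for bar in bars]
returns = close_to_close_returns(bars)
```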

Purpose: Provides information about investments in bank stocks, helping to analyze investment activities and assess profitability

Analyzing investments requires several key data points. The bank ID is essential for assessing the impact of specific banks on stock performance. The stock ID identifies individual stocks within an investment portfolio, while the exchange ID shows how investments are distributed across markets. Investment scale, represented by volume and investment value, indicates the size of holdings, and market value reflects the current worth of the stock on a given day. Profit or loss provides insight into the financial performance of each investment. Finally, the timestamp is critical for time-series analysis, enabling investors to track stock fluctuations and trends over time.

Purpose: Provides an overview of information on the performance of stock exchanges, including trading volume, trading value, and market indices

totalvolume, totalvalue: Trading volume and value measure the activity and liquidity of the exchange on a given day.
marketindex, highestindex, lowestindex, closingindex: Market indices are used to analyze overall market trends and track daily changes, indicating the market's direction.

Timestamps are also essential for time-series analysis, allowing stock fluctuations to be monitored accurately over specific periods.

Purpose: Provides information about banks participating in the stock market, including industry and stock-related metrics

Column Name: Description
name: The bank's name helps classify the banks on the market.
industry: The industry of the bank provides insights into the bank's performance within its sector.
total_volume: The industry of the bank provides insights into the bank's performance within its sector.

Purpose: Provides detailed information about stock exchanges, including exchange names, codes, and descriptions

The exchange_name serves as the primary identifier for each exchange and makes it easy to recognize. The exchange_code streamlines analysis by providing a unique code for each exchange. Additionally, the description explains the exchange's operations and the range of services it provides.

Purpose: Provides summary information about the stock market on each trading day, helping to analyze the overall market trend

Column Name: Description
market_index: The market index helps track the overall state of the stock market.
highest_index, lowest_index, closing_index: These indices identify key price levels and measure market volatility.

The total trading volume and total value give an overview of the market's scale and activity on each trading day. Timestamps are again needed for time-series analysis, allowing stock fluctuations to be tracked precisely over time.

Analysis and Discussion

This section focuses on building an OLAP (Online Analytical Processing) system by designing data workflows in Apache Airflow, implementing ETL (Extract, Transform, Load) processes, and developing OLAP cubes to analyze Vietnam's stock market and banking data. These steps are essential for creating a data warehouse capable of supporting complex analyses of stock price fluctuations, banking performance, and market index changes. Such a system gives deeper insight into the factors influencing the financial sector in Vietnam.

OLAP (Online Analytical Processing) supports multidimensional data analysis and visualization, enabling users to query and examine data from multiple perspectives. By constructing data cubes, OLAP systems support aggregated calculations and in-depth analysis of stock market data, so that complex datasets can be analyzed quickly and accurately.

Details of each star schema:

Figure 4: Star Schema for fact stock performance

The data model is designed for a Data Warehouse system focused on analyzing stock market data. It has four tables: Dim_Exchange, which stores information about stock exchanges; Dim_Stock, which contains details about individual stocks; Fact_StockPerformance, which captures stock performance metrics; and Dim_Date, which provides temporal context for analyzing market trends.

The Dim_Exchange table provides details about stock exchanges, with the fields ExchangeID (primary key), MarketName, Country, and Currency, along with the timestamps create_at and write_at. This table enables analysis of transaction data across exchanges, countries, and currencies, and supports cross-market comparisons.

The Dim_Stock table provides details about individual stocks, including StockID as the primary key, StockSymbol for the stock code, and StockName for the company name. It also includes SectorID, which categorizes stocks by economic sector, and MarketID, which links to the Dim_Exchange table to specify the exchange on which the stock trades. Additionally, the table tracks record history through the create_at and write_at fields. This structure supports categorization and analysis of stocks by industry and exchange.

The Fact_StockPerformance table is the central fact table and captures daily performance metrics for individual stocks. Its key fields are StockPerformanceID (primary key), DateID (linked to the Dim_Date table), and StockID (linked to the Dim_Stock table). The performance metrics stored in this table are OpeningPrice, HighestPrice, LowestPrice, ClosingPrice, and TotalVolume, which let users analyze stock trends over time.

The Dim_Date table is a standard date dimension that provides detailed information about each day. Its fields include DateID (primary key), day_of_week (day of the week), week number, day of the month, month, and year, along with flags indicating weekends and holidays. It enables analysis across different time frames, including daily, weekly, monthly, and quarterly intervals, and helps identify trends and seasonal patterns.
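A date dimension of this kind is usually generated rather than collected. The sketch below builds one row per calendar day with the standard library. Only DateID and day_of_week are named in the text; the other field names, the YYYYMMDD integer encoding of DateID, and the `HOLIDAYS` set are assumptions for illustration.

```python
from datetime import date, timedelta

# Hypothetical holiday calendar; a real one would be maintained separately.
HOLIDAYS = {date(2024, 1, 1)}


def build_dim_date(start, end, holidays=HOLIDAYS):
    """Return one dimension row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "DateID": int(day.strftime("%Y%m%d")),  # assumed YYYYMMDD surrogate key
            "day_of_week": day.isoweekday(),        # 1 = Monday ... 7 = Sunday
            "week": day.isocalendar()[1],           # ISO week number
            "day_of_month": day.day,
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "is_weekend": day.isoweekday() >= 6,
            "is_holiday": day in holidays,
        })
        day += timedelta(days=1)
    return rows


dim_date = build_dim_date(date(2024, 1, 1), date(2024, 1, 7))
```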

The database is structured around a few key relationships between tables. Dim_Exchange connects to Dim_Stock via MarketID, allowing stocks to be examined across different exchanges. Fact_StockPerformance links to Dim_Stock through StockID and to Dim_Date via DateID, enabling stock performance to be tracked over time. This schema supports multidimensional analysis by exchange, time, and economic sector.
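The way these relationships support a roll-up query can be sketched with SQLite. The table names, keys, and measures follow the description above; the column types, the reduced Dim_Date columns, and the sample rows are assumptions for illustration, and the thesis itself runs such queries through Trino rather than SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Exchange (
    ExchangeID INTEGER PRIMARY KEY, MarketName TEXT, Country TEXT, Currency TEXT);
CREATE TABLE Dim_Stock (
    StockID INTEGER PRIMARY KEY, StockSymbol TEXT, StockName TEXT,
    SectorID INTEGER, MarketID INTEGER REFERENCES Dim_Exchange(ExchangeID));
CREATE TABLE Dim_Date (DateID INTEGER PRIMARY KEY, month INTEGER, year INTEGER);
CREATE TABLE Fact_StockPerformance (
    StockPerformanceID INTEGER PRIMARY KEY,
    DateID INTEGER REFERENCES Dim_Date(DateID),
    StockID INTEGER REFERENCES Dim_Stock(StockID),
    OpeningPrice REAL, HighestPrice REAL, LowestPrice REAL,
    ClosingPrice REAL, TotalVolume INTEGER);

-- Illustrative sample rows.
INSERT INTO Dim_Exchange VALUES (1, 'HOSE', 'Vietnam', 'VND');
INSERT INTO Dim_Stock VALUES (1, 'VCB', 'Vietcombank', 10, 1), (2, 'BID', 'BIDV', 10, 1);
INSERT INTO Dim_Date VALUES (20240102, 1, 2024), (20240103, 1, 2024);
INSERT INTO Fact_StockPerformance VALUES
    (1, 20240102, 1, 90, 92, 89, 91, 1000),
    (2, 20240103, 1, 91, 95, 90, 93, 3000),
    (3, 20240102, 2, 45, 46, 44, 45, 2000);
""")

# Roll daily facts up to one row per stock and month for a single exchange.
rollup = conn.execute("""
    SELECT s.StockSymbol, d.year, d.month,
           AVG(f.ClosingPrice) AS avg_close,
           SUM(f.TotalVolume)  AS total_volume
    FROM Fact_StockPerformance f
    JOIN Dim_Stock s    ON s.StockID = f.StockID
    JOIN Dim_Date d     ON d.DateID = f.DateID
    JOIN Dim_Exchange e ON e.ExchangeID = s.MarketID
    WHERE e.MarketName = 'HOSE'
    GROUP BY s.StockSymbol, d.year, d.month
    ORDER BY s.StockSymbol
""").fetchall()
```

Changing the GROUP BY columns (for example to sector and quarter) yields a different slice of the same cube without touching the fact table.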

Figure 5: Star schema for fact daily market

The Dim_Date table stores calendar date information, including the day of the week, weekend or holiday indicators, and the date's position within the week, month, quarter, and year. It also tracks timestamps for technical processing.

The Fact_DailyMarket table captures daily market metrics, including total volume, total value, market index, highest and lowest index levels, and the closing index. It is linked to the Dim_Date table through DateID and to the Dim_Market table via MarketID, enabling analysis of daily market performance.

Dim_Market (or Dim_Exchange) Table: This table provides metadata about markets, including their names, locations, currencies, and timestamps for record creation and updates.

Figure 6: Star schema for fact investment

This star schema is designed for analyzing investment data:

Dim_Bank: Stores information about banks, including their names, associated sectors, and stock details.

Dim_Exchange: Stores metadata about stock exchanges, such as their names, locations, and associated currencies.

Dim_Date: Provides details about dates, including calendar breakdowns like day, week, month, quarter, and year, as well as flags for weekends and holidays.

The fact table stores transactional data related to investments, capturing key metrics such as investment volume, investment value, market value, and profit or loss. It is linked to three dimension tables (Dim_Bank, Dim_Exchange, and Dim_Date), enabling multidimensional analysis of investment performance across banks, exchanges, and time periods.

Figure 7: Star schema for fact exchange performance

This star schema is used for analyzing the performance of exchanges

Dim_Exchange: Provides descriptive information about exchanges, such as their names, countries, and the currencies they use.

Dim_Date: Stores calendar and temporal details, enabling time-based analysis of exchange performance. This includes specific dates, holidays, weekends, and timestamp information.

The fact table records the daily performance metrics of exchanges, including trading volume, total trading value, and the highest, lowest, and closing values of the market index. These metrics are linked to the Dim_Date table for time-based analysis and to the Dim_Exchange table for market-specific analysis.

Apache Airflow is a robust workflow management tool ideal for automating ETL processes related to stock market data By designing workflows in Airflow, you can automatically collect, process, and load stock market data into your data warehouse, ensuring efficient and reliable data pipeline automation.

In an Airflow workflow, the process begins with creating the database tables that store the financial data, including stock prices, trading volumes, market indices, and investment transactions. Next, data collection uses API crawlers or external sources to gather up-to-date information on these metrics. The collected data is then processed to remove invalid or incomplete entries and to standardize it before loading. Finally, the processed data is loaded into the designated database tables, such as stock_data, investment, and exchange_performance, for analysis and reporting.

In Airflow, ETL tasks are structured as Directed Acyclic Graphs (DAGs), enabling precise scheduling and efficient monitoring of data collection and processing workflows These DAGs facilitate seamless automation of ETL processes, ensuring reliable tracking and management of data pipeline tasks.
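The idea of a DAG, in which each task runs only after the tasks it depends on, can be illustrated without Airflow itself. The sketch below is a plain-Python stand-in built on the standard `graphlib` module (Python 3.9 or later); it is not Airflow code and not the thesis's actual DAG definitions. The four task names mirror the workflow steps described above, and the task bodies are placeholders.

```python
from graphlib import TopologicalSorter

run_log = []  # records the order in which tasks actually ran


def create_tables():
    run_log.append("create_tables")  # placeholder for the table-creation step


def crawl_data():
    run_log.append("crawl_data")     # placeholder for the API crawler step


def clean_data():
    run_log.append("clean_data")     # placeholder for cleaning and standardizing


def load_data():
    run_log.append("load_data")      # placeholder for loading into the warehouse


TASKS = {
    "create_tables": create_tables,
    "crawl_data": crawl_data,
    "clean_data": clean_data,
    "load_data": load_data,
}

# Each task maps to the set of tasks that must finish before it may start.
DEPENDENCIES = {
    "create_tables": set(),
    "crawl_data": {"create_tables"},
    "clean_data": {"crawl_data"},
    "load_data": {"clean_data"},
}


def run_pipeline():
    """Run every task in an order that respects the dependency graph."""
    order = list(TopologicalSorter(DEPENDENCIES).static_order())
    for name in order:
        TASKS[name]()
    return order


order = run_pipeline()
```

If the dependencies contained a cycle, `static_order()` would raise `graphlib.CycleError`; the acyclic requirement is what allows a scheduler to guarantee a valid execution order.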

Figure 8: Dag for creating table

Figure 9: Dag for crawling data

Figure 10: Dag for merging data

Evaluations

Data Integration and Collection Capability

The data warehouse for the Vietnam banking stock market has been developed to integrate diverse data sources, including market indices such as VNIndex and HNXIndex and major banks such as Vietcombank and BIDV, providing centralized financial data.

Integrating data from multiple sources such as stock exchanges, financial reports, and market data is essential for thorough stock analysis. This approach enriches the dataset and gives users more accurate and reliable information for investment decisions.

• Impact: This makes the data warehouse powerful, flexible, and capable of supporting multiple analytical purposes, from stock analysis to analyzing banking indices, thus broadening its usability in different scenarios.

Clarity and Visualization of Data

The thesis highlights the use of visualization tools like Power BI and Superset to effectively display stock data with dynamic effects and vibrant colors By utilizing line and bar charts, users can easily monitor trends and fluctuations in stocks and market indices, enhancing data comprehension and decision-making.

• Explanation: Using visual charts helps users quickly grasp the trends and fluctuations of the market without analyzing too many raw numbers.

• Impact: This enhances accessibility and efficiency in delivering information, especially for those who are not financial experts. It also improves the user experience and reduces errors in analysis.

• Finding: The data warehouse system has the scalability to add more banks or new stock market indices.

• Explanation: Building a system with scalability is crucial to meeting the changing needs of the stock market, such as when new banks list their shares or when market indices change.

• Impact: The data warehouse remains effective over the long term, staying aligned with market changes while keeping maintenance and update costs under control.

• Finding: While the system meets user needs, there is still room for improvement in processing speed and data optimization as data volume increases.

• Explanation: As stock market data grows, optimizing databases and query algorithms will help speed up processing and reduce latency when users query information.

Improving performance and optimization is essential to prevent the system from struggling when it serves many users simultaneously or manages very large amounts of data. Without these improvements, users may face slower response times and reduced reliability. Prioritizing optimization keeps the system scalable and maintains the quality of the user experience during peak usage.

Data synchronization issues arise when stock exchange data is not aligned in timing or format, leading to asynchronous updates. For instance, stock exchange information may update at different times, and banks might delay releasing financial reports, causing inconsistencies in financial data analysis. Timely and uniform data collection is therefore essential for accurate financial insights and informed investment decisions.

• Impact: Poor synchronization may lead to errors in analysis and investment decisions. This can affect the accuracy of the data warehouse and reduce user trust.

Lack of Advanced Analytical Methods

• Cause: The data warehouse mainly focuses on data collection and visualization, without integrating advanced analysis models such as technical analysis, market sentiment analysis, or stock price forecasting.

• Impact: This may cause users to miss out on deeper market insights and trends, leading to inaccurate investment decisions.

• The system integrates and displays data for the Vietnam banking stock market clearly and understandably

• High visual accessibility through charts and interactive tools

• Good scalability for adding new banks or indices

• Widely applicable, supporting both basic and advanced analysis

Evaluation of the Thesis on Building a Data Warehouse for the Vietnam Banking Stock Market

Data Integration and Collection Capability

The Vietnam banking stock market data warehouse has been successfully developed to seamlessly integrate multiple data sources, including key banking indices such as VNIndex, HNXIndex, and major banks like Vietcombank and BIDV.

For effective stock analysis, integrating data from multiple sources such as stock exchanges, financial reports, and market data is essential This comprehensive approach enriches the information available, enabling users to make more accurate and informed investment decisions.

• Impact: This makes the data warehouse powerful, flexible, and capable of supporting multiple analytical purposes, from stock analysis to analyzing banking indices, thus broadening its usability in different scenarios

Clarity and Visualization of Data

The thesis highlights the use of advanced visualization tools like Power BI and Superset to display stock data with dynamic effects and vibrant colors, enhancing user engagement Through the use of line and bar charts, users can effortlessly monitor trends and fluctuations in stocks and market indices, making data analysis more intuitive and accessible This approach improves the clarity and effectiveness of financial data presentation, facilitating better decision-making.

• Explanation: Using visual charts helps users quickly grasp the trends and fluctuations of the market without analyzing too many raw numbers

• Impact: This enhances accessibility and efficiency in delivering information, especially for those who are not financial experts It also improves the user experience and reduces errors in analysis

• Finding: The data warehouse system has the scalability to add more banks or new stock market indices

• Explanation: Building a system with scalability is crucial to meet the changing needs of the stock market, such as when new banks list their shares or when market indices change

• Impact: The data warehouse remains effective and long-term, ensuring that the system stays aligned with market changes, while also optimizing maintenance and updating costs

• Finding: While the system meets user needs, there is still room for improvement in processing speed and data optimization as data volume increases

• Explanation: As stock market data grows, optimizing databases and query algorithms will help speed up processing and reduce latency when users query information

Improving system performance and optimization is crucial to ensure it can efficiently serve a large number of users simultaneously and manage vast amounts of data Without these enhancements, users may experience slower response times and degraded service quality, ultimately leading to a poorer user experience Prioritizing optimization directly impacts the system's ability to handle high traffic and data loads effectively.

Synchronization issues in stock exchange data often arise when data updates occur asynchronously or when financial reports from banks are delayed, leading to inconsistencies in timing and format. These discrepancies can hinder accurate analysis and decision-making in financial markets. Uniform data and timely release of information are essential for reliable market insights.

• Impact: Poor synchronization may lead to errors in analysis and investment decisions. This can affect the accuracy of the data warehouse and reduce user trust.

Lack of Advanced Analytical Methods

• Cause: The data warehouse mainly focuses on data collection and visualization, without integrating advanced analysis models such as technical analysis, market sentiment analysis, or stock price forecasting.

• Impact: Users may miss deeper market insights and trends, which can lead to inaccurate investment decisions.

Data Security and Protection Issues

• Cause: Stock market data can be highly sensitive, and it is unclear what security measures are in place for storing and transmitting data.

• Impact: If data security is not ensured, the system may be vulnerable to attacks or breaches, resulting in significant financial losses and damage to the system's reputation.

Performance Optimization and Resource Efficiency

• Cause: The system works well with a small amount of data, but as data grows, processing and storage must be optimized to save resources.

• Impact: Without optimization, the system may become slow and costly to deploy at scale, affecting operational efficiency and scalability.

The final chapter summarizes the key findings and discusses their implications for the financial sector and data warehousing practices. It highlights the study’s contributions to stock market analysis and data management. The chapter also offers practical recommendations for future research, policy enhancements, and the application of data warehousing in financial analysis. Finally, it addresses the limitations encountered during the research and proposes strategies to overcome them in future studies.

Conclusion

Data fragmentation in stock market platforms creates challenges for users, as most websites provide an overwhelming amount of information across multiple sectors. This abundance of data makes it difficult to focus on specific areas, especially banking stock indices. Our specialized system addresses this issue by focusing exclusively on banking stock indices, simplifying the user experience and making relevant market data easy to track.

Centralized and optimized banking data ensures that all financial information is collected, stored, and managed within a unified system This centralized approach enables users to quickly access relevant data without the hassle of filtering through unrelated information, improving efficiency and decision-making in banking operations.

Modern data management methods, including techniques such as Slowly Changing Dimensions (SCD), are essential for accurately storing and processing historical banking data. These methods record how data changes over time, providing reliable information for long-term trend analysis. By implementing such techniques, financial institutions can maintain comprehensive historical records, supporting better decision-making and improved data integrity.
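To make the SCD idea concrete, here is a minimal sketch of a Type 2 update, in which a changed attribute closes the current dimension row and appends a new version instead of overwriting it. The text does not state which SCD type the system uses, so Type 2 is an assumption here, as are the BankDimRow fields, the placeholder ticker "XYZ", and the choice of the listing exchange as the tracked attribute.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class BankDimRow:
    """One version of a bank's descriptive attributes (hypothetical dimension row)."""
    ticker: str
    exchange: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"
    is_current: bool = True


def apply_scd2(dim: List[BankDimRow], ticker: str, exchange: str, as_of: date) -> None:
    """Apply a Type 2 change: close the current row and append a new version."""
    current = next((r for r in dim if r.ticker == ticker and r.is_current), None)
    if current is not None:
        if current.exchange == exchange:
            return  # attribute unchanged: keep the existing version
        current.valid_to = as_of
        current.is_current = False
    dim.append(BankDimRow(ticker, exchange, valid_from=as_of))


def exchange_on(dim: List[BankDimRow], ticker: str, day: date) -> Optional[str]:
    """Return the attribute value that was valid on the given day."""
    for r in dim:
        if (r.ticker == ticker and r.valid_from <= day
                and (r.valid_to is None or day < r.valid_to)):
            return r.exchange
    return None


dim: List[BankDimRow] = []
apply_scd2(dim, "XYZ", "UPCOM", date(2020, 1, 1))  # first version
apply_scd2(dim, "XYZ", "HOSE", date(2023, 6, 1))   # change: new version added
apply_scd2(dim, "XYZ", "HOSE", date(2024, 1, 1))   # no change: no new row
```

Because the old row is kept with its validity interval, a query such as exchange_on(dim, "XYZ", date(2021, 1, 1)) can still reconstruct the state of the dimension at any past date, which is what long-term trend analysis relies on.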

Focusing exclusively on the banking sector, the system facilitates in-depth analysis by enabling investors, financial organizations, and experts to easily evaluate banking-related indices This targeted approach helps stakeholders make more accurate, informed, and timely decisions, enhancing overall financial performance and strategic planning within the industry.

Scalability: The system can seamlessly integrate data from new banks or expand to include additional indices without affecting its performance or existing structure.

Limitations of the Study

Although the Vietnamese banking stock data warehouse has achieved certain successes, the study still has some limitations that need to be addressed:

• Data Processing Performance: With the continuous growth of the stock market, the volume of data that needs to be processed is increasing. Currently, the system is not fully optimized for handling and querying large amounts of data, which could affect speed and efficiency when users request information.

• Advanced Analytics: The system currently offers data visualization and general information but does not integrate advanced financial analytics such as technical analysis, market sentiment assessment, or stock price prediction models, which limits its ability to provide comprehensive investment insights.

• Data Security: In the stock market environment, data security is a very important factor. The current system lacks optimal security measures to protect sensitive data from external threats.

• Data Synchronization: Synchronization of data between stock exchanges and banks is sometimes not entirely accurate or timely, which can affect the reliability of information and investment decisions.

Proposals and Recommendations

To address these limitations and improve the effectiveness of the Vietnamese banking stock data warehouse, the study proposes the following solutions:

To optimize system performance, the system should adopt technologies such as distributed databases, big data storage and processing techniques, and caching strategies. These can improve data retrieval speeds so that the system handles increasing data volumes without losing speed or responsiveness.
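The caching strategy mentioned above can be sketched as a small time-to-live (TTL) cache placed in front of an expensive warehouse query. The TTLCache class, the cache key format, and the slow_query stand-in are hypothetical names introduced for illustration; a production deployment would more likely rely on an existing cache service.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny time-based cache: an entry is reused until ttl_seconds have passed."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]              # fresh entry: skip the warehouse query
        value = loader()               # miss or expired: run the expensive query
        self._store[key] = (now, value)
        return value


calls = {"n": 0}


def slow_query() -> dict:
    """Stand-in for an expensive aggregate query (hypothetical result)."""
    calls["n"] += 1
    return {"ticker": "XYZ", "avg_close": 25.4}


fake_now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
first = cache.get_or_load("XYZ:avg_close", slow_query)
second = cache.get_or_load("XYZ:avg_close", slow_query)  # served from cache
fake_now[0] = 61.0
third = cache.get_or_load("XYZ:avg_close", slow_query)   # expired, reloaded
```

The TTL is the main design choice: for end-of-day stock data a long TTL is usually safe, whereas intraday figures would need a shorter one to avoid serving stale values.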

Integrating advanced analytical models such as technical analysis, machine learning, and market sentiment analysis can improve market trend prediction and analysis. These tools give users deeper insight into stock market movements, support more informed investment and trading decisions, and help investors follow market trends and manage portfolio performance.
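As a small example of the technical analysis that could be layered on top of the warehouse, the sketch below computes a simple moving average (SMA) over closing prices. The function name, the window of 3, and the sample prices are illustrative assumptions; the study does not specify which indicators would be implemented.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Return the SMA for each position; None until a full window is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


closes = [10.0, 11.0, 12.0, 13.0, 14.0]  # made-up closing prices
sma3 = simple_moving_average(closes, 3)
# sma3 is [None, None, 11.0, 12.0, 13.0]
```

An indicator like this is only a smoothing of past prices; forecasting models such as those mentioned above would need separate validation before being used for investment decisions.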

To enhance data security, the system should adopt measures such as robust data encryption, strong user authentication protocols, and comprehensive defenses against external cyber threats. These practices protect the confidentiality of user data and reduce the system's exposure to breaches.
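One building block of strong user authentication is storing salted, slow password hashes rather than plain passwords. The sketch below uses PBKDF2 from Python's standard hashlib module; the iteration count, function names, and sample password are assumptions for illustration, and the thesis does not describe the system's actual authentication design.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 200_000  # assumed work factor; tune for the deployment hardware


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Derive a salted PBKDF2-HMAC-SHA256 digest; returns (salt, digest)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
ok = verify_password("correct horse battery staple", salt, stored)
rejected = verify_password("wrong password", salt, stored)
```

Only the salt and digest would be persisted, so a leaked user table does not directly reveal passwords. Encryption of data in transit and at rest is a separate concern that this sketch does not cover.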

Implementing efficient data synchronization processes and tools is essential to ensure that information from stock exchanges and banks is updated accurately and promptly. These measures improve data accuracy and timeliness, supporting reliable decision-making and smooth operational workflows.
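A simple synchronization check is to compare the trading dates published by the exchange with the dates actually loaded for each ticker, and report the gaps. The function name, the placeholder tickers, and the in-memory sets below are assumptions for illustration; in practice both sides would be read from the source feed and the warehouse.

```python
from datetime import date
from typing import Dict, List, Set


def find_sync_gaps(exchange_dates: Set[date],
                   loaded: Dict[str, Set[date]]) -> Dict[str, List[date]]:
    """For each ticker, list exchange trading dates missing from the warehouse."""
    gaps: Dict[str, List[date]] = {}
    for ticker, have in loaded.items():
        missing = sorted(exchange_dates - have)
        if missing:
            gaps[ticker] = missing
    return gaps


trading_days = {date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)}
warehouse = {
    "AAA": {date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 4)},  # complete
    "BBB": {date(2024, 1, 2), date(2024, 1, 4)},                    # one day missing
}
gaps = find_sync_gaps(trading_days, warehouse)
```

Running such a check after each load makes delayed or missing updates visible before they reach dashboards, rather than leaving users to discover inconsistent series.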

This study highlights the importance of ongoing research and development to enhance the data warehouse’s capabilities. Future work should include supporting international stock indices, conducting in-depth analysis of a wider range of financial indicators, and integrating AI tools for more accurate prediction of market trends. Such advancements will expand the data warehouse’s utility in financial market analysis and decision-making.

This research has developed a data warehouse system that enables investors and banking organizations to manage and analyze stock data efficiently. By providing tools for data analysis, it supports decision-making and improves financial insight. The system also opens the way for modern analytical methods to be implemented on top of it.

References

1. A. B. Sharma & P. A. Meena (2018), “Data Warehouse and Data Mining: Concepts, Techniques, and Real-World Applications,” Academic Press, pp. 45-89.

2. J. Barnett (2022), “Data Warehousing Market Size, Growth, and Trends (2023-2028),” Medium, pp. 50-70.

3. M. J. Fernandes (2019), “A Data Warehouse-Based Modelling Technique for Stock Market Analysis,” ResearchGate, 32 (4), pp. 12-18.

4. M. S. M. Rahman (2016), “Data Warehouse Systems: Design and Implementation,” Springer, pp. 1-234.

5. S. J. Miller (2021), “Big Data Lake Solution for Warehousing Stock Data and Tweet Data,” GitHub, pp. 101-120.

6. T. L. Chen (2019), “The Role of OLAP in Stock Market Data Analysis,” Springer, pp. 78-101.

SOCIALIST REPUBLIC OF VIETNAM Independence – Freedom - Happiness

Supervisor’s full name: NGUYEN QUANG THUAN

Position: Doctor At: International School – Vietnam National University, Ha Noi

- Student’s full name: DAO XUAN HUONG

I hereby provide my evaluation of the student's thesis writing process as follows:

Student is eligible for the thesis defense

Student is ineligible for the thesis defense

SOCIALIST REPUBLIC OF VIETNAM Independence – Freedom - Happiness

EXPLANATORY REPORT ON CHANGES/ADDITIONS BASED ON THE DECISION OF GRADUATION THESIS COMMITTEE

FOR UNDERGRADUATE PROGRAMS WITH DEGREE AWARDED BY

Student’s full name: DAO XUAN HUONG

Graduation thesis topic: Data warehouse and stock analysis for banks in Vietnam

According to VNU-IS Decision No …… QĐ/TQT, dated … / … / …… , the Graduation Thesis Committee for Bachelor programs awarded by Vietnam National University, Hanoi, was established to oversee the thesis defense process. The thesis was successfully defended and subsequently revised in designated sections to ensure academic quality and compliance with university standards.

Columns: No.; Change/Addition Suggestions by the Committee; Detailed Changes/Additions; Page

1. Suggestion: The architecture should be divided into 5-6 layers, and Figure 1 should be modified to include these layers. Change: Modify the existing architecture to incorporate 5-6 distinct layers. Page: 25.

2. Suggestion: The introduction section is too long. Change: Shorten the Introduction section to make it more concise. Page: 9.

3. Suggestion: The literature review is too short. Change: Expand the literature review by adding a survey on past and current research related to the selected topic, especially focusing on Vietnam.

Add an OLTP diagram and update the OLAP diagram to better reflect the revised architecture

The results are mostly related to the stock market, with only one point related to the bank

Revise the research questions to ensure they align more closely with the study's focus and capabilities

Date posted: 24/05/2025, 17:05
