Graduation thesis in Information Systems: Predicting rainfall using multiple data sources

DOCUMENT INFORMATION

Basic information

Title: Predicting Rainfall using Multiple Data Sources
Authors: Tang Quoc Hung, Vo Nguyen Dang Khoa
Supervisor: Ph.D. Le Kim Hung
University: University of Information Technology
Major: Information Systems
Type: Thesis
Year of publication: 2023
City: Ho Chi Minh City
Number of pages: 78
File size: 49.99 MB

Outline

  • 1.3 Scope of the Study
  • 1.4 Research Methodology
  • 1.5 Research Environment
  • Chapter 2 Literature review
    • 2.1 Evolution of Rainfall Prediction Models
    • 2.2 Identified Gaps and Research Needs
  • Chapter 3 Theoretical background
    • 3.1 Meteorological Data and Rainfall Prediction
      • 3.1.1 Significance of Meteorological Data
      • 3.1.2 Exploring Traditional Rainfall Prediction Models
    • 3.2 Statistical Methods in Rainfall Prediction
      • 3.2.1 Regression Analysis: Unveiling Model Nuances
      • 3.2.2 Temporal Insights: Time Series Analysis
    • 3.3 Advanced Numerical Weather Prediction Models
      • 3.3.1 Dynamic Simulation Models: Atmospheric Dynamic Simulation
      • 3.3.2 Challenges and Innovations
    • 3.4 Empirical Models: Learning from Historical Models
      • 3.4.1 Analyzing Historical Relationships
      • 3.4.2 Integration with Machine Learning
    • 3.5 Machine learning and Deep neural networks
      • 3.5.1 Machine learning
      • 3.5.2 Deep neural networks
    • 3.6 Technical Frameworks
      • 3.6.1 OpenWeatherMap
      • 3.6.2 Sentinel Hub

Content

reaching implications, offering valuable insights that can revolutionize current practices in fields critical to societal well-being.

1.2 Objectives

This research project encompasses sever

1.3 Scope of the Study

- The study will primarily concentrate on rainfall prediction for specific geographical regions, allowing for in-depth analysis and tailored modeling based on regional weather patterns.

- The research will utilize data from diverse sources, with a primary emphasis on integrating information from the OpenWeatherMap API and Sentinel Hub API. However, other relevant weather stations, satellite imagery, and radar systems may also be considered for a comprehensive analysis.

- The study will focus on historical and real-time weather data to develop a predictive model capable of providing timely forecasts. The temporal scope will encompass both short-term and long-term prediction capabilities.

- The research will explore various machine learning techniques, with a specific emphasis on deep neural networks, to uncover patterns and relationships within the data for accurate rainfall predictions.

The developed rainfall prediction model will be assessed for its applicability in key sectors, including agriculture, water resource management, and disaster prevention. The study aims to provide insights into how the model can contribute to decision-making processes in these domains.

The scope includes a comparative analysis of the proposed model against traditional rainfall prediction methods, emphasizing the model's strengths and limitations in diverse scenarios.

Recognizing the complexities involved in weather prediction, the study acknowledges certain limitations, such as uncertainties inherent in meteorological data and potential variations in data quality from different sources.

1.4 Research Methodology

Weather Data Sources: Acquire historical and real-time weather data from diverse sources, including the OpenWeatherMap API and Sentinel Hub API, to create a comprehensive dataset.

Cleaning and Formatting: Scrutinize and clean the acquired data to address missing values, outliers, and inconsistencies. Standardize the format to ensure compatibility across different sources.

Feature Engineering: Extract relevant features from the data, such as temperature, humidity, wind speed, and atmospheric pressure, to enhance the model's predictive capabilities.

Algorithm Selection: Evaluate and select machine learning algorithms suitable for rainfall prediction, considering factors such as regression techniques and ensemble methods.

- Training and Validation: Train the model using historical data and validate its performance against known outcomes. Adjust hyperparameters to optimize predictive accuracy.

- Architecture Design: Develop deep neural network architectures tailored for rainfall prediction, leveraging frameworks such as TensorFlow or PyTorch.

- Training and Optimization: Train the deep neural network using historical data, fine-tuning parameters through iterative optimization to enhance the model's ability to capture intricate patterns.

- Streaming Data Integration: Implement mechanisms to process and integrate real-time data streams, ensuring the model's responsiveness to dynamic weather conditions.

- Continuous Learning: Explore methodologies for continuous learning to adapt the model to evolving patterns in the data.

- Geographical Validation: Evaluate the model's performance across different geographical regions to assess its adaptability and generalization.

- Climate Variability Analysis: Investigate the model's response to varying climate conditions to ascertain its robustness in diverse environments.

- Benchmarking: Compare the performance of the developed model against traditional rainfall prediction methods, utilizing metrics such as Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

- Statistical Significance: Conduct statistical tests to determine the significance of improvements achieved by the proposed model.
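The benchmarking metrics named in the list above, MSE and RMSE, can be illustrated with a minimal sketch. The rainfall values and the two sets of predictions below are made-up numbers for illustration only and are not results from the thesis.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical hourly rainfall (mm) and two hypothetical sets of predictions.
observed = [0.0, 2.0, 5.0, 1.0]
baseline = [1.0, 1.0, 3.0, 1.0]
proposed = [0.0, 2.0, 4.0, 2.0]

print("baseline RMSE:", rmse(observed, baseline))
print("proposed RMSE:", rmse(observed, proposed))
```

Because RMSE is expressed in the same unit as the target (here millimetres), it is often easier to interpret than MSE when comparing a proposed model against a traditional method.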

1.5 Research Environment

- Utilize Python as the primary programming language for its versatility, extensive libraries, and frameworks relevant to machine learning and data analysis, including but not limited to TensorFlow, PyTorch, and scikit-learn.

- Choose a suitable IDE, such as Jupyter Notebooks or Visual Studio Code, to streamline code development, debugging, and visualization of results.

- Employ scripts and libraries in Python to interface with APIs, particularly the OpenWeatherMap API and Sentinel Hub API, for seamless access to real-time and historical weather data.

- Implement data cleaning and preprocessing using Pandas and NumPy to handle missing values, outliers, and standardize the format of the acquired datasets.

Machine Learning and Deep Learning Frameworks

- Leverage TensorFlow and PyTorch for the implementation of machine learning algorithms and deep neural networks. These frameworks provide robust support for building and training complex models.

- Implement streaming data processing using tools like Apache Kafka or relevant Python libraries to enable the model to process and adapt to real-time weather data.

- Depending on the scale of data and complexity of computations, consider utilizing GPUs or TPUs to accelerate training processes, enhancing the efficiency of machine learning and deep learning tasks.

- Utilize collaborative platforms such as Slack or Microsoft Teams for effective communication among research team members, fostering collaboration and knowledge sharing.
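The data cleaning item in the list above (missing values and outliers) can be sketched as follows. The thesis names Pandas and NumPy for this step; the sketch below uses only the Python standard library to stay dependency-free. The median imputation and the interquartile-range rule are illustrative assumptions, not necessarily the techniques applied in the thesis, and the helper names are hypothetical.

```python
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


# Hypothetical humidity readings with one gap and one sensor glitch.
raw = [10, 11, 12, None, 14, 15, 16, 100]
cleaned = clip_outliers(fill_missing(raw))
print(cleaned)
```

Clipping keeps the row in the dataset while limiting the influence of the extreme reading; dropping the row instead is an equally reasonable choice when the dataset is large.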

Chapter 2 Literature review

2.1 Evolution of Rainfall Prediction Models

In the past few years, weather forecasting in general, and rain detection in particular, has been gaining attention and is widely used in daily life. The literature review offers a comprehensive analysis of current research endeavors in rainfall prediction, encompassing studies both nationally and internationally. By evaluating various methodologies, this chapter aims to illuminate existing knowledge, identify persisting challenges, and delineate avenues for further exploration. From our survey of related work, the following studies are representative of this field of research.

The first study introduces a groundbreaking approach to rainfall prediction through the utilization of a Hybrid Neural Network (HNN). Although initial results are promising, a critical evaluation underscores the necessity for in-depth investigations into feature reduction techniques and clustering methodologies. Understanding the intricacies of HNN applications is crucial for refining its potential in real-world scenarios [1].

In the second study, the focus is on the significance of accurate rainfall forecasting, with an inclination towards Artificial Neural Networks (ANNs) due to the inherent nonlinearities in rainfall data. The review reveals a notable gap in translating these advanced techniques into accessible formats for non-experts. Bridging this communication divide is crucial for widespread application and comprehension [2].

Delving into data mining techniques, the third study critically analyzes their effectiveness in rainfall prediction and their potential applications in various sectors. While showcasing their utility in construction, transportation, and agriculture, the study highlights the need for further exploration to enhance the adaptability and accuracy of these techniques [3].

Focusing on long-term rainfall prediction using a linear regression model, the fourth study showcases its potential applications. However, it becomes evident that sophisticated ensemble techniques are required to improve accuracy, especially concerning long-term predictions. This study sets the stage for refining and optimizing ensemble methods to enhance overall model performance [4].

The fifth study recognizes the vital role of agriculture in India, highlighting the importance of rainfall prediction. Acknowledging the challenges, the study explores various Machine Learning algorithms, such as ARIMA, Artificial Neural Network (ANN), Logistic Regression, Support Vector Machine, and Self Organizing Map. Differentiating between linear and nonlinear models, the study emphasizes the applicability of ANN for predicting rainfall [5].

2.2 Identified Gaps and Research Needs

A recurring theme across the studies is the potential for integrating different techniques. However, a research gap exists in systematically combining Hybrid Neural Networks, Artificial Neural Networks, and Data Mining techniques to create more robust and accurate predictive models. Future research should explore cohesive methodologies to leverage the strengths of each approach.

Another consistent challenge identified is the lack of user-friendly approaches for accessing and implementing rainfall prediction models. Addressing this gap is imperative to broaden the applications and understanding of these models beyond the research community. Developing intuitive interfaces and simplified methodologies can facilitate broader adoption.

While ensemble techniques are acknowledged for their benefits, there is a need for more sophisticated approaches. Further research should focus on refining and optimizing ensemble methods to enhance the overall performance of rainfall prediction models. Understanding the nuances of ensemble techniques can contribute significantly to model robustness.

To sum up, this literature review examines current research in rainfall prediction, emphasizing accomplishments and highlighting critical gaps and challenges. The identified research needs will act as a compass for the current study, steering it towards contributing novel insights to advance rainfall prediction methodologies.

Chapter 3 Theoretical background

3.1 Meteorological Data and Rainfall Prediction

3.1.1 Significance of Meteorological Data

Meteorological data serves as the cornerstone for accurate weather predictions, incorporating multifaceted factors such as temperature, humidity, wind speed, and atmospheric pressure. The meticulous acquisition and analysis of this data form the basis for developing robust models for rainfall prediction.

3.1.2 Exploring Traditional Rainfall Prediction Models

In this section, we delve into the traditional models currently employed in meteorology for rainfall prediction. These models range from statistical methods to numerical weather prediction models and empirical models. A comprehensive evaluation of their strengths and limitations is undertaken to establish a groundwork for the proposed advanced rainfall prediction model.

3.2 Statistical Methods in Rainfall Prediction

3.2.1 Regression Analysis: Unveiling Model Nuances

Within the realm of traditional models, statistical methods play a crucial role, with a specific focus on regression analysis. This section details the intricacies of regression models, highlighting their flexibility with diverse datasets and their effectiveness in capturing relationships between meteorological variables and rainfall. A meticulous discussion provides insights into the statistical aspects influencing rainfall prediction.
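To make the idea of a regression between a meteorological variable and rainfall concrete, the following is a minimal sketch of ordinary least squares with a single predictor. The humidity and rainfall values are invented for illustration, and the function name `fit_line` is hypothetical; the thesis does not specify which regression model it refers to.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical relative humidity (%) and rainfall (mm) observations.
humidity = [60.0, 70.0, 80.0, 90.0]
rainfall = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(humidity, rainfall)
print(f"rainfall ~ {slope:.2f} * humidity + {intercept:.2f}")
```

The fitted slope quantifies how much the predicted rainfall changes per unit of humidity, which is the kind of relationship between variables the paragraph describes.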

3.2.2 Temporal Insights: Time Series Analysis

Another facet of statistical methods under consideration is time series analysis, a vital tool for deciphering time-dependent patterns in meteorological data. This section discusses the application of time series models for rainfall prediction, emphasizing their utility in analyzing trends, identifying seasons, and extracting valuable insights for prediction models. A nuanced discussion opens up, particularly focusing on the temporal aspect of statistical methods.
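Two of the simplest time series tools for the tasks mentioned here, trend analysis and season identification, are a moving average and per-position seasonal means. The sketch below is illustrative only: the series is invented, and the helper names are hypothetical rather than taken from the thesis.

```python
def moving_average(series, window):
    """Smooth a series to expose its trend."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def seasonal_means(series, period):
    """Average all values that share the same position in the cycle."""
    if period < 1 or period > len(series):
        raise ValueError("period must be between 1 and len(series)")
    buckets = [[] for _ in range(period)]
    for i, value in enumerate(series):
        buckets[i % period].append(value)
    return [sum(b) / len(b) for b in buckets]


# Hypothetical rainfall totals alternating between dry and wet seasons.
rain = [1.0, 10.0, 3.0, 12.0, 5.0, 14.0]

print(moving_average(rain, window=2))   # smoothed series
print(seasonal_means(rain, period=2))   # mean for dry and wet positions
```

A large gap between the seasonal means suggests a strong seasonal component, while a rising moving average suggests an upward trend.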

3.3 Advanced Numerical Weather Prediction Models

3.3.1 Dynamic Simulation Models: Atmospheric Dynamic Simulation

Transitioning to advanced methods, numerical weather prediction models take precedence. This section elucidates the principles behind dynamic simulation methods, where the complex interaction of atmospheric variables is simulated through sophisticated mathematical models. A thorough analysis sheds light on the complexity and features of numerical models in enhancing the accuracy of rainfall predictions.

3.3.2 Challenges and Innovations

Despite their power, numerical weather prediction models face challenges. This section details these challenges, from computational limitations to extensive data input requirements. Simultaneously, it explores recent innovations to address these difficulties, providing a comprehensive overview of the current landscape and paving the way for the proposed advanced rainfall prediction model.

3.4 Empirical Models: Learning from Historical Models

3.4.1 Analyzing Historical Relationships

Empirical models, relying on historical models and observed relationships, offer a different perspective in rainfall prediction. This section breaks down the essence of empirical methods, emphasizing the importance of learning from past data trends and developing models based on observed relationships. A comprehensive review opens up, highlighting the strengths and potential pitfalls of empirical models, forming the foundation for the proposed advanced model.

3.4.2 Integration with Machine Learning

A recent trend in empirical models is the integration of machine learning techniques. This section delves into the practical applications of machine learning in rainfall prediction, emphasizing the synergy between historical data trends and the predictive power of machine learning algorithms. Insights from this exploration lay the foundation for proposing an advanced rainfall prediction model that combines the strengths of empirical methods and machine learning.

3.5 Machine learning and Deep neural networks

3.5.1 Machine learning

Machine learning (ML) is a subfield of artificial intelligence (AI). It is a field of research that gives computers the ability to improve themselves by automatically learning from raw data (training data) or from experience (gained while learning). Machine learning systems can make predictions or decisions with minimal human intervention [6].

There are 2 goals of Machine Learning [7]:

- Classifying data: categorizing data depending on developed models (e.g., detecting spam emails)

- Making predictions: forecasting future outcomes based on models (e.g., predicting house prices in a city).

Figure 3.5.1 Machine learning process flow [7]

1. Data collection: Computers can only train (learn) on raw data, which is gathered and organized from multiple sources such as audio recordings, browser history, photos, and tables. Raw data must be accurate, closely matched to the problem the model is meant to solve, free of incorrect records, and carefully labeled. The accuracy and efficiency of the model depend on the quality of the raw data.

2. Data preparation: In this step, the raw data is prepared by cleaning it, removing unnecessary features, removing errors, and formatting it. After preparation, the raw data is referred to as training data. Steps 1 and 2 usually account for more than 70% of the total implementation time.

3. Choosing and training the model: Choose a suitable model for the task at hand and start the training process.

4. Model optimization: During training, we can try several configurations, settings, and parameters to improve the model's accuracy.

5. Model evaluation: When training is over, engineers check model performance using metrics such as Accuracy, Loss, Confusion Matrix, AUC (Area Under the ROC Curve), Mean Absolute Error (MAE), Root Mean Square Error (RMSE), and R-squared. A model whose accuracy exceeds 80% is considered to give good results. Engineers can also use separate data that was not part of the training data to test how well the model generalizes what it has learned.

6. Model deployment: After training and evaluation, a model with low accuracy is fixed and retrained. The problem may lie in the model structure or in the training data, in which case we return to the earlier steps to fix it. Models with acceptable accuracy are used to make predictions or identify patterns in new, unseen data.
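Two of the evaluation metrics listed in step 5, the confusion matrix and accuracy, can be computed by hand for a binary rain / no-rain task. This is a minimal sketch with invented labels; encoding rain as 1 and no rain as 0 is an assumption made for the example.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means rain."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1   # rain predicted, rain observed
        elif truth == 0 and pred == 1:
            fp += 1   # rain predicted, none observed
        elif truth == 1 and pred == 0:
            fn += 1   # rain missed
        else:
            tn += 1   # dry predicted, dry observed
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Share of samples that were classified correctly."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

print(confusion_matrix(y_true, y_pred))
print(accuracy(y_true, y_pred))
```

Accuracy alone can be misleading when rainy samples are rare, which is one reason the confusion matrix is usually inspected alongside it.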

There are 4 main types of machine learning:

Supervised machine learning is a type of machine learning that learns the relationship between input and output, also known as an input-to-output mapping. The inputs are called features or 'X variables', and the output is called the target or 'Y variable'. Data that includes both is known as labeled data [8].

Unsupervised machine learning is a type of machine learning that does not require labeled input-to-output pairs to learn a mapping function. Given raw, unprepared data, it explores the underlying structure of an unlabeled dataset [9].

Semi-supervised machine learning is a hybrid of supervised and unsupervised learning: it uses a small amount of labeled data and a large amount of unlabeled data as input to train a model, which then generates the prediction result [10].

Reinforcement machine learning is a type of machine learning that allows a system to learn and improve the performance of a function through trial and error. One well-known application of reinforcement learning is the development of driverless cars [11].

One of the most common supervised machine learning algorithms is the Decision Tree. A Decision Tree is a flowchart-like tree model in which each internal node denotes a feature, the branches denote the decision rules, and the leaf nodes denote the results of the algorithm [12].

There are 2 types of Decision Trees [14]:

- Classification trees: typically deal with "yes" or "no" questions.

- Regression trees: predict continuous values based on historical data.

There are also 4 popular Decision Tree algorithms, which differ in how they build the tree structure:

- ID3 (Iterative Dichotomiser 3): It is one of the earliest decision tree algorithms, introduced by Ross Quinlan. It is designed for categorical (classification) target variables and makes decisions based on the concept of information gain.

- C4.5: Basically, it is an extension of ID3, also introduced by Ross Quinlan. It uses concepts such as gain ratio and missing-value handling to address some limitations of ID3. C4.5 can work with both categorical and numerical attributes.

- CART (Classification and Regression Trees): It is a versatile decision tree algorithm introduced by Breiman et al. It can be used for both classification and regression tasks, and it creates binary trees.

- CHAID (Chi-squared Automatic Interaction Detection): It is a decision tree algorithm designed for exploring interactions between categorical predictors and the target variable.
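The information gain criterion used by ID3 can be sketched in a few lines: it is the entropy of the labels minus the weighted entropy that remains after splitting on one feature. The toy dataset and the function names below are hypothetical and serve only to illustrate the calculation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rows, labels, feature_index):
    """Entropy reduction obtained by splitting on one categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature_index], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical samples: (sky, wind) -> rain / dry.
rows = [
    ("cloudy", "strong"),
    ("cloudy", "weak"),
    ("clear", "strong"),
    ("clear", "weak"),
]
labels = ["rain", "rain", "dry", "dry"]

print(information_gain(rows, labels, 0))  # split on sky
print(information_gain(rows, labels, 1))  # split on wind
```

ID3 would pick the feature with the larger gain as the next node; in this toy data the sky feature separates the classes perfectly while the wind feature carries no information.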

Random Forest is a machine learning algorithm trademarked by Leo Breiman and Adele Cutler. Random Forest combines the outputs of multiple decision trees to reach a single result [15].
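For classification, one common way to combine the outputs of several trees is a majority vote. The sketch below shows only that combination step; the three "trees" are hand-written, hypothetical rules with invented thresholds. A real Random Forest additionally trains each tree on a bootstrap sample with random feature subsets, which is omitted here.

```python
from collections import Counter


# Three stand-in "trees": hypothetical one-rule classifiers.
def tree_humidity(sample):
    return "rain" if sample["humidity"] > 80 else "dry"


def tree_pressure(sample):
    return "rain" if sample["pressure"] < 1005 else "dry"


def tree_clouds(sample):
    return "rain" if sample["clouds"] > 70 else "dry"


def forest_predict(trees, sample):
    """Return the class predicted by the largest number of trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


sample = {"humidity": 85, "pressure": 1010, "clouds": 90}
prediction = forest_predict([tree_humidity, tree_pressure, tree_clouds], sample)
print(prediction)
```

For regression tasks the vote is typically replaced by an average of the tree outputs.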

3.5.2 Deep neural networks

Deep learning is a subfield of machine learning that is based on artificial neural networks (ANN). An ANN uses layers of interconnected nodes called neurons that work together to process and learn from the input data [16].

In a fully connected deep neural network, there is an input layer followed by one or more hidden layers connected in sequence. Each neuron receives input from the neurons of the previous layer or from the input layer. The output of one neuron becomes the input to other neurons in the next layer of the network, and this process continues until the final layer generates the output of the network. The layers of the neural network transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data [16].
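The layer-by-layer flow described here can be sketched as a forward pass through fully connected layers. The weights, biases, layer sizes, and the choice of ReLU are arbitrary values picked for illustration; they are not taken from the thesis model.

```python
def relu(z):
    """Rectified linear unit, a common nonlinear activation."""
    return max(0.0, z)


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: every neuron sees every input."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def forward(inputs, layers):
    """Feed the output of each layer into the next one."""
    for weights, biases, activation in layers:
        inputs = dense(inputs, weights, biases, activation)
    return inputs


# Hypothetical network: 2 inputs -> 2 hidden neurons (ReLU) -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0], relu),
    ([[1.0, 2.0]], [0.5], None),
]

output = forward([2.0, 1.0], layers)
print(output)
```

Training consists of adjusting the weights and biases so that the output of this forward pass matches the targets; frameworks such as TensorFlow or PyTorch automate that step.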

A deep neural network (DNN) is an ANN that is considerably more complex, with multiple hidden layers between the input and output layers, and that aims to mimic the brain's information processing [17].

Figure 3.5.4 Artificial neural networks and Deep neural networks [18]

There are many commonly used deep learning algorithms, but in our research, we used several algorithms that were suitable for our data including:

3.6 Technical Frameworks

3.6.1 OpenWeatherMap

OpenWeather Ltd was founded in 2012 by Denis Ukolov and Olga Ukolova. The headquarters of OpenWeather Ltd is located in London, United Kingdom. The company provides three groups of products and projects: OpenWeatherMap (weather API), the Satellite Imagery service, and the Machine learning service [33].

OpenWeatherMap is an online service, owned by OpenWeather Ltd, which supplies global weather data via API, including current weather data, forecasts, nowcasts, and historical weather data.

To access OpenWeather, we enter “https://openweathermap.org/” in the browser's address bar.

After accessing the OpenWeather site, we follow these steps to collect data through the OpenWeatherMap API (Appendix A):

Step 2: Get the API key.

Step 3: Use the API key in a Python script to collect weather data.
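As a rough illustration of Step 3, the sketch below builds a request URL for the OpenWeatherMap current weather endpoint and extracts a few fields from a response. It does not reproduce the script in Appendix A: the helper names, the chosen fields, the coordinates, and the placeholder key are assumptions, and the sample payload is a hand-written fragment rather than real API output. No network request is made.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.openweathermap.org/data/2.5/weather"


def build_current_weather_url(lat, lon, api_key, units="metric"):
    """Build the request URL for the current weather at one location."""
    query = urlencode(
        {"lat": lat, "lon": lon, "appid": api_key, "units": units}
    )
    return f"{BASE_URL}?{query}"


def extract_features(payload):
    """Pick the fields of interest from a decoded JSON response."""
    main = payload.get("main", {})
    return {
        "temperature": main.get("temp"),
        "humidity": main.get("humidity"),
        "pressure": main.get("pressure"),
        "wind_speed": payload.get("wind", {}).get("speed"),
        # The rain block is absent when no rain was measured.
        "rain_1h": payload.get("rain", {}).get("1h", 0.0),
    }


# Placeholder key and approximate coordinates, for illustration only.
url = build_current_weather_url(10.8, 106.6, "YOUR_API_KEY")
print(url)

# Hand-written sample fragment, not real API output.
sample_payload = {
    "main": {"temp": 30.1, "humidity": 74, "pressure": 1009},
    "wind": {"speed": 3.6},
}
print(extract_features(sample_payload))
```

In a real script the URL would be fetched with an HTTP client and the JSON body decoded before being passed to `extract_features`.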

3.6.2 Sentinel Hub

Sentinel Hub is a cloud-based service for processing petabytes of multi-spectral and multi-temporal satellite data, and a product of the Sinergise subsidiary of the same name. Sentinel Hub supports fully automated, real-time processing and distribution of remote sensing data and related EO products (such as EO Browser). Thanks to the APIs provided, users can retrieve satellite data over their area of interest (AOI) and a specific time range from the full archives in a matter of seconds [35].

Sinergise is a GIS IT company with more than 10 years of experience in working with spatial data. Its headquarters is in Ljubljana, Slovenia, and it has a subsidiary called Sentinel Hub GmbH in Graz, Austria [37].

Sentinel Hub API is a RESTful API that provides access to raw data from various satellite imagery archives [38]. Currently, Sentinel Hub provides 7 APIs in its services, covering a variety of functionalities that allow users to search, process, analyze, visualize, and download satellite data, as well as integrate it into their own applications [39].

3.6.2.2 Set up Sentinel Hub API

To access Sentinel Hub, we enter ‘https://www.sentinel-hub.com/’ in the browser's address bar.

Figure 3.6.4 Sentinel Hub Main Site

After accessing the Sentinel Hub site, we follow these steps to collect data through the Sentinel Hub API (Appendix B):

Step 1: Create Sentinel Hub account.

Step 2: Get the OAuth client key.

Step 3: Use the OAuth client key in a Python script to collect satellite image data.
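Step 3 typically begins by exchanging the OAuth client credentials for an access token. The sketch below prepares such a request using the standard OAuth2 client credentials grant, without sending it. It is not the script from Appendix B: the function name is hypothetical, and the token URL is deliberately left as a parameter because the exact endpoint should be taken from the Sentinel Hub documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(token_url, client_id, client_secret):
    """Prepare (but do not send) an OAuth2 client credentials request."""
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder endpoint and credentials, for illustration only.
request = build_token_request(
    "https://example.invalid/oauth/token",
    "my-client-id",
    "my-client-secret",
)
print(request.get_method(), request.full_url)
```

Sending the request with `urllib.request.urlopen` (or an HTTP library of choice) would return a JSON document whose access token is then attached to subsequent API calls as a bearer token.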

In our graduation thesis, the data collection system is built in Python. Figure 4.1.1 gives an overview of the Data Preparation Process, which includes collecting and preprocessing 2 datasets from 2 sources: OpenWeatherMap [34] and Sentinel Hub.

Figure 4.1.1 The architecture of the rain detection dataset framework

As shown in Figure 4.1.1, our architecture is separated into three layers: dataset collection, data cleaning and storage, and learning and intelligence.

- At the dataset collection layer, we collect the weather dataset from OpenWeather and Sentinel Hub using a Python script that includes the OpenWeather API key and the Sentinel Hub OAuth client key.

- At the data cleaning and storage layer, we organize the satellite images produced by each loop into a separate folder with a standardized format (named after the timestamp at which the loop was executed), and the numerical dataset is updated in each loop. These folders are uploaded to a cloud data server.

- At the learning and intelligence layer, we train and evaluate our proposed model with the following pipeline:

  - Download the dataset from the cloud data server.
  - Prepare the experimental dataset by applying preprocessing techniques.
  - Train and evaluate our proposed deep neural network model on the available experimental metrics.

4.2 Rain Detection model based on numerical dataset.

In our graduation thesis, we tried various models, such as LSTM, RNN, Decision Tree, and CNN-MLP. Based on the dataset and on model performance, we chose CNN-MLP to train on the numerical dataset for rain detection.

Convolutional Neural Network - Multi-Layer Perceptron (CNN-MLP) is a hybrid model that combines a CNN and an MLP. It uses convolutional layers to extract features from the data and then uses fully connected layers to process the extracted features and make the final prediction. Figure 4.2.1 presents the architecture of our CNN-MLP model [41].

[Figure: CNN Layer (Conv1D, LN + ReLU, Conv1D, LN + ReLU) followed by MLP Layer (Dense, Dense)]

Figure 4.2.1 CNN-MLP model Architecture

Our CNN-MLP model architecture contains the following layers:

Input Matrix: The input of the model is a feature vector of shape (10, 1), obtained after we analyzed the CSV file and removed some unimportant features.

Layer Conv1D #1: This layer performs 1-dimensional convolution on the input data, using 32 filters with a kernel size of 5 and the ReLU activation function. In other words, the layer slides each filter of size 5 over the input and computes the convolution between the filter and the corresponding part of the data to create 32 feature maps. Each of these feature maps focuses on a specific type of input signal characteristic, which increases the diversity of information represented during the model's learning process.

In the model, we also use the ReLU activation function to mitigate the vanishing gradient problem.

Let $y_i$ be the output value at position $i$, $x_{i+j}$ the input value at position $i + j$, $w_j$ the weight of the filter at position $j$, $b$ the adjustment coefficient (bias), and $k$ the size of the filter. Then $y_i$ is determined by:

$$y_i = \sum_{j=0}^{k-1} w_j \, x_{i+j} + b$$

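In addition, the output of the Conv1D layer in our problem has the following size:

$$\text{output\_size} = \left\lfloor \frac{n + 2p - k}{s} \right\rfloor + 1 = 6$$

In the above formula, $n$ is the length of the input, $k$ is the filter size (kernel size), $p$ is the size of the padding used (if any), and $s$ is the stride, i.e., the distance between successive filter positions on the input. The value 6 follows from $n = 10$ and $k = 5$, assuming no padding ($p = 0$) and a stride of $s = 1$.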

Thus, this Conv1D layer produces 32 feature maps, each containing 6 data points (the values in the feature maps can be real numbers), so the layer's output shape is (6, 32).
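The convolution and the output size discussed above can be checked with a short pure-Python sketch for a single filter. The function names `conv1d_relu` and `conv1d_output_size` and the example weights are illustrative only; the trained model uses 32 learned filters.

```python
def conv1d_relu(x, w, b=0.0, stride=1):
    """Valid (no padding) 1-D convolution with one filter, followed by ReLU."""
    k = len(w)
    out = []
    for i in range(0, len(x) - k + 1, stride):
        # y_i = sum_j w_j * x_{i+j} + b
        y = sum(w[j] * x[i + j] for j in range(k)) + b
        out.append(max(0.0, y))  # ReLU
    return out


def conv1d_output_size(n, k, p=0, s=1):
    """floor((n + 2p - k) / s) + 1"""
    return (n + 2 * p - k) // s + 1


if __name__ == "__main__":
    features = [float(v) for v in range(10)]  # stand-in for the 10 input features
    weights = [1.0, 1.0, 1.0, 1.0, 1.0]       # illustrative filter of size 5
    feature_map = conv1d_relu(features, weights, b=-20.0)
    print(len(feature_map), conv1d_output_size(10, 5))
```

With 10 input values and a kernel of size 5, the filter fits at 6 positions, which matches the first dimension of the (6, 32) output shape.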

Residual Block: The Residual Block contains 2 Conv1D layers and a shortcut connection that transfers information through layers.

We also use LayerNormalization to improve the stability and training dynamics of the Residual Block, and apply the ReLU activation function right after each Conv1D layer to mitigate the vanishing gradient problem.
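The idea of the shortcut connection can be illustrated with the following pure-Python sketch on a single vector. It is not our implementation: the ordering (transform, layer normalization, ReLU, repeated twice, then adding the input) is an assumption, and `first` and `second` are hypothetical stand-ins for the two Conv1D layers, which must preserve the input length.

```python
import math


def layer_norm(v, eps=1e-6):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]


def relu(v):
    return [max(0.0, a) for a in v]


def residual_block(x, first, second):
    """Apply two normalized, activated transforms and add the shortcut."""
    h = relu(layer_norm(first(x)))
    h = relu(layer_norm(second(h)))
    # Shortcut connection: the input is added back to the branch output.
    return [a + b for a, b in zip(x, h)]
```

Because the input is added back unchanged, the block can pass information (and gradients) through even when the transformed branch contributes little.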

Layer Max_pooling1d: This layer reduces the size of the feature maps by selecting the maximum value from each sliding window. With pool_size = 2 and input_shape = (6, 32), the output of Max_pooling1D is (3, 32), calculated by the following formula, where the denominator 2 is the stride (equal to pool_size):

$$\text{output\_size} = \left\lfloor \frac{\text{input\_size} - \text{pool\_size}}{2} \right\rfloor + 1$$

Each element in the feature map is determined by the formula:

$$y_{i,j} = \max\left(x_{i,2j},\; x_{i,2j+1}\right)$$

In the above formula, $i$ is the index of the feature map and $j$ is the index of the element in feature map $i$. The formula says that, to calculate the value of element $y_{i,j}$ in the output, we take the maximum of 2 consecutive elements in the input, at indices $2j$ and $2j + 1$.
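The pooling rule described above can be checked on one feature map with a small sketch. The function name `max_pool1d` is illustrative, and the stride is assumed to equal pool_size, as in the index pattern 2j and 2j + 1.

```python
def max_pool1d(x, pool_size=2, stride=None):
    """Max pooling over one feature map; the stride defaults to pool_size."""
    stride = pool_size if stride is None else stride
    return [
        max(x[i:i + pool_size])
        for i in range(0, len(x) - pool_size + 1, stride)
    ]


if __name__ == "__main__":
    feature_map = [1, 3, 2, 5, 4, 0]  # 6 values, as in the (6, 32) input
    print(max_pool1d(feature_map))    # pairs (1, 3), (2, 5), (4, 0)
```

Each output element is the larger value of one non-overlapping pair, so a feature map of 6 values shrinks to 3.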

Layer Flatten: After the CNN layers, the Flatten layer flattens the output feature maps of the previous Max_pooling1d layer into a 1-dimensional vector to prepare for the fully connected layers that follow. Thus, the layer's output is a 1-dimensional vector with length 3 x 32 = 96.
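The shapes stated in this section can be traced end to end with the sketch below. It only performs shape arithmetic. The helper names are illustrative, and the Residual Block is assumed to preserve its input shape, consistent with the (6, 32) input of the pooling layer.

```python
def conv1d_shape(shape, filters, kernel_size, stride=1, padding=0):
    length, _channels = shape
    return ((length + 2 * padding - kernel_size) // stride + 1, filters)


def max_pool1d_shape(shape, pool_size=2, stride=None):
    stride = pool_size if stride is None else stride
    length, channels = shape
    return ((length - pool_size) // stride + 1, channels)


def flatten_shape(shape):
    length, channels = shape
    return (length * channels,)


def trace_shapes(input_shape=(10, 1)):
    """Return the output shape after each stage described in this section."""
    shapes = {"input": input_shape}
    shapes["conv1d"] = conv1d_shape(shapes["input"], filters=32, kernel_size=5)
    shapes["residual_block"] = shapes["conv1d"]  # assumed shape-preserving
    shapes["max_pooling1d"] = max_pool1d_shape(shapes["residual_block"])
    shapes["flatten"] = flatten_shape(shapes["max_pooling1d"])
    return shapes


if __name__ == "__main__":
    for name, shape in trace_shapes().items():
        print(f"{name}: {shape}")
```

The trace ends with a vector of length 96, which is the input size expected by the first fully connected layer.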

Date posted: 22/10/2024, 23:53

References

[1] S. Chatterjee, B. Datta, S. Sen, N. Dey, and N. C. Debnath, “Rainfall prediction using hybrid neural network approach,” in 2018 2nd International Conference on Recent Advances in Signal Processing, Telecommunications & Computing (SigTelCom), Jan. 2018, pp. 67-72. doi: 10.1109/SIGTELCOM.2018.8325807
[3] S. Aftab, M. Ahmad, N. Hameed, M. S. Bashir, I. Ali, and Z. Nawaz, “Rainfall Prediction using Data Mining Techniques: A Systematic Literature Review,” Int. J. Adv. Comput. Sci. Appl. (IJACSA), vol. 9, no. 5, Art. no. 5, 2018, doi: 10.14569/IJACSA.2018.090518
[4] S. K. Mohapatra, A. Upadhyay, and C. Gola, “Rainfall prediction based on 100 years of meteorological data,” in 2017 International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN), Oct. 2017, pp. 162-166. doi: 10.1109/IC3TSN.2017.8284469
[5] C. Z. Basha, N. Bhavana, P. Bhavya, and S. V, “Rainfall Prediction using Machine Learning & Deep Learning Techniques,” in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Jul. 2020, pp. 92-
[6] “Machine learning là gì? | TopDev.” Accessed: Jan. 03, 2024. [Online]. Available: https://topdev.vn/blog/machine-learning-la-gi/#machine-learning-la-gi
[7] K. Nikolopoulou, “What Is Machine Learning? | A Beginner’s Guide,” Scribbr. Accessed: Jan. 03, 2024. [Online]. Available: https://www.scribbr.com/ai-tools/machine-learning/
[8] “Supervised Machine Learning.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.datacamp.com/blog/supervised-machine-learning
[9] “Introduction to Unsupervised Learning: Types, Applications and Differences from Supervised Learning | DataCamp.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.datacamp.com/blog/introduction-to-unsupervised-learning
[10] “Semi-Supervised Learning in ML,” GeeksforGeeks. Accessed: Jan. 03, 2024. [Online]. Available: https://www.geeksforgeeks.org/ml-semi-supervised-learning/
[11] G. Hutchison, “Four Types of Machine Learning Algorithms Explained,” Seldon. Accessed: Jan. 03, 2024. [Online]. Available: https://www.seldon.io/four-types-of-machine-learning-algorithms-explained
[12] “Decision Tree,” GeeksforGeeks. Accessed: Jan. 03, 2024. [Online]. Available: https://www.geeksforgeeks.org/decision-tree/
[13] “What is a Decision Tree | IBM.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.ibm.com/topics/decision-trees
[14] S. LeSuer, “What is a Decision Tree (Parts, Types & Algorithm Examples),” Slickplan. Accessed: Jan. 03, 2024. [Online]. Available: https://slickplan.com/blog/what-is-a-decision-tree
[15] “What is Random Forest? | IBM.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.ibm.com/topics/random-forest
[16] “Introduction to Deep Learning,” GeeksforGeeks. Accessed: Jan. 03, 2024. [Online]. Available: https://www.geeksforgeeks.org/introduction-deep-learning/
[17] “Deep Neural Network - an overview | ScienceDirect Topics.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.sciencedirect.com/topics/chemical-engineering/deep-neural-network
[18] “Introduction to Deep Neural Networks.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.datacamp.com/tutorial/introduction-to-deep-neural-networks
[19] “Multilayer Perceptron - an overview | ScienceDirect Topics.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.sciencedirect.com/topics/computer-science/multilayer-perceptron
[20] “Top 10 Deep Learning Algorithms You Should Know in 2023,” Simplilearn.com. Accessed: Jan. 03, 2024. [Online]. Available: https://www.simplilearn.com/tutorials/deep-learning-tutorial/deep-learning-algorithm
[21] “What is Multilayer Perceptron (MLP) Neural Networks? - Shiksha Online.” Accessed: Jan. 03, 2024. [Online]. Available: https://www.shiksha.com/online-courses/articles/understanding-multilayer-perceptron-mlp-neural-networks/
