
Automated parking space detection using convolutional neural networks


DOCUMENT INFORMATION

Basic information

Title: Automated Parking Space Detection Using Convolutional Neural Networks
Author: Nguyen Tien Anh
Supervisor: PhD Kim Dinh Thai
Institution: Vietnam National University, Hanoi International School
Major: Informatics and Computer Engineering
Document type: Graduation Project
Year: 2024
City: Hanoi
Pages: 48
Size: 2.93 MB


Structure

  • 1.1: Introduction
  • CHAPTER 2: METHODOLOGY
    • 2.1: Introduction to Deep Learning
      • 2.1.1: Convolutional Neural Network (CNN)
    • 2.2: About YOLOv8
      • 2.2.1: YOLOv8 structure
      • 2.2.2: YOLOv8 neck structure
      • 2.2.3: Head structure of YOLOv8
        • 2.2.3.1: Detection layer
        • 2.2.3.2: Loss algorithm
      • 2.2.4: Module C2f and CBS block
        • 2.2.4.1: Module Cross-Branch Summation (C2f)
        • 2.2.4.2: CBS Block
  • CHAPTER 3: PROPOSED METHOD AND IMPLEMENTATION
    • 3.1: Overall view of system
    • 3.2: Dataset collection
    • 3.3: Evaluation metrics
    • 3.4: Training model
  • CHAPTER 4: EXPERIMENT AND EVALUATION
    • 4.1: Training results
    • 4.2: Evaluation on real-time application
    • 4.3: Realtime guidance system performance

Contents


CHAPTER 1: INTRODUCTION

With the rapid urbanization and increasing number of vehicles, cities worldwide are facing significant challenges in managing parking spaces efficiently. Traditional parking systems, characterized by static signage, manual monitoring, and conventional pay-and-display methods, are increasingly inadequate in addressing the escalating demands of urban mobility. These conventional systems suffer from several critical disadvantages that contribute to a host of problems for drivers, urban planners, and the environment.

One of the primary drawbacks of traditional parking systems is inefficiency in space utilization. Without real-time information on parking availability, drivers often spend excessive amounts of time circling in search of a vacant spot. This not only leads to driver frustration and wasted time but also exacerbates traffic congestion. Studies indicate that up to 30% of urban traffic congestion is caused by vehicles searching for parking [1]. The indirect effects include increased fuel consumption and elevated levels of greenhouse gas emissions, contributing to environmental degradation.

Furthermore, manual parking management systems are labor-intensive and prone to human error [2]. Parking attendants can only monitor a limited number of spaces at a time, and their assessments may not always be accurate or timely. This limitation results in inefficient enforcement of parking regulations, leading to unauthorized parking, underutilized spaces, and reduced revenue for parking facility operators. In addition, static signage and traditional payment methods offer little flexibility and adaptability, failing to meet the dynamic needs of urban parking.

In developed countries, having parking spaces has become an advantage for any business activity in city centers. Moreover, a professional parking service is a strength that helps residential areas, shopping centers, office complexes, entertainment areas, and food courts compete with their rivals. Therefore, to maintain a business advantage and attract customers, providing sufficient parking spaces and professional parking services will always be among the important factors. Countries like Germany, Sweden, and the USA have built smart parking lots and smart car manufacturing plants to meet people's needs.

With the motto "Customer is king," Volkswagen Group in Wolfsburg, Germany, has built the intelligent "Autostadt CarTowers" to serve tourism, entertainment, and the customization of a car with personal colors [3]. The special Car Tower parking lot in Volkswagen's factory park is 20 stories high, with a total capacity of 800 cars (400 per tower). From the outside, the two glass-walled towers look like modern office buildings, but they are actually two car parking towers of the giant German car manufacturer Volkswagen. The towers are located within the Volkswagen Autostadt theme park, which includes the factory and offices of Volkswagen and its subsidiaries. After customers choose a car, robots bring it down from the giant "beehive." The tower also has a high-tech computer room that allows customers to customize a car with available features; the printed design of the car can be received at the reception room.

In Gothenburg, the second-largest city and the largest port of the Kingdom of Sweden, a smart parking lot called Parkeringsbat has been built to attract tourists, avoid waste, and protect the marine environment. Sweden has repurposed an old barge into a floating parking lot by the harbor in Gothenburg [4]. The parking lot can accommodate 400 cars and can be towed to different locations to become a mobile parking lot if there is unexpected demand. Here, customers can park their cars and enjoy the beautiful sea view of the city.

The USA is a leading country in information technology and telecommunications equipment. Tesla Inc. (formerly Tesla Motors, Inc.) is an American company specializing in designing, manufacturing, and distributing electric cars and components for electric vehicles. The company uses "image recognition" technology on roads to enable self-driving cars, known as "Autopilot" [5]. Autopilot is an assistance program that allows cars to operate autonomously instead of the driver when the driver shows signs of fatigue or poor health. This system was installed in all Tesla cars at the end of September 2014. The program controlling the car includes a computer mounted at the front, a front radar (provided by Bosch) located under the car's nose, sonar sensors, and cameras on the sides of the car at the front and back to create a 360-degree safety zone around the car. This system allows the car to see traffic signs, road markings, obstacles, and other vehicles. Additionally, Autopilot adjusts speed according to the vehicle in front and alerts the driver if the car strays from its lane. Purchasing the Autopilot system license for $2,500 allows the car to drive itself over very short distances (called "Summon") and park itself (called "Auto Park"). The current version 8 of the Autopilot program uses radar as the primary sensor instead of cameras.

The advent of smart technologies offers a promising solution to these pervasive issues. By integrating advanced technologies such as the Internet of Things (IoT), computer vision, and machine learning, a smart car parking system can significantly enhance the efficiency and effectiveness of urban parking management. This research focuses on leveraging deep learning techniques to develop an intelligent parking system that addresses the limitations of traditional methods.

Deep learning, a subset of artificial intelligence, has shown remarkable success in various applications, particularly in image and video analysis. In the context of smart parking, deep learning algorithms can be employed to process data from IoT sensors and surveillance cameras, providing accurate, real-time information about parking space occupancy [6]. These algorithms can detect vehicles, recognize parking patterns, and predict space availability with high precision. By automating the detection and monitoring processes, deep learning can eliminate the need for manual inspections and reduce human error.

Moreover, a smart parking system powered by deep learning can offer several additional benefits. It can provide real-time updates to drivers through mobile applications, guiding them to available spaces and reducing the time spent searching for parking. This capability not only enhances driver convenience but also mitigates traffic congestion and lowers emissions. The system can also dynamically adjust parking prices based on demand, optimizing space utilization and maximizing revenue for parking facility operators.

The primary focus of this research is to explore the development and implementation of a deep learning-based smart car parking system. We aim to design a comprehensive solution that integrates IoT sensors, computer vision, and machine learning algorithms to create a robust and efficient parking management system. Our study will involve the creation of a large-scale dataset of parking scenarios, the development of advanced detection and prediction models, and the evaluation of the system's performance in real-world settings. By leveraging the YOLOv8 architecture, which utilizes the PAN-FPN structure for feature extraction and separate branches for classification and regression, this research seeks to build a system that not only enhances detection accuracy but also operates efficiently in various urban environments.

CHAPTER 2: METHODOLOGY

2.1: Introduction to Deep Learning

2.1.1: Convolutional Neural Network (CNN)

Convolutional Neural Networks (CNNs) [7] are a type of deep learning model widely used in tasks involving object recognition in images. To enable computers to recognize objects in images, we use a type of artificial neural network (ANN) built around an important mathematical operation called convolution. Convolutional Neural Networks are inspired by the human brain. In the 1950s and 1960s, D. H. Hubel and T. N. Wiesel conducted research on the brains of animals and proposed a new model for how animals perceive the world around them [8]. In their report, the two researchers described two types of neurons in the brain, which function differently: simple cells (S cells) and complex cells (C cells). Simple cells are activated when recognizing simple shapes, such as lines within a fixed area and their edges. Complex cells have larger receptive fields, and their output is not sensitive to specific locations within the field. In vision, a neuron's receptive field corresponds to an area on the retina where the neuron will be activated.

In 1980, Fukushima proposed a hierarchical neural network model called the Neocognitron. This model was based on the concepts of S cells and C cells, and it can recognize patterns based on learning the shapes of objects [9]. Then, in 1998, Convolutional Neural Networks were introduced by Bengio, LeCun, Bottou, and Haffner. The first model representing CNNs was called LeNet-5 [10]; it could recognize handwritten digits.

Convolutional Neural Network (CNN) Structures

A Convolutional Neural Network (CNN), also known as a ConvNet, is a type of Artificial Neural Network (ANN) consisting of a set of convolutional layers stacked on top of each other, utilizing nonlinear activation functions such as ReLU and tanh to activate the weights in the nodes. At each node, after passing through the activation functions, more abstract information is generated for the subsequent layers [11].

A CNN model comprises a set of processing layers that can learn different features of the input data (e.g., images) at various levels of abstraction. The early layers learn and extract low-level features, while deeper layers learn and extract high-level features. The basic conceptual model of a CNN is shown in Figure 1.

Figure 1. Overview of the CNN model [12]

Some key components in the CNN architecture:

Convolutional Layer: This is the most crucial component of a CNN. It contains a set of convolutional kernels, also known as filters, which are convolved with the input image to create output feature maps. Each kernel is a grid of discrete values, where each value is called a weight. At the start of training a CNN model, all weights of a kernel are assigned random values. During training, these weights are adjusted to extract meaningful features.

In a CNN, the input is a multi-channel image: an RGB image has 3 channels, while a grayscale image has one. After applying filters to regions of the image matrix, the resulting feature maps are three-dimensional matrices whose entries are learnable parameters called weights. The convolution operation shifts the filter across the image pixel by pixel, horizontally and vertically, multiplying corresponding values and summing them to produce a single value in the output feature map, placed at the center of the receptive field.

Padding: Padding involves adding border values (typically zeroes) around the image matrix to ensure that convolution operations do not reduce the original image size.

Feature Map: Represents the result of each scan of the input image matrix by the convolutional kernel [13].
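As a concrete illustration of the convolution and padding operations described above, the following minimal NumPy sketch (illustrative only, not the thesis's code; the Laplacian-style kernel is an arbitrary example) convolves a single-channel image with a 3x3 kernel:

```python
import numpy as np

def conv2d(image, kernel, padding=1):
    """Slide a kernel over a single-channel image, multiplying
    corresponding values and summing them into one output value.
    Zero padding keeps the output the same size as the input
    when padding = kernel_size // 2."""
    k = kernel.shape[0]
    padded = np.pad(image, padding, mode="constant")  # zero border
    h = padded.shape[0] - k + 1
    w = padded.shape[1] - k + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
edge_kernel = np.array([[0, -1, 0],
                        [-1, 4, -1],
                        [0, -1, 0]], dtype=float)
result = conv2d(image, edge_kernel, padding=1)
print(result.shape)  # (4, 4): padding of 1 preserves the input size
```

With padding of 0 the same 4x4 input would shrink to 2x2, which is exactly the size reduction that padding is meant to avoid.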

The main advantages of convolutional layers are:

Sparse Connectivity: In a fully connected neural network, each neuron in one layer connects to every neuron in the next layer. In a CNN, however, the number of connections or weights required is small, reducing the memory needed to store these weights.

Weight Sharing: In a CNN, there are no unique weights between pairs of neurons in adjacent layers; instead, the same kernel weights are applied across every position of the input matrix. This allows for a significant reduction in training time and other costs.

ReLU Layer: This layer is an activation function in CNNs, simulating neurons with a high spike transmission rate. Other activation functions such as Leaky ReLU, Sigmoid, and Maxout are also used, with ReLU being the most common. ReLU helps accelerate calculations during network training, although one must adjust learning rates and address dead units when using it. ReLU layers are applied to the values of the filter map after it is calculated [14].

Pooling Layer: Applied after convolutional operations, this layer reduces the size of feature maps while preserving the most prominent features. The primary limitation of pooling layers is that they can decrease CNN performance by capturing only the presence of a characteristic feature rather than its exact position.

Fully Connected Layer (FC): In these layers, each neuron in one layer connects to every neuron in the previous layer. The last FC layer serves as the output layer (classifier) of the CNN architecture. These layers follow the principles of traditional multilayer perceptron (MLP) networks, taking input from the final convolutional or pooling layer and flattening it into a vector, which is then passed through the FC layers to generate the final CNN output.

Sliding Connectivity: Unlike regular neural networks, a CNN does not connect to the entire image but only to local regions equal to the filter size. Filters slide across the image horizontally and vertically, computing convolution values and filling the activation map [15].

3D Neuron Blocks: CNN layers form 3D blocks instead of 2D matrices, organized by width, height, and depth. The width and height of a layer depend on the filter size, the previous layer's dimensions, padding, and stride; the depth equals the number of units in that layer. Applying filters to different regions results in 2D matrices that are stacked to form 3D neuron blocks [15].

Figure 6. 3D Neuron Block Structure in AlexNet [16]

Shared and Local Connectivity: Transformation processes in a CNN connect 3D neuron blocks, with each unit connecting to a local region sized to the filter; this local region is called the unit's receptive field. Local connectivity applies only to width and height, extending fully in depth. Each filter extracts a specific feature across all local regions, which appears in the new layer [15].

Pooling: Reduces width and height while preserving key features, using methods like average pooling or max pooling to control parameter growth and computational cost [15]
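A minimal sketch of max pooling (illustrative only, not the thesis's code) shows how a 2x2 window with stride 2 halves each spatial dimension while keeping the strongest activation in every window:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    """2x2 max pooling: keep the maximum activation in each window,
    halving width and height while preserving prominent features."""
    h = (feature_map.shape[0] - size) // stride + 1
    w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 9, 8],
               [3, 1, 7, 5]], dtype=float)
print(max_pool(fm))  # [[6. 2.]
                     #  [3. 9.]]
```

Average pooling would replace `window.max()` with `window.mean()`; both variants control parameter growth in the same way.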

Increasing Detection Complexity: Initially, the CNN receives raw pixel values. Subsequent layers detect edges, then progressively more complex patterns or objects. The final layer outputs probabilities for each class.

Convolution Layers: More layers improve performance, typically stabilizing after 3-4 layers.

Filter Size: Commonly 3x3 or 5x5 matrices.

Pooling Size: Typically 2x2 for regular images, 3x3 for larger images.

Train-Test: Repeated train-test cycles yield optimal parameters.

2.2: About YOLOv8

The YOLOv8 architecture builds upon previous versions in the YOLO family, featuring a convolutional neural network divided into two main components: the backbone and the head. The backbone, based on a modified CSPDarknet53 architecture, efficiently extracts features from the input images to create a feature map. This map is then utilized by the head to predict bounding boxes and class probabilities. Notably, the backbone prioritizes a lightweight design without sacrificing accuracy and integrates partial cross-layer connections to enhance information flow through the layers. The head, composed of convolutional layers, further refines the predictions based on the feature map. YOLOv8 offers improvements in object detection accuracy, integrating anchor-free methods while maintaining rapid inference speed. It provides versatility with support for various backbones, including EfficientNet, ResNet, and CSPDarknet, allowing users to tailor their models. Adaptive training strategies optimize learning rates and loss functions for better performance, along with advanced data augmentation techniques like MixUp and CutMix. The high customizability of this architecture allows users to tweak the structure and parameters, while pretrained models facilitate easy transfer learning across diverse datasets.

YOLOv8 uses an improved CSPDarknet53 backbone, as shown in Figure 2a. The input sample is downsampled five times to obtain five feature maps at different scales, which lets YOLOv8 apply gradient connections to enrich the information flow of the feature extraction network while maintaining a lightweight design. The CBS module performs a convolution operation on the input, follows it with batch normalization, and finally activates the information flow using SiLU to obtain the output, as illustrated in Figure 2g. At the end of the backbone, the Spatial Pyramid Pooling Fast (SPPF) module converts input feature maps into a fixed-size map with adjustable output size. Compared to the spatial pyramid pooling (SPP) structure, SPPF reduces computational effort and has lower latency by sequentially connecting three max-pooling layers, as shown in Figure 2d.

The algorithm process is as follows: first, the image is divided into S × S grid cells. Each grid cell is responsible for predicting any object whose bounding box center falls within that cell.

A total of S × S × B anchor boxes are generated from these grid cells. Each anchor box contains five parameters: the target center coordinates and the width and height of the target (x, y, w, h), plus the confidence that a target is contained within the anchor box. The S × S grid cells predict the probability of the target type in each cell. The confidence of the predicted anchor box and the target type probability are then multiplied to obtain the class score for each predicted anchor box. These predicted anchor boxes are filtered using the non-maximum suppression (NMS) mechanism to obtain the final prediction results.
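The NMS filtering step can be sketched as follows. This is a generic greedy NMS implementation (not the thesis's code), using corner-format boxes and an arbitrary example threshold:

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop every remaining
    box that overlaps it by more than the threshold, then repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too heavily
```

Production detectors run a vectorized variant of this loop, but the class-score ranking and pairwise IoU comparison are the same.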

1) In this study, the input images are of parking lots, and preprocessing involves three parts: data augmentation, image resizing, and automatic anchor box adjustment. YOLOv8 uses mosaic data augmentation to automatically resize, crop, arrange, and then merge inputs to improve the detection of small targets. During dataset training, the input images are resized to a uniform size and then fed into the model.

2) The backbone consists of 9 C2f layers and CBS modules, with each pair of C2f and CBS reducing the size from 640×640×3 down to 20×20×512×w×r. The first CBS module has a single input, unlike the others, and the final C2f layer is followed by a Spatial Pyramid Pooling Fast (SPPF) module rather than another CBS module.

3) The neck is a network layer that combines image features and transfers them to the prediction layer. YOLOv8's neck uses the FPN+PAN structure.

4) The prediction layer forecasts the image features and generates bounding boxes to predict the target type. YOLOv8 uses BCE loss for classification; Distribution Focal Loss (DFL) and Complete Intersection over Union (CIoU) are used for bounding box loss.

By using the CSPDarknet53 network like YOLOv5, YOLOv8 inherits the network architecture of the YOLO algorithm. The input size of the network is 640×640×3, a three-channel RGB color image with a length and width of 640 after preprocessing the original image. When passing through the network, the output size of the detection layer for large, medium, and small targets is:

S × S × n_a × (t_x + t_y + t_w + t_h + t_o + n_c)

where S × S is the number of grid divisions, n_a is the number of anchors per grid cell, n_c is the number of categories, t_x, t_y, t_w, and t_h are parameters related to the bounding box, t_o is the confidence of the bounding box, and t_ci is the confidence of category i. These parameters are decoded according to Formula (1) to obtain the final predicted box.
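The output-size expression above can be made concrete with a short sketch. The helper below is illustrative only: the strides 8/16/32, a single prediction per cell (n_a = 1, in the anchor-free spirit of YOLOv8), and the two parking classes (occupied/empty) are assumptions for this example, not values stated by the formula itself:

```python
def detection_output_size(input_size, stride, n_a, n_c):
    """Shape of one detection layer's output: S x S cells, each holding
    n_a predictions of (t_x, t_y, t_w, t_h) + objectness t_o + n_c class
    confidences, with S = input_size / stride."""
    s = input_size // stride
    per_anchor = 4 + 1 + n_c  # box params + confidence + class scores
    return (s, s, n_a * per_anchor)

# 640x640 input, two parking classes, one prediction per cell:
for stride in (8, 16, 32):  # small-, medium-, large-object layers
    print(detection_output_size(640, stride, n_a=1, n_c=2))
# (80, 80, 7)
# (40, 40, 7)
# (20, 20, 7)
```

The three grid sizes correspond to the three detection scales produced by the PAN-FPN neck discussed next.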

Inspired by PANet, YOLOv8 is designed with a PAN-FPN (Path Aggregation Network + Feature Pyramid Network) structure at the neck. Comparing the architectural diagrams with YOLOv5, we can see that YOLOv8 removes the convolutional structure in the upsampling stage of YOLOv5's PAN-FPN and replaces the C3 module with the C2f module. This change allows YOLOv8 to perform detection faster while maintaining the original performance and achieving a lightweight model.

(1) To shorten the information path and enhance the feature pyramid with the accurate localization signals available at lower levels, the bottom-up path augmentation feature is created. However, the propagation of low-level features to enhance the entire feature hierarchy for individual recognition had not previously been explored.

Figure 9. Feature Pyramid Network-Path Aggregation Network

(2) PAN restores the broken information path between each proposal and all feature levels. This is a simple component that aggregates features from all feature levels for each proposal, avoiding arbitrarily assigned results; with this operation, cleaner paths are created. Finally, to capture different perspectives of each proposal, mask prediction is enhanced with small fully connected layers, which add attributes to the original FCN used by Mask R-CNN.

The basic idea of FPN is to take the multi-scale feature maps generated by different layers of the CNN and create a pyramid of feature maps with different spatial resolutions. This allows the network to capture and utilize features at multiple scales. Starting with the semantically strongest, lowest-resolution feature map (from the later layers of the CNN), FPN upsamples it to create feature maps at progressively larger scales. Feature maps from the lower layers are then merged with the upsampled maps to create a set of feature maps at different scales.

On the other hand, PAN enhances FPN by introducing a "bottom-up" path alongside the "top-down" path used by FPN. The bottom-up path aggregates and refines features from the lower layers, improving the handling of small objects, while the top-down path focuses on high-level semantics.

Both FPN and PAN thus serve as feature extractors in object detection tasks. FPN is a top-down architecture with lateral connections that takes an image of arbitrary size as input and outputs corresponding feature maps at multiple levels, extracting features at different scales. PAN builds on it with an additional bottom-up path, producing semantically rich features with more precise localization information.

The Detection Layer of YOLOv8 uses a decoupled head structure, as shown in Figure 1e. The decoupled head utilizes two separate branches for object classification and bounding box regression, and different loss functions are used for the two tasks. For the classification task, Binary Cross-Entropy loss (BCE loss) is used; for the bounding box regression task, the Distribution Focal Loss (DFL) [17] and CIoU [17] are used. This detection structure can improve detection accuracy and speed up the model's prediction, and it also marks the transition from an anchor-based to an anchor-free format.

CHAPTER 3: PROPOSED METHOD AND IMPLEMENTATION

3.1: Overall View of System

3.2: Dataset Collection

For the dataset collection process of this thesis, I utilized the publicly available PKLot dataset [21], which is specifically designed for car parking lot detection and classification. This dataset is extensive and well-suited for training deep learning models in the context of smart parking systems.

The PKLot dataset is a large-scale image dataset that contains labeled images of parking lots under various weather conditions and lighting scenarios. It includes images captured from different angles and perspectives, making it highly valuable for developing robust parking detection algorithms. The dataset covers three parking lots located in Brazil:

• PUCPR: Pontifical Catholic University of Paraná

• UFPR04: Federal University of Paraná

• UFPR05: Federal University of Paraná

Each parking lot in the dataset includes a wide range of images taken at different times of day, under various weather conditions such as sunny, cloudy, and rainy. This diversity ensures that models trained on this dataset can generalize well to different real-world conditions.

The images in the PKLot dataset are annotated with bounding boxes that mark the location of each parked vehicle. These annotations are crucial for training object detection models such as YOLOv8. The dataset provides annotations in a standard format that includes:

• The coordinates of the bounding box around each vehicle

• The class label, indicating whether the space is occupied or empty

To prepare the PKLot dataset for training, I performed several preprocessing steps:

Data Augmentation: I applied various data augmentation techniques to increase the diversity of the training data. This included random cropping, horizontal and vertical flipping, rotation, and color adjustments.

Image Resizing: All images were resized to a uniform size of 640x640 pixels to match the input requirements of the YOLOv8 model.

Normalization: The pixel values of the images were normalized to a range of [0, 1] to ensure consistent input data for the neural network.

Label Conversion: The annotations were converted into the format required by the YOLOv8 model, ensuring that the bounding box coordinates and class labels were correctly formatted.
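The label conversion step can be sketched as follows. This is a generic converter into the YOLO text-label layout (class id followed by normalized center coordinates and box size); the class-id mapping shown is an assumption for illustration, not necessarily the one used in the thesis:

```python
def to_yolo_label(box, class_id, img_w, img_h):
    """Convert a corner-format box (x1, y1, x2, y2) in pixels into a
    YOLO label line: class_id, then center x/y and width/height, all
    normalized to [0, 1] by the image dimensions."""
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Assumed mapping for illustration: 0 = empty, 1 = occupied.
print(to_yolo_label((100, 200, 180, 260), 1, img_w=640, img_h=640))
# 1 0.218750 0.359375 0.125000 0.093750
```

Normalizing the coordinates by the image size mirrors the pixel-value normalization above: both keep every input to the network inside [0, 1].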

The PKLot dataset was split into training, validation, and testing sets to ensure robust model evaluation The splits were made as follows:

• Training Set: 70% of the dataset, used for training the YOLOv8 model

• Validation Set: 15% of the dataset, used for hyperparameter tuning and model validation during training

• Testing Set: 15% of the dataset, used for evaluating the final performance of the trained model
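A 70/15/15 split like the one above can be sketched with a fixed-seed shuffle so the partition is reproducible. This is a generic illustration, not the thesis's splitting script; the seed value is arbitrary:

```python
import random

def split_dataset(items, seed=42):
    """Shuffle once with a fixed seed, then cut into 70% train,
    15% validation, and 15% test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(1000))
print(len(train), len(val), len(test))  # 700 150 150
```

Because the test partition takes the remainder, no image is lost or duplicated even when the dataset size is not divisible by 20.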

By leveraging the PKLot dataset, this research aims to develop a deep learning-based smart parking system that can accurately detect and classify parking space occupancy under various real-world conditions. The extensive and diverse nature of the PKLot dataset provides a solid foundation for training and evaluating the YOLOv8 model, ensuring high detection accuracy and robust performance in urban parking environments.

3.3: Evaluation Metrics

To assess the detection capabilities of our model, we employed a set of evaluation metrics: precision, recall, mean average precision (mAP) at IoU thresholds of 0.5 and 0.5:0.95, the number of model parameters, model size, and detection speed. In the context of these metrics, we utilized specific counts, namely True Positives (TP), False Positives (FP), and False Negatives (FN), to gauge the model's performance.

Precision (P), a key metric, quantifies the ratio of positive samples correctly predicted by the model to all detected samples. It is calculated as follows:

P = TP / (TP + FP)

Recall (R), another essential metric, expresses the ratio of positive samples correctly predicted by the model to the total positive samples in existence:

R = TP / (TP + FN)

Average Precision (AP) is derived from the area under the precision-recall curve, summarizing the model's precision-recall trade-off:

AP = ∫₀¹ P(R) dR

Mean Average Precision (mAP) is an aggregate metric obtained by averaging the AP values across all sample categories. It provides a comprehensive evaluation of the model's performance across all categories, and it is calculated as follows:

mAP = (1/N) Σᵢ APᵢ

Here, APᵢ refers to the AP value for category index i, and N represents the total number of categories in the training dataset (in our case, N is 2: occupied and empty). Additionally, we computed mAP0.5 and mAP0.5:0.95, which denote the average accuracy at different IoU thresholds, with the latter spanning a range from 0.5 to 0.95 at 0.05 intervals.
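The precision, recall, and mAP definitions above translate directly into code. The counts and AP values below are made-up numbers for illustration only:

```python
def precision_recall(tp, fp, fn):
    """P = TP / (TP + FP); R = TP / (TP + FN).
    Guard against empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return p, r

def mean_average_precision(ap_values):
    """mAP: the unweighted mean of per-class AP values."""
    return sum(ap_values) / len(ap_values)

p, r = precision_recall(tp=90, fp=10, fn=30)
print(p, r)  # 0.9 0.75
print(mean_average_precision([0.92, 0.88]))  # ~0.90 for two classes
```

With only two classes (occupied, empty), mAP is simply the average of the two per-class AP values; mAP0.5:0.95 repeats this computation at each IoU threshold from 0.5 to 0.95 and averages the results.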

3.4: Training Model

The training process for the YOLOv8 model involves several key steps to ensure accurate and efficient detection of vehicles in parking lots. This section outlines the methodology and parameters used to train the model on the PKLot dataset.

I began by initializing the YOLOv8 model with weights pre-trained on the COCO dataset. This pre-training provides a solid foundation for the model, leveraging features learned from a large and diverse dataset of objects, which helps accelerate convergence and improve performance on the target task.

The training configuration involves setting various hyperparameters that control the training process Key parameters include:

• Batch Size: I set the batch size to 16, balancing memory constraints against the ability to generalize from multiple samples in each iteration.

• Learning Rate: The initial learning rate was set to 0.001, with a learning rate scheduler that reduces the learning rate by a factor of 10 if the validation loss plateaus for 10 epochs.

• Epochs: The model was trained for 100 epochs, allowing sufficient time for the model to learn from the data while monitoring for overfitting.

• Optimizer: I used the Adam optimizer, known for its efficiency and adaptability in training deep learning models.
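The plateau-based schedule described above (initial rate 0.001, divide by 10 after 10 stagnant epochs) can be sketched as a small stateful class. This is a generic reimplementation for illustration, not the framework callback actually used in training:

```python
class ReduceOnPlateau:
    """Cut the learning rate by `factor` whenever the validation loss
    has not improved for `patience` consecutive epochs."""

    def __init__(self, lr=0.001, factor=10.0, patience=10):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss       # new best: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr /= self.factor  # plateau: decay and restart
                self.bad_epochs = 0
        return self.lr

sched = ReduceOnPlateau()
for epoch in range(12):
    lr = sched.step(val_loss=1.0)  # loss never improves after epoch 0
print(lr)  # the rate has dropped from 0.001 to 0.0001
```

In a real training loop, `step` would be called once per epoch with the measured validation loss, and the returned rate fed to the optimizer.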

To improve the robustness of the model, I applied extensive data augmentation techniques during training. These techniques included:

• Random Cropping: Randomly cropping sections of the image to simulate partial occlusion and improve detection robustness.

• Horizontal and Vertical Flipping: Flipping images to augment the dataset with mirrored versions of the scenes.

• Rotation: Rotating images by random angles to account for different camera angles in real-world scenarios.

• Color Jittering: Adjusting the brightness, contrast, and saturation of the images to enhance the model's ability to handle varying lighting conditions.

The training process involved the following steps:

- Data Loading: The PKLot dataset images and their corresponding annotations were loaded and preprocessed in batches.

- Forward Pass: For each batch, the images were passed through the YOLOv8 model to obtain predictions for bounding boxes and class probabilities.

- Loss Calculation: The loss was computed as a combination of Binary Cross-Entropy (BCE) loss for classification and Complete Intersection over Union (CIoU) loss for bounding box regression. Distribution Focal Loss (DFL) was also incorporated to sharpen the detection of object boundaries.

- Backward Pass and Optimization: The gradients of the loss with respect to the model parameters were computed through backpropagation, and the optimizer then updated the model parameters to minimize the loss.

- Validation: After each epoch, the model's performance was evaluated on the validation set. Metrics such as precision, recall, and mean Average Precision (mAP) were tracked to monitor progress and guide hyperparameter adjustments.
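The CIoU regression term in the loss above can be sketched in plain Python. This is a simplified scalar version for a single pair of well-formed boxes; the actual YOLOv8 loss computes the same quantity over whole tensors of predictions at once.

```python
import math

def ciou_loss(pred, target):
    """1 - CIoU for two boxes given as (x1, y1, x2, y2)."""
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    # Plain intersection-over-union.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    iou = inter / union if union > 0 else 0.0
    # Squared distance between box centers (the CIoU center penalty).
    rho2 = ((px1 + px2 - tx1 - tx2) ** 2 + (py1 + py2 - ty1 - ty2) ** 2) / 4
    # Squared diagonal of the smallest box enclosing both.
    c2 = (max(px2, tx2) - min(px1, tx1)) ** 2 + (max(py2, ty2) - min(py1, ty1)) ** 2
    # Aspect-ratio consistency term.
    v = (4 / math.pi ** 2) * (
        math.atan((tx2 - tx1) / (ty2 - ty1)) - math.atan((px2 - px1) / (py2 - py1))
    ) ** 2
    alpha = v / (1 - iou + v) if (1 - iou + v) > 0 else 0.0
    return 1 - (iou - rho2 / c2 - alpha * v)
```

Unlike plain IoU loss, the center-distance and aspect-ratio penalties give a useful gradient even when the predicted and target boxes do not overlap at all.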

After the initial training phase, I performed fine-tuning to further enhance the model's performance. This involved unfreezing additional layers of the model and training with a lower learning rate, allowing the model to adapt its parameters to the specific characteristics of the PKLot dataset.
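As an illustration, the fine-tuning stage could be launched by re-running training from the first-stage checkpoint with a reduced learning rate. All values here are illustrative, not taken from the thesis: the checkpoint path is a placeholder, and `freeze` (the number of initial layers kept frozen) follows the ultralytics `train()` keyword names.

```python
# Hypothetical second-stage configuration; values are illustrative only.
FINETUNE_CFG = {
    "data": "pklot.yaml",  # placeholder dataset file
    "epochs": 30,          # shorter second stage
    "batch": 16,
    "lr0": 0.0001,         # 10x lower than the initial run
    "freeze": 10,          # keep only the first 10 layers frozen
}

def finetune(cfg=FINETUNE_CFG, checkpoint="runs/detect/train/weights/best.pt"):
    # Import deferred so the sketch stays readable without ultralytics.
    from ultralytics import YOLO
    model = YOLO(checkpoint)  # resume from the first-stage weights
    return model.train(**cfg)
```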

CHAPTER 4: EXPERIMENT AND EVALUATION

4.1: Training results

4.3: Realtime guidance system performance

The images illustrate the guidance system developed in this thesis, which combines object tracking with Complete Intersection over Union (CIoU) techniques. The system is designed to provide real-time guidance to vehicles entering a parking lot, directing them to the nearest available parking space.

In the top image, the guidance system is shown in action as vehicles are tracked while navigating the parking lot. Each vehicle is assigned a unique ID that is continuously tracked as it moves. When a car approaches an entrance marked by a yellow bounding box, the CIoU algorithm determines whether the vehicle is within this designated entrance area. Once detected, the system identifies and displays the ID of the nearest available parking slot, guiding the vehicle to its destination. These real-time updates are crucial for efficient parking management, reducing the time drivers spend searching for parking spaces and minimizing congestion.
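The entrance check described above can be sketched as a plain box-overlap test. The 0.3 threshold is an illustrative value, not a parameter taken from the thesis.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def at_entrance(vehicle_box, entrance_box, threshold=0.3):
    """Trigger guidance once a tracked vehicle overlaps the entrance zone enough."""
    return iou(vehicle_box, entrance_box) >= threshold

at_entrance((100, 100, 200, 200), (120, 120, 220, 220))  # True (overlap ~0.47)
```

Because each vehicle keeps its tracking ID across frames, this check fires once per vehicle rather than once per frame of overlap.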

Figure 28 Realtime testing result of YOLOv8l in dense background.

The bottom image shows a continuation of the system's operation, further highlighting the effectiveness of the tracking and CIoU integration. The system accurately maintains the unique ID of each vehicle and dynamically updates the nearest available parking slot as the vehicles move. Green bounding boxes indicate free parking spaces, while red boxes denote occupied slots. The use of CIoU ensures precise detection of vehicles at the entrance points, enhancing the system's reliability in providing accurate guidance.
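The nearest-available-slot lookup can be sketched as a minimal Euclidean-distance search. The slot IDs and the dictionary layout here are assumptions for illustration, not the thesis's actual data structures.

```python
import math

def nearest_free_slot(vehicle_center, slots):
    """Return the ID of the closest slot whose status is 'free'.

    `slots` maps slot ID -> ((cx, cy), status); returns None if no slot is free.
    """
    free = [(sid, center) for sid, (center, status) in slots.items()
            if status == "free"]
    if not free:
        return None
    # Pick the free slot whose center is closest to the vehicle.
    return min(free, key=lambda item: math.dist(vehicle_center, item[1]))[0]

slots = {
    "A1": ((10, 10), "occupied"),
    "A2": ((30, 10), "free"),
    "B1": ((90, 40), "free"),
}
nearest_free_slot((35, 12), slots)  # -> "A2"
```

Re-running this lookup whenever a slot's status flips keeps the displayed target slot consistent with the green/red occupancy boxes.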

Overall, these images demonstrate the practical application of the guidance system in a real-world scenario, showcasing its potential to streamline parking operations. By efficiently directing vehicles to the nearest available spaces, the system improves overall parking lot utilization and enhances the driver experience, ultimately contributing to better traffic management and reduced emissions from vehicles searching for parking.

Figure 29 Realtime testing result of YOLOv8l with guidance system.

This thesis presents the development and implementation of a smart parking system using the YOLOv8 model, demonstrating its effectiveness in real-time vehicle detection and parking space management. By leveraging deep learning techniques and the advanced architecture of YOLOv8, the proposed system provides accurate, efficient, real-time updates on parking space availability. The system's capability to handle diverse parking scenarios, as evidenced by high precision and recall metrics, ensures reliable performance across different environments and conditions.

The evaluation of the YOLOv8 model on the PKLot dataset showed promising results, with high mean Average Precision (mAP) scores indicating robust detection accuracy. The model's performance on real-time video feeds and aerial views further validated its applicability to large-scale parking management systems. The integration of tracking and CIoU algorithms in the guidance system demonstrated the model's potential to streamline parking operations, reducing search times for drivers and enhancing overall traffic flow within parking facilities.

The results of this research highlight the potential of deep learning models, particularly YOLOv8, in addressing the challenges of urban parking management. The model's high accuracy and real-time processing capabilities make it a suitable choice for modern smart parking systems. However, the study also identified areas for improvement, such as the need for better differentiation between similar classes (e.g., empty and free parking spots), which could be addressed through more refined training data and advanced feature extraction techniques.

The practical application of the guidance system in real-world scenarios underscored the importance of accurate vehicle tracking and dynamic space allocation. The use of unique vehicle IDs and continuous tracking ensures that drivers receive timely and precise guidance to available parking spaces. This not only improves the user experience but also optimizes space utilization and reduces congestion within parking lots.

Future research could focus on enhancing the system's robustness by incorporating additional sensors and data sources, such as GPS and real-time traffic information. Moreover, expanding the dataset to include more varied parking scenarios and environmental conditions would further improve the model's generalization capabilities. Implementing advanced techniques like reinforcement learning could also optimize the system's decision-making process, leading to even more efficient parking management solutions.

