Algorithms and Hardware Architectures for Highly Efficient Spiking Neural Networks
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN DUY ANH
ALGORITHMS AND HARDWARE ARCHITECTURES FOR HIGH EFFICIENT SPIKING NEURAL NETWORKS
Specialty: Electronic Engineering Code: 9510302.01
PHD THESIS OF ELECTRONICS AND COMMUNICATIONS ENGINEERING
SUPERVISORS:
Assoc. Prof. Tran Xuan Tu
Prof. Francesca Iacopi
Hanoi – 2023
LIST OF PUBLICATIONS
1. [C1] Duy-Anh Nguyen, Xuan-Tu Tran, K. N. Dang and F. Iacopi, "A lightweight Max-Pooling method and architecture for Deep Spiking Convolutional Neural Networks", 2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2020, pp. 209-212.
2. [C2] Duy-Anh Nguyen, Duy-Hieu Bui, Francesca Iacopi, Xuan-Tu Tran, "An Efficient Event-driven Neuromorphic Architecture for Deep Spiking Neural Networks", 2019 32nd IEEE International System-on-Chip Conference (SOCC), 2019, pp. 144-149.
3. [J1] Duy-Anh Nguyen, Xuan-Tu Tran and F. Iacopi, "A Review of Algorithms and Hardware Implementations for Spiking Neural Networks", Journal of Low Power Electronics and Applications, 2021, 11(2): 23.
4. [J2] Duy-Anh Nguyen, Xuan-Tu Tran, K. N. Dang and F. Iacopi, "A low-power, high-accuracy with fully on-chip ternary weight hardware architecture for Deep Spiking Neural Networks", Microprocessors and Microsystems, vol. 90, 104458.
However, there are some fundamental differences between DNNs and the way the real human brain works, including the communication mechanism between neurons and the learning process of the networks. As the human brain is highly energy-efficient while having remarkable computing capabilities, the third generation of ANN, called the Spiking Neural Network (SNN), has been proposed. It has been shown that SNNs could reach higher energy efficiency while still having computing capabilities comparable to ANNs.
However, as the size of SNN and DNN networks grows, the increasing computational complexity has made their evaluation on traditional Von Neumann computer architectures very time-consuming and energy-inefficient. The VLSI research community has put considerable research effort into designing novel hardware architectures that take inspiration from the structure of the brain. These systems are referred to as neuromorphic computing systems. However, the development of neuromorphic computing platforms is still facing the following challenges:
• Excessive memory requirements of current SNN architectures prohibit storing the network parameters in on-chip memory. Additional energy overhead is incurred as data needs to be moved back and forth between DRAM and on-chip memory. The memory requirements of large SNN networks need to be decreased.
• The pooling operation in Convolutional Spiking Neural Networks is limited to the average pooling operation, which is not preferred. A novel max-pooling operation for SNNs is required.
This thesis aims to propose efficient solutions to the above-mentioned challenges. The following contributions are included:
• A simple digital design for the Integrate-and-Fire neuron and a novel hardware architecture implementing a fully connected, 3-layer SNN for the MNIST application.
• A novel training algorithm for SNNs with ternary weights (TW-SNN). To demonstrate the energy efficiency of this method, a novel and dedicated hardware architecture for TW-SNN is also proposed.
• A novel max-pooling method for convolutional SNNs is proposed.
Chapter 1 Spiking Neural Networks – A Software and Hardware Perspective
This chapter aims to give some background knowledge about SNNs, from both a software and a hardware perspective. The basic elements of SNNs, namely the dynamics of neurons and the information encoding strategy, are presented. Several training algorithms for SNNs are discussed. The hardware implementation strategies for SNNs, both for large-scale and small-scale networks, are considered. By analyzing SNNs from both the algorithm and hardware design perspectives, the thesis aims to point out potential methods of improving current SNN software and hardware designs in terms of energy efficiency.
1.1 Introduction to Spiking Neural Networks
1.1.1 Neuron models
The most popular neuron model is the Leaky Integrate-and-Fire (LIF) model, where the input spikes are integrated into the membrane potential. After that, a leak value is subtracted, and the LIF neuron gives an output spike if the potential crosses a certain threshold value. The LIF neuron is easily implemented in hardware. Other neuron models, such as the Hodgkin-Huxley or Izhikevich models, have also been gaining much interest lately.
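To make the neuron dynamics concrete, the following minimal sketch shows one discrete timestep of a layer of LIF neurons; the leak value, threshold, and reset behavior are illustrative assumptions rather than the exact parameters used later in this thesis.

```python
import numpy as np

def lif_step(v, spikes_in, weights, leak=1.0, v_th=10.0, v_reset=0.0):
    """One discrete timestep of a layer of LIF neurons.

    v         : membrane potentials, shape (n_out,)
    spikes_in : binary input spikes, shape (n_in,)
    weights   : synaptic weights, shape (n_out, n_in)
    """
    v = v + weights @ spikes_in                   # integrate the weighted input spikes
    v = v - leak                                  # subtract the leak value
    spikes_out = (v >= v_th).astype(np.float32)   # fire when the threshold is crossed
    v = np.where(spikes_out > 0, v_reset, v)      # reset the neurons that fired
    return v, spikes_out
```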
1.1.2 Encoding Information in SNN
There are three main types of information encoding in SNNs, namely rate encoding, Time-to-First-Spike encoding, and the Inter-Spike-Interval method.
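As an illustration of the rate-encoding strategy, a pixel intensity can be mapped to a per-timestep spike probability, as in the Poisson-style sketch below; the timestep count and maximum rate are assumptions for illustration only.

```python
import numpy as np

def rate_encode(image, T=100, max_rate=1.0, seed=0):
    """Poisson-style rate encoding: intensities in [0, 1] become
    per-timestep spike probabilities, sampled for T timesteps."""
    rng = np.random.default_rng(seed)
    p = np.clip(image.ravel(), 0.0, 1.0) * max_rate          # spike probability per step
    return (rng.random((T, p.size)) < p).astype(np.float32)  # spike train, shape (T, n_pixels)
```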
1.2 Learning rules in SNN
There are three major categories of learning rules in SNNs. The first is unsupervised learning with STDP, the second is supervised learning with backpropagation, and the last one is to convert a trained DNN into an SNN.
1.2.1 Unsupervised Learning with STDP
This method is biologically plausible and has online learning capability, but it does not perform well for deep, multi-layer networks.
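A minimal sketch of a common pair-based STDP rule is given below; the learning rates and time constant are illustrative, and the thesis does not prescribe this exact formulation.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate when the pre-synaptic spike precedes the
    post-synaptic spike, depress otherwise (spike times in milliseconds)."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)    # pre before post: potentiation
    else:
        w -= a_minus * np.exp(dt / tau)    # post before (or with) pre: depression
    return np.clip(w, 0.0, 1.0)            # keep the weight bounded
```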
1.2.2 Supervised Learning with Backpropagation
This method performs better than the unsupervised learning method and retains a limited degree of online learning, but it still does not perform well on some complicated datasets.
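One widely used way to apply backpropagation through the non-differentiable spike function is the surrogate-gradient technique, sketched below in PyTorch; the rectangular surrogate window is an assumption for illustration, not the specific rule used in this thesis.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate
    replaces its gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()          # emit a spike when v crosses the threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        surrogate = (v_minus_th.abs() < 0.5).float()   # rectangular approximation of d(spike)/dv
        return grad_output * surrogate

spike_fn = SurrogateSpike.apply   # usable inside an ordinary PyTorch training loop
```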
1.2.3 Conversion of SNN from DNN
This method can leverage many state-of-the-art training techniques from traditional DNNs. It can reach high accuracy on some complicated datasets but is only suitable for applications that do not require online adaptation.
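A key step in most DNN-to-SNN conversion pipelines is rescaling the trained weights so that ReLU activations map onto feasible firing rates; the sketch below illustrates one common data-based normalization recipe, with the percentile choice being an assumption rather than the thesis's exact procedure.

```python
import numpy as np

def normalize_layer(weights, activations, prev_scale=1.0):
    """Data-based weight normalization for DNN-to-SNN conversion:
    rescale a layer so its activations stay below the maximum firing rate."""
    scale = np.percentile(activations, 99.9)      # robust estimate of the layer's peak activation
    return weights * prev_scale / scale, scale    # pass 'scale' on to normalize the next layer
```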
1.3 Hardware Implementation of SNN
1.3.1 Large Scale Neuromorphic Accelerator
The general strategy for implementing large-scale networks is to have many neuromorphic cores that can operate in parallel. Each core has dedicated neuron-update logic and synapse memory. The communication is handled with a scalable Network-on-Chip. Notable works include SpiNNaker, TrueNorth, Neurogrid, and Loihi.
1.3.2 Low-power SNN Accelerator
Notable works include the ODIN chip by Frenkel et al. and the works by Yin et al., Zheng et al., and Chen et al. Most of these works focus on small-scale network designs with the MNIST dataset as the benchmark application.
1.3.3 Non-volatile Memories in Spiking Neural Network Architectures
This section presents state-of-the-art works on hardware architectures for SNNs with non-volatile memories.
Conclusion
In this chapter, we present a brief introduction to the research field of SNNs, from both software and hardware perspectives. From the state-of-the-art hardware architectures for SNNs, we can see a lack of focus on reducing the weight memory through quantization techniques. To the best of our knowledge, there has not been a work that proposes a quantization-aware training algorithm for SNNs. Furthermore, the low-power embedded accelerators mostly focus on fully-connected network topologies. Since the convolutional topology brings more advanced operators such as max-pooling, there has not been enough research and discussion on handling such operations in SNN hardware. This thesis aims to solve the above-mentioned problems.
Chapter 2 An Efficient Event-driven Neuromorphic Architecture for Deep Spiking Neural Networks
The main contributions of this chapter include a novel digital IF neuron model to support SNN operations and a system-level hardware architecture that supports handwritten digit recognition with the MNIST dataset.
2.1 Background
This section presents the related information regarding the neuron models, the conversion algorithm, and the rate-encoding method for the inputs.
2.2 Hardware Architecture
2.2.1 Digital Neuron – the basic processing element
The proposed architecture for the digital neuron – the basic processing element – is presented in this section. In general, the neuron has two modes of operation: the integration mode and the reset-and-fire mode.
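The behaviour of the two operating modes can be summarized by the small model below; the threshold value is illustrative, and the sketch abstracts away the fixed-point word lengths of the actual digital design.

```python
def if_neuron_step(potential, weighted_input, threshold=1024):
    """Behavioural model of the digital IF processing element."""
    potential += weighted_input     # integration mode: accumulate the weighted input
    if potential >= threshold:      # reset-and-fire mode: threshold crossed
        return 0, 1                 # reset the potential and emit an output spike
    return potential, 0             # otherwise keep integrating, no spike
```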
2.3 Evaluation results
2.3.1 Software simulation results
The chosen SNN model is a 2-layer, fully connected network with a size of 784 x 48 x 10 for the MNIST dataset. The network was trained in MATLAB (32-bit floating-point) and then quantized to a 10-bit fixed-point format. The system architecture was realized in VHDL at the Register-Transfer Level. The simulation results show that the floating-point network can reach 94.6% accuracy. The quantization process incurs a negligible loss of accuracy, and the hardware results matched the quantized software results.
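For reference, a simple post-training quantization of the MATLAB weights to a 10-bit fixed-point format could look like the sketch below; the split between integer and fractional bits is an assumption, as the thesis summary does not state the exact format.

```python
import numpy as np

def to_fixed_point(w, total_bits=10, frac_bits=7):
    """Quantize floating-point weights to a signed fixed-point representation."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))                # most negative representable code
    q_max = 2 ** (total_bits - 1) - 1               # most positive representable code
    q = np.clip(np.round(w * scale), q_min, q_max)  # round to the nearest code and saturate
    return q / scale                                # real-valued weights for bit-true simulation
```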
2.3.2 Hardware implementation results
The system has been implemented with a 45 nm NANGATE library. The implementation results for a single IF neuron show that the proposed design can operate at a comparable frequency but with a reduction of up to 4.2X in hardware area cost.
Figure 2.2 System top-level architecture
The system-level implementation results show that our system is lightweight, with a core area of only 15 µm² (19.2k 2-input NAND gate equivalent), a maximum frequency of 250 MHz, and a maximum throughput of 325k frames/second.
Conclusion
In this work, we propose a lightweight neuromorphic architecture that can be applied to the handwritten digit recognition application. The simulation results show that, even with limited fixed-point precision, our hardware system can reach an accuracy similar to the floating-point software implementation. Hardware implementation results have shown that our system is resource-efficient and can satisfy the constraints of real-time applications. For future work, the system can be adapted to a more generic, scalable neurosynaptic core (supporting different network topologies such as convolutional neural networks, recurrent neural networks, etc.). Possible online learning modes to support unsupervised learning algorithms will also be considered. Quantization-aware training for SNNs can be investigated, as currently only a simple quantization scheme is performed.
Chapter 3 Ternary Weight Spiking Neural Networks
The main contributions of this chapter include a novel, hardware-oriented training procedure for SNNs with the network parameters constrained to a ternary format. The training procedure has been applied to image recognition tasks with the MNIST and CIFAR datasets, with both fully connected and convolutional topologies. An efficient neuromorphic processing core to support our trained SNNs with ternary weights is also proposed.
3.1 Related Works
This section reviews the existing works in the literature on training algorithms for SNNs and on quantization techniques for low-precision neural networks.
3.2 The proposed training algorithm
3.2.1 Definition of time steps and inference latency in spiking neural networks
To simulate an SNN's activity, the simulation time is normally discretized into a number of simulation timesteps T. The inference latency is defined as the number of timesteps required to reach a certain classification accuracy.
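The trade-off between latency and accuracy can be illustrated with the toy inference loop below, which accumulates the output spikes of a single-layer IF classifier over T timesteps; the layer sizes and threshold are illustrative assumptions.

```python
import numpy as np

def snn_inference(spike_train, weights, v_th=1.0):
    """Run a rate-coded input (shape (T, n_in)) through one IF layer and
    predict the class with the highest output spike count after T timesteps."""
    T, _ = spike_train.shape
    v = np.zeros(weights.shape[0])
    counts = np.zeros(weights.shape[0])
    for t in range(T):
        v += weights @ spike_train[t]   # integrate the input spikes of this timestep
        fired = v >= v_th
        counts += fired                 # accumulate output spikes over time
        v[fired] = 0.0                  # reset the neurons that fired
    return int(np.argmax(counts))       # more timesteps generally gives a more reliable decision
```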
3.2.2 Analysis of memory storage and energy from memory access for SNNs
This section covers the detailed analysis of the required memory storage and the energy for memory accesses in SNNs, both for fully-connected and convolutional SNNs.
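As a quick illustration of why weight precision dominates the on-chip memory budget, the sketch below estimates the weight storage of a fully-connected SNN for different bit-widths; the 784-256-256-10 configuration is used only as an example.

```python
def fc_weight_memory_kb(layer_sizes, bits_per_weight):
    """Weight-memory estimate (in KB) for a fully-connected network."""
    synapses = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    return synapses * bits_per_weight / 8 / 1024

print(fc_weight_memory_kb([784, 256, 256, 10], 32))  # ~1050 KB with 32-bit weights
print(fc_weight_memory_kb([784, 256, 256, 10], 2))   # ~66 KB with 2-bit (ternary) weights
```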
3.2.3 Training of SNNs with ternary weights
The proposed training methodology is shown in Algorithm 2.
Algorithm 2: Training Procedure of TW-SNN
Input: Number of training epochs N, ternarization threshold factor β, weight initialization
For i ← 1 to N do
1. Statistical Weight Ternarization
   • Save the full-precision weights
   • Calculate the weight scaling factor α
   • Calculate the threshold Δth
   • Calculate the ternarized weights w_L^tern
2. Inference with ternarized weights
   • Run inference with the activation function
   • Calculate the loss w.r.t. the targets
3. Update weights with back-propagation
   • Restore the full-precision weights
   • Calculate the gradients
   • Update the full-precision weights with the computed gradients
Return: Trained network parameters
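A minimal PyTorch-style sketch of the statistical weight ternarization step is given below; the exact definitions of α and Δth follow the common ternary-weight-network recipe and should be read as an illustrative assumption rather than the precise formulation of Algorithm 2.

```python
import torch

def ternarize(w, beta=0.7):
    """Statistical ternarization of one weight tensor.

    delta_th : threshold derived from the mean weight magnitude
    alpha    : scaling factor so that alpha * w_tern approximates w
    """
    delta_th = beta * w.abs().mean()                             # threshold Δth
    mask = (w.abs() > delta_th).float()                          # weights kept non-zero
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)   # scaling factor α
    w_tern = torch.sign(w) * mask                                # values in {-1, 0, +1}
    return w_tern, alpha

# In the training loop: save the full-precision weights, run the forward pass
# with alpha * w_tern, then restore the full-precision weights and apply the
# gradient update to them, as in steps 1-3 of Algorithm 2.
```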
3.3 The proposed hardware architecture
3.3.1 The neural processing module
At the core of our proposed hardware architecture is the neural processing module, which includes the processing elements (PEs), the AER decoder, and the SRAM memory for weight storage.
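The event-driven dataflow implied by this organization can be pictured with the toy sketch below: each AER event carries the address of a firing pre-synaptic neuron, which selects one row of the on-chip weight memory to accumulate into the membrane potentials; the names and shapes are illustrative, not the actual RTL interfaces.

```python
import numpy as np

def process_aer_event(event_addr, weight_sram, potentials, v_th=1.0):
    """Handle one incoming AER event in a simple event-driven core model."""
    potentials += weight_sram[event_addr]        # fetch and accumulate one weight row per event
    fired = np.flatnonzero(potentials >= v_th)   # neurons whose threshold was crossed
    potentials[fired] = 0.0                      # reset the fired neurons
    return fired                                 # their addresses become outgoing AER events
```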
3.3.2 Fixed 3-layer architecture
In order to compare the energy efficiency of our training approach with other low-power, embedded SNNs, we implement a specific neuromorphic hardware design with a fixed, 3-layer architecture (FC256-256-10 configuration) for the MNIST task. The overall block diagram is shown in Figure 3.5.
Figure 3.5 Overall block diagram of the TW-SNN hardware system
All the ternary weights are stored on-chip in SRAM banks. Figure 3.6 shows the memory storage scheme.
Figure 3.6 The memory storage scheme
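Since each ternary weight in {-1, 0, +1} needs only a 2-bit code, several weights can be packed into one SRAM word; the sketch below shows one possible packing, where the 16-bit word width and the code assignment are assumptions rather than the scheme of Figure 3.6.

```python
import numpy as np

CODES = {-1: 0b10, 0: 0b00, +1: 0b01}   # illustrative 2-bit encoding of a ternary weight

def pack_ternary(weights, word_bits=16):
    """Pack a sequence of ternary weights into fixed-width memory words."""
    per_word = word_bits // 2
    words = []
    for i in range(0, len(weights), per_word):
        word = 0
        for j, w in enumerate(weights[i:i + per_word]):
            word |= CODES[int(w)] << (2 * j)     # 2 bits per synapse, LSB first
        words.append(word)
    return np.array(words, dtype=np.uint16)
```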
3.3.3 A scalable design approach for TW-SNN
The fixed, 3-layer design for TW-SNN in the previous section has a major drawback of limited scalability, both in terms of the number of neurons supported and the network topology. For a small-scale application like MNIST, we could use such a design to maximize our gains in terms of hardware area cost and application throughput. However, modern SNNs for large-scale applications may require handling the operations of millions of neurons, with more advanced topologies such as convolutional or recurrent networks. A normal bus connection is not suitable for handling the communication between neurons in such a large network. To support a scalable design for TW-SNN, we adopt the 3D-IC approach introduced in our previous work. The overall architecture of the scalable design approach is shown in Figure 3.7.
Figure 3.7 Scalable design architecture
3.4 Software simulation results
3.4.1 Results for fully-connected TW-SNN
We have trained our SNNs with the PyTorch framework, with 4 different network configurations and 3 different datasets. Simulation results show that TW-SNN reaches comparable