Algorithms and Hardware Architectures for Highly Efficient Spiking Neural Networks
VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY
NGUYEN DUY ANH
ALGORITHMS AND HARDWARE ARCHITECTURES FOR HIGH EFFICIENT SPIKING NEURAL NETWORKS
Specialty: Electronic Engineering Code: 9510302.01
PHD THESIS OF ELECTRONICS AND COMMUNICATIONS ENGINEERING
SUPERVISORS:
Assoc. Prof. Tran Xuan Tu
Prof. Francesca Iacopi
Hanoi – 2023
LIST OF PUBLICATIONS
1. [C1] Duy-Anh Nguyen, Xuan-Tu Tran, K. N. Dang and F. Iacopi, "A lightweight Max-Pooling method and architecture for Deep Spiking Convolutional Neural Networks", 2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2020, pp. 209-212.
2. [C2] Duy-Anh Nguyen, Duy-Hieu Bui, Francesca Iacopi, Xuan-Tu Tran, "An Efficient Event-driven Neuromorphic Architecture for Deep Spiking Neural Networks", 2019 32nd IEEE International System-on-Chip Conference (SOCC), 2019, pp. 144-149.
3. [J1] Duy-Anh Nguyen, Xuan-Tu Tran and F. Iacopi, "A Review of Algorithms and Hardware Implementations for Spiking Neural Networks", Journal of Low Power Electronics and Applications, 2021, 11(2): 23.
4. [J2] Duy-Anh Nguyen, Xuan-Tu Tran, K. N. Dang and F. Iacopi, "A low-power, high-accuracy with fully on-chip ternary weight hardware architecture for Deep Spiking Neural Networks", Microprocessors and Microsystems, vol. 90, 104458.
However, there are some fundamental differences between DNNs and the way the real human brain works, including the communication mechanism between neurons and the learning process of the networks. As the human brain is highly energy-efficient while having remarkable computing capabilities, the third generation of ANN, called the Spiking Neural Network (SNN), has been proposed. It has been shown that SNNs could reach higher energy efficiency while still having computing capabilities comparable to ANNs.
However, as the size of SNN and DNN networks grows, the increasing computational complexity has made their evaluation on traditional Von Neumann computer architectures very time-consuming and energy-inefficient. The VLSI research community has put considerable research effort into designing novel hardware architectures that take inspiration from the structure of the brain. These systems are referred to as neuromorphic computing systems. However, the development of neuromorphic computing platforms is still facing the following challenges:
• Excessive memory requirements of current SNN architectures prohibit storing the network parameters in on-chip memory. Additional energy overhead is incurred as data needs to be moved back and forth between DRAM and on-chip memory. The memory requirements of large SNN networks need to be decreased.
• The pooling operation in Convolutional Spiking Neural Networks is limited to the average pooling operation, which is not preferred. A novel max-pooling operation for SNNs is required.
This thesis aims to propose efficient solutions to the above-mentioned challenges. The following contributions are included:
• A simple digital design for the Integrate-and-Fire neuron and a novel hardware architecture implementing a fully connected, 3-layer SNN for the MNIST application.
• A novel training algorithm for SNNs with ternary weights (TW-SNN). To demonstrate the energy efficiency of this method, a novel and dedicated hardware architecture for TW-SNN is also proposed.
• A novel max-pooling method for convolutional SNNs is proposed.
Chapter 1 Spiking Neural Networks – A Software and Hardware Perspective
This chapter aims to give some background knowledge about SNNs, from both a software and a hardware perspective. The basic elements of SNNs, namely the dynamics of neurons and the information encoding strategy, are presented. Several training algorithms for SNNs are discussed. The hardware implementation strategies for SNNs, both for large-scale and small-scale networks, are considered. By analyzing SNNs from both the algorithm and hardware design perspectives, the thesis aims to point out potential methods of improving current SNN software and hardware designs in terms of energy efficiency.
1.1 Introduction to Spiking Neural Networks
1.1.1 Neuron models
The most popular neuron model is the Leaky Integrate-and-Fire (LIF) model, where the input spikes are integrated into the membrane potential. After that, a leak value is subtracted, and the LIF neuron gives an output spike if the potential crosses a certain threshold value. The LIF neuron is easily implemented in hardware. Other neuron models, such as the Hodgkin-Huxley or Izhikevich models, have also been gaining much interest lately.
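To make the neuron dynamics concrete, the following minimal sketch shows one discrete timestep of a layer of LIF neurons; the leak value, threshold, and reset behavior are illustrative assumptions rather than the exact parameters used later in this thesis.

```python
import numpy as np

def lif_step(v, spikes_in, weights, leak=1.0, v_th=10.0, v_reset=0.0):
    """One discrete timestep of a layer of LIF neurons.

    v         : membrane potentials, shape (n_out,)
    spikes_in : binary input spikes, shape (n_in,)
    weights   : synaptic weights, shape (n_out, n_in)
    """
    v = v + weights @ spikes_in                   # integrate the weighted input spikes
    v = v - leak                                  # subtract the leak value
    spikes_out = (v >= v_th).astype(np.float32)   # fire when the threshold is crossed
    v = np.where(spikes_out > 0, v_reset, v)      # reset the neurons that fired
    return v, spikes_out
```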
1.1.2 Encoding Information in SNN
There are three main types of information encoding in SNNs, namely rate encoding, Time-to-First-Spike encoding, and the Inter-Spike-Interval method.
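As an illustration of the rate-encoding strategy, a pixel intensity can be mapped to a per-timestep spike probability, as in the Poisson-style sketch below; the timestep count and maximum rate are assumptions for illustration only.

```python
import numpy as np

def rate_encode(image, T=100, max_rate=1.0, seed=0):
    """Poisson-style rate encoding: intensities in [0, 1] become
    per-timestep spike probabilities, sampled for T timesteps."""
    rng = np.random.default_rng(seed)
    p = np.clip(image.ravel(), 0.0, 1.0) * max_rate          # spike probability per step
    return (rng.random((T, p.size)) < p).astype(np.float32)  # spike train, shape (T, n_pixels)
```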
1.2 Learning rules in SNN
There are three major categories of learning rules in SNNs. The first is unsupervised learning with STDP, the second is supervised learning with backpropagation, and the last one is to convert a trained DNN into an SNN.
1.2.1 Unsupervised Learning with STDP
This method is biologically plausible and has online learning capability, but it does not perform well for deep, multi-layer networks.
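A minimal sketch of a common pair-based STDP rule is given below; the learning rates and time constant are illustrative, and the thesis does not prescribe this exact formulation.

```python
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate when the pre-synaptic spike precedes the
    post-synaptic spike, depress otherwise (spike times in milliseconds)."""
    dt = t_post - t_pre
    if dt > 0:
        w += a_plus * np.exp(-dt / tau)    # pre before post: potentiation
    else:
        w -= a_minus * np.exp(dt / tau)    # post before (or with) pre: depression
    return np.clip(w, 0.0, 1.0)            # keep the weight bounded
```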
1.2.2 Supervised Learning with Backpropagation
This method performs better than the unsupervised learning method and retains a limited degree of online learning, but it still does not perform well on some complicated datasets.
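One widely used way to apply backpropagation through the non-differentiable spike function is the surrogate-gradient technique, sketched below in PyTorch; the rectangular surrogate window is an assumption for illustration, not the specific rule used in this thesis.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass; a smooth surrogate
    replaces its gradient in the backward pass."""

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th >= 0).float()          # emit a spike when v crosses the threshold

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        surrogate = (v_minus_th.abs() < 0.5).float()   # rectangular approximation of d(spike)/dv
        return grad_output * surrogate

spike_fn = SurrogateSpike.apply   # usable inside an ordinary PyTorch training loop
```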
1.2.3 Conversion of SNN from DNN
This method can leverage many state-of-the-art training techniques from traditional DNNs. It can reach high accuracy on some complicated datasets but is only suitable for applications that do not require online adaptation.
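A key step in most DNN-to-SNN conversion pipelines is rescaling the trained weights so that ReLU activations map onto feasible firing rates; the sketch below illustrates one common data-based normalization recipe, with the percentile choice being an assumption rather than the thesis's exact procedure.

```python
import numpy as np

def normalize_layer(weights, activations, prev_scale=1.0):
    """Data-based weight normalization for DNN-to-SNN conversion:
    rescale a layer so its activations stay below the maximum firing rate."""
    scale = np.percentile(activations, 99.9)      # robust estimate of the layer's peak activation
    return weights * prev_scale / scale, scale    # pass 'scale' on to normalize the next layer
```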
1.3 Hardware Implementation of SNN
1.3.1 Large Scale Neuromorphic Accelerator
The general strategy for implementing large-scale networks is to have many neuromorphic cores that can operate in parallel. Each core has dedicated neuron-update logic and synapse memory. The communication is handled with a scalable Network-on-Chip. Notable works include SpiNNaker, TrueNorth, Neurogrid, and Loihi.
1.3.2 Low-power SNN Accelerator
Notable works include the ODIN chip by Frenkel et al. and the works by Yin et al., Zheng et al., and Chen et al. Most of these works focus on small-scale network designs with the MNIST dataset as the benchmark application.
1.3.3 Non-volatile Memories in Spiking Neural Network Architectures
This section presents state-of-the-art works on hardware architectures for SNNs with non-volatile memories.
Conclusion
In this chapter, we present a brief introduction to the research field of SNNs, from both software and hardware perspectives. From the state-of-the-art hardware architectures for SNNs, we can see a lack of focus on reducing the weight memory through quantization techniques. To the best of our knowledge, there has not been a work that proposes a quantization-aware training algorithm for SNNs. Furthermore, the low-power embedded accelerators mostly focus on fully-connected network topologies. Since the convolutional topology brings more advanced operators such as max-pooling, there has not been enough research and discussion on handling such operations in SNN hardware. This thesis aims to solve the above-mentioned problems.
Chapter 2 An Efficient Event-driven Neuromorphic Architecture for Deep Spiking Neural Networks
The main contributions of this chapter include a novel digital IF neuron model to support SNN operations and a system-level hardware architecture that supports handwritten digit recognition with the MNIST dataset.
2.1 Background
This section presents the related information regarding the neuron models, the conversion algorithm, and the rate-encoding method for the inputs.
2.2 Hardware Architecture
2.2.1 Digital Neuron – the basic processing element
The proposed architecture for the digital neuron – the basic processing element – is presented in this section. In general, the neuron has two modes of operation: the integration mode and the reset-and-fire mode.
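The behaviour of the two operating modes can be summarized by the small model below; the threshold value is illustrative, and the sketch abstracts away the fixed-point word lengths of the actual digital design.

```python
def if_neuron_step(potential, weighted_input, threshold=1024):
    """Behavioural model of the digital IF processing element."""
    potential += weighted_input     # integration mode: accumulate the weighted input
    if potential >= threshold:      # reset-and-fire mode: threshold crossed
        return 0, 1                 # reset the potential and emit an output spike
    return potential, 0             # otherwise keep integrating, no spike
```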
2.3 Evaluation results
2.3.1 Software simulation results
The chosen SNN model is a 2-layer, fully connected network with a size of 784 x 48 x 10 for the MNIST dataset. The network was trained in MATLAB (32-bit floating-point) and then quantized to a 10-bit fixed-point format. The system architecture was realized in VHDL at the Register-Transfer Level. The simulation results show that the floating-point network can reach 94.6% accuracy. The quantization process incurs a negligible loss of accuracy, and the hardware results matched the quantized software results.
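For reference, a simple post-training quantization of the MATLAB weights to a 10-bit fixed-point format could look like the sketch below; the split between integer and fractional bits is an assumption, as the thesis summary does not state the exact format.

```python
import numpy as np

def to_fixed_point(w, total_bits=10, frac_bits=7):
    """Quantize floating-point weights to a signed fixed-point representation."""
    scale = 2 ** frac_bits
    q_min = -(2 ** (total_bits - 1))                # most negative representable code
    q_max = 2 ** (total_bits - 1) - 1               # most positive representable code
    q = np.clip(np.round(w * scale), q_min, q_max)  # round to the nearest code and saturate
    return q / scale                                # real-valued weights for bit-true simulation
```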
2.3.2 Hardware implementation results
The system has been implemented with a 45 nm NANGATE library. The implementation results for a single IF neuron show that the proposed design can operate at a comparable frequency but with a reduction of up to 4.2X in hardware area cost.
Figure 2.2 System top-level architecture
The system-level implementation results show that our system is lightweight, with a core area of only 15 µm² (19.2k 2-input NAND gate equivalent), a maximum frequency of 250 MHz, and a maximum throughput of 325k frames/second.
Conclusion
In this work, we propose a lightweight neuromorphic architecture that can be applied to the handwritten digit recognition application. The simulation results show that, even with limited fixed-point precision, our hardware system can reach an accuracy similar to the floating-point software implementation. Hardware implementation results have shown that our system is resource-efficient and can satisfy the constraints of real-time applications. For future work, the system can be adapted to a more generic, scalable neurosynaptic core (supporting different network topologies such as convolutional neural networks, recurrent neural networks, etc.). Possible online learning modes to support unsupervised learning algorithms will also be considered. Quantization-aware training for SNNs can be investigated, as currently only a simple quantization scheme is performed.
Chapter 3 Ternary Weight Spiking Neural Networks
The main contributions of this chapter include a novel, hardware-oriented training procedure for SNNs with the network parameters constrained to a ternary format. The training procedure has been applied to image recognition tasks with the MNIST and CIFAR datasets, with both fully connected and convolutional topologies. An efficient neuromorphic processing core to support our trained SNNs with ternary weights is also proposed.
3.1 Related Works
This section reviews the existing works in the literature on training algorithms for SNNs and on quantization techniques for low-precision neural networks.
3.2 The proposed training algorithm
3.2.1 Definition of time steps and inference latency in spiking neural networks
To simulate an SNN's activity, the simulation time is normally discretized into a number of simulation timesteps T. The inference latency is defined as the number of timesteps required to reach a certain classification accuracy.
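The trade-off between latency and accuracy can be illustrated with the toy inference loop below, which accumulates the output spikes of a single-layer IF classifier over T timesteps; the layer sizes and threshold are illustrative assumptions.

```python
import numpy as np

def snn_inference(spike_train, weights, v_th=1.0):
    """Run a rate-coded input (shape (T, n_in)) through one IF layer and
    predict the class with the highest output spike count after T timesteps."""
    T, _ = spike_train.shape
    v = np.zeros(weights.shape[0])
    counts = np.zeros(weights.shape[0])
    for t in range(T):
        v += weights @ spike_train[t]   # integrate the input spikes of this timestep
        fired = v >= v_th
        counts += fired                 # accumulate output spikes over time
        v[fired] = 0.0                  # reset the neurons that fired
    return int(np.argmax(counts))       # more timesteps generally gives a more reliable decision
```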
3.2.2 Analysis of memory storage and energy from memory access for SNNs
This section covers the detailed analysis of the required memory storage and the energy for memory accesses in SNNs, both for fully-connected and convolutional SNNs.
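As a quick illustration of why weight precision dominates the on-chip memory budget, the sketch below estimates the weight storage of a fully-connected SNN for different bit-widths; the 784-256-256-10 configuration is used only as an example.

```python
def fc_weight_memory_kb(layer_sizes, bits_per_weight):
    """Weight-memory estimate (in KB) for a fully-connected network."""
    synapses = sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))
    return synapses * bits_per_weight / 8 / 1024

print(fc_weight_memory_kb([784, 256, 256, 10], 32))  # ~1050 KB with 32-bit weights
print(fc_weight_memory_kb([784, 256, 256, 10], 2))   # ~66 KB with 2-bit (ternary) weights
```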
3.2.3 Training of SNNs with ternary weights
The proposed training methodology is shown in Algorithm 2.
Algorithm 2: Training Procedure of TW-SNN
Input: Number of training epochs N, ternarization threshold factor β, weight initialization
For i ← 1 to N do
1. Statistical Weight Ternarization
   • Save the full-precision weights
   • Calculate the weight scaling factor α
   • Calculate the threshold Δth
   • Calculate the ternarized weights w_L^tern
2. Inference with ternarized weights
   • Run inference with the activation function
   • Calculate the loss w.r.t. the targets
3. Update weights with back-propagation
   • Restore the full-precision weights
   • Calculate the gradients
   • Update the full-precision weights with the computed gradients
Return: Trained network parameters
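A minimal PyTorch-style sketch of the statistical weight ternarization step is given below; the exact definitions of α and Δth follow the common ternary-weight-network recipe and should be read as an illustrative assumption rather than the precise formulation of Algorithm 2.

```python
import torch

def ternarize(w, beta=0.7):
    """Statistical ternarization of one weight tensor.

    delta_th : threshold derived from the mean weight magnitude
    alpha    : scaling factor so that alpha * w_tern approximates w
    """
    delta_th = beta * w.abs().mean()                             # threshold Δth
    mask = (w.abs() > delta_th).float()                          # weights kept non-zero
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)   # scaling factor α
    w_tern = torch.sign(w) * mask                                # values in {-1, 0, +1}
    return w_tern, alpha

# In the training loop: save the full-precision weights, run the forward pass
# with alpha * w_tern, then restore the full-precision weights and apply the
# gradient update to them, as in steps 1-3 of Algorithm 2.
```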
3.3 The proposed hardware architecture
3.3.1 The neural processing module
At the core of our proposed hardware architecture is the neural processing module, which includes the processing elements (PEs), the AER decoder, and the SRAM memory for weight storage.
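The event-driven dataflow implied by this organization can be pictured with the toy sketch below: each AER event carries the address of a firing pre-synaptic neuron, which selects one row of the on-chip weight memory to accumulate into the membrane potentials; the names and shapes are illustrative, not the actual RTL interfaces.

```python
import numpy as np

def process_aer_event(event_addr, weight_sram, potentials, v_th=1.0):
    """Handle one incoming AER event in a simple event-driven core model."""
    potentials += weight_sram[event_addr]        # fetch and accumulate one weight row per event
    fired = np.flatnonzero(potentials >= v_th)   # neurons whose threshold was crossed
    potentials[fired] = 0.0                      # reset the fired neurons
    return fired                                 # their addresses become outgoing AER events
```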
3.3.2 Fixed 3-layer architecture
In order to compare the energy efficiency of our training approach with other low-power, embedded SNNs, we implement a specific neuromorphic hardware design with a fixed, 3-layer architecture (FC256-256-10 configuration) for the MNIST task. The overall block diagram is shown in Figure 3.5.
Figure 3.5 Overall block diagram of the TW-SNN hardware system
All the ternary weights are stored on-chip in SRAM banks. Figure 3.6 shows the memory storage scheme.
Figure 3.6 The memory storage scheme
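Since each ternary weight in {-1, 0, +1} needs only a 2-bit code, several weights can be packed into one SRAM word; the sketch below shows one possible packing, where the 16-bit word width and the code assignment are assumptions rather than the scheme of Figure 3.6.

```python
import numpy as np

CODES = {-1: 0b10, 0: 0b00, +1: 0b01}   # illustrative 2-bit encoding of a ternary weight

def pack_ternary(weights, word_bits=16):
    """Pack a sequence of ternary weights into fixed-width memory words."""
    per_word = word_bits // 2
    words = []
    for i in range(0, len(weights), per_word):
        word = 0
        for j, w in enumerate(weights[i:i + per_word]):
            word |= CODES[int(w)] << (2 * j)     # 2 bits per synapse, LSB first
        words.append(word)
    return np.array(words, dtype=np.uint16)
```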
3.3.3 A scalable design approach for TW-SNN
The fixed, 3-layer design for TW-SNN in the previous section has a major drawback of limited scalability, both in terms of the number of neurons supported and the network topology. For a small-scale application like MNIST, we could use such a design to maximize our gains in terms of hardware area cost and application throughput. However, modern SNNs for large-scale applications may require handling the operations of millions of neurons, with more advanced topologies such as convolutional or recurrent networks. A normal bus connection is not suitable for handling the communication between neurons in such a large network. To support a scalable design for TW-SNN, we adopt the 3D-IC approach introduced in our previous work. The overall architecture of the scalable design approach is shown in Figure 3.7.
Figure 3.7 Scalable design architecture
3.4 Software simulation results
3.4.1 Results for fully-connected TW-SNN
We have trained our SNNs with the PyTorch framework, with 4 different network configurations and 3 different datasets. Simulation results show that TW-SNN reaches comparable