
IMPLEMENTATION OF FLOATING POINT ARITHMETIC ON FPGA



ACKNOWLEDGEMENT

We would like to express our deep gratitude to Dr. Rahul Dubey, who not only gave us the opportunity to work on this project, but also guided and encouraged us throughout the course. He and the TAs of the course, Neeraj Chasta and Purushothaman, patiently helped us throughout the project. We take this opportunity to thank them, and our classmates and friends, for extending their support and working together in a friendly learning environment. And last but not least, we would like to thank the non-teaching lab staff, who patiently helped us and made sure that all the kits were working.


OBJECTIVE

Implement the arithmetic (addition/subtraction & multiplication) for IEEE-754 single precision floating point numbers on an FPGA, and display the resultant value on an LCD screen.

ABSTRACT: Floating point operations are hard to implement on FPGAs because of the complexity of their algorithms. On the other hand, many scientific problems require floating point arithmetic with high levels of accuracy in their calculations. We have therefore explored FPGA implementations of addition and multiplication for IEEE-754 single precision floating-point numbers. For floating point multiplication in IEEE single precision format, two 24-bit significands must be multiplied. In the Spartan 3E, dedicated 18x18 bit multipliers are already available. The main idea is to replace the existing 18x18 bit multipliers with a dedicated 24x24 bit multiplier designed from small 4x4 bit multipliers. For floating point addition, the exponent matching, the shifting of the 24-bit mantissa and the sign logic are coded in behavioral style. Our project is divided into four modules:

1. Design of the floating point adder/subtractor

2. Design of the floating point multiplier

3. Creation of the combined control & data paths

4. I/O interfacing: interfacing of the LCD for displaying the output and taking inputs from block RAM

Prototypes have been implemented on a Xilinx Spartan 3E.


2. INTRODUCTION

Image and digital signal processing applications require high floating point calculation throughput, and nowadays FPGAs are being used for performing these Digital Signal Processing (DSP) operations. Floating point operations are hard to implement on FPGAs because their algorithms are quite complex. In order to combat this performance bottleneck, FPGA vendors, including Xilinx, have introduced FPGAs with nearly 254 dedicated 18x18 bit multipliers. These architectures can cater to the need for high speed integer operations but are not suitable for performing floating point operations, especially multiplication. Floating point multiplication is one of the performance bottlenecks in high speed and low power image and digital signal processing applications. Recently, there has been significant work on the analysis of high-performance floating-point arithmetic on FPGAs, but so far no one has addressed the issue of replacing the dedicated 18x18 multipliers in FPGAs with an alternative implementation to improve floating point efficiency. It is a well known concept that the single precision floating point multiplication algorithm is divided into three main parts, corresponding to the three parts of the single precision format. In FPGAs, the bottleneck of any single precision floating-point design is the 24x24 bit integer multiplier required for multiplication of the mantissas. In order to circumvent the aforesaid problems, we designed floating point multiplication and addition.

The designed architecture can perform both single precision floating point addition and single precision floating point multiplication with a single dedicated 24x24 bit multiplier block designed from small 4x4 bit multipliers. The basic idea is to replace the existing 18x18 multipliers in FPGAs with dedicated 24x24 bit multiplier blocks implemented with dedicated 4x4 bit multipliers. This architecture can also be used for integer multiplication.


As mentioned above, the IEEE Standard for Binary Floating Point Arithmetic (ANSI/IEEE Std 754-1985) is used throughout our work. The single precision format is shown in Figure 1. Numbers in this format are composed of the following three fields:

Sign (1 bit): A value of '1' indicates that the number is negative, and a '0' indicates a positive number.

Exponent (8 bits): Biased by 127, with Emin = -126 and Emax = 127.

Fraction (23 bits): The fractional part of the number. The fractional part must not be confused with the significand, which is 1 plus the fractional part. The leading 1 in the significand is implicit; when performing arithmetic with this format, the implicit bit is usually made explicit.

To determine the value of a floating point number in this format we use the following formula:

value = (-1)^sign x (1.fraction) x 2^(exponent - 127)
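As a quick software sanity check of the field layout described above, the decoding of a normalized word can be sketched in a few lines of Python (our project itself is in Verilog; this model and the name `decode_single` are purely illustrative):

```python
def decode_single(word: int) -> float:
    """Split a 32-bit IEEE-754 word into its three fields and apply the formula.
    Valid for normalized numbers only (biased exponent 1..254)."""
    sign = (word >> 31) & 1
    exponent = (word >> 23) & 0xFF      # 8-bit biased exponent
    fraction = word & 0x7FFFFF          # 23-bit fractional part
    significand = 1 + fraction / 2**23  # the implicit leading 1 made explicit
    return (-1) ** sign * significand * 2 ** (exponent - 127)

print(decode_single(0x40490FDB))  # the single-precision encoding of pi, ~3.1415927
```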

The two infinities, +∞ and -∞, represent the maximum positive and negative real numbers, respectively, that can be represented in the floating point format. Infinity is always represented by a zero significand (fraction and integer bit) and the maximum biased exponent allowed in the specified format (for example, 255 for the single-real format).

The signs of infinities are observed, and comparisons are possible. Infinities are always interpreted in the affine sense; that is, -∞ is less than any finite number and +∞ is greater than any finite number. Arithmetic on infinities is always exact. Exceptions are generated only when the use of infinity as a source operand constitutes an invalid operation.

Whereas de-normalized numbers represent an underflow condition, the two infinity numbers represent the result of an overflow condition. Here, the normalized result of a computation has a biased exponent greater than the largest allowable exponent for the selected result format.
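The boundary between the largest finite value and infinity can also be observed from software; this short Python sketch (ours, using the host's float-to-single conversion via the standard `struct` module) shows both bit patterns:

```python
import struct

def single_bits(x: float) -> int:
    """32-bit pattern of x after conversion to single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

max_single = (2 - 2**-23) * 2**127     # largest finite single-precision value
print(hex(single_bits(max_single)))    # 0x7f7fffff: biased exponent 254, all-ones fraction
print(hex(single_bits(float("inf")))) # 0x7f800000: biased exponent 255, zero fraction
```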

NaNs

Since NaNs are non-numbers, they are not part of the real number line. The encoding space for NaNs in the FPU floating-point formats lies beyond the ends of the real number line. This space includes any value with the maximum allowable biased exponent and a non-zero fraction. (The sign bit is ignored for NaNs.)

The IEEE standard defines two classes of NaNs: quiet NaNs (QNaNs) and signaling NaNs (SNaNs). A QNaN is a NaN with the most significant fraction bit set; an SNaN is a NaN with the most significant fraction bit clear. QNaNs are allowed to propagate through most arithmetic operations without signaling an exception. SNaNs generally signal an invalid-operation exception whenever they appear as operands in arithmetic operations.

Though zero is not a special input, if one of the operands is zero then the result is known without performing any operation; a zero is denoted by a zero exponent and a zero mantissa. One more reason to detect zeroes is that otherwise it is difficult to obtain the correct result, as the adder may interpret a zero operand as the value 1 after adding the hidden '1' to the mantissa.
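The special cases discussed above (infinities, the two NaN classes, zero and denormals) can be summarized as a small detector. This Python sketch and the name `classify` are ours, not part of the Verilog design:

```python
def classify(word: int) -> str:
    """Classify a 32-bit IEEE-754 pattern into the special cases discussed above."""
    exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    if exponent == 0xFF:                  # maximum biased exponent
        if fraction == 0:
            return "infinity"
        # most significant fraction bit set => quiet NaN; clear => signaling NaN
        return "QNaN" if fraction & 0x400000 else "SNaN"
    if exponent == 0:
        return "zero" if fraction == 0 else "denormal"
    return "normal"

print(classify(0x00000000))  # zero
print(classify(0x7F800000))  # infinity
print(classify(0x7FC00000))  # QNaN
```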




Floating-point addition has mainly three parts:

1. Adding the hidden '1' and alignment of the mantissas to make the exponents equal

2. Addition of the aligned mantissas

3. Normalization and rounding of the result

The initial mantissa is 23 bits wide; after adding the hidden '1', it is 24 bits wide.

First the exponents are compared by subtracting one from the other and looking at the sign (the MSB, which is the carry) of the result. To equalize the exponents, the mantissa of the number with the lesser exponent is shifted right d times, where 'd' is the absolute value of the difference between the exponents. The sign of the larger number is anchored. The XOR of the sign bits of the two numbers decides the operation (addition/subtraction) to be performed.

Now, as the shifting may cause loss of some bits, and to prevent this to some extent, the mantissas to be added are generally no longer 24 bits. In our implementation, the mantissas to be added are 25 bits wide. The two mantissas are added (subtracted) and the most significant 24 bits of the absolute value of the result form the normalized mantissa for the final packed floating point result.

Again, the XOR of the anchored sign bit and the sign of the result forms the sign bit for the final packed floating point result.

The remaining part of the result is the exponent. Before normalization, the value of the exponent is the same as the anchored exponent, which is the larger of the two exponents. In normalization, the leading zeroes are detected and the result is shifted so that a leading one appears; the exponent changes accordingly, forming the exponent for the final packed floating point result.

The whole process is explained in the flowchart figure below.
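The align/add/normalize flow described above can be sketched in software. This Python model is ours and is simplified relative to the implementation: it keeps 24-bit mantissas (not the 25-bit guard-extended ones), truncates instead of rounding, and ignores zero/infinity/NaN inputs:

```python
def fp_add(a: int, b: int) -> int:
    """Sketch of the adder flow: unpack, hidden '1', align, add/subtract, normalize.
    Simplified: 24-bit mantissas, truncation, no special-value handling."""
    s1, e1, f1 = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    s2, e2, f2 = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    m1, m2 = (1 << 23) | f1, (1 << 23) | f2          # add the hidden '1' -> 24 bits

    # anchor the larger operand's sign and exponent
    if (e1, m1) < (e2, m2):
        s1, e1, m1, s2, e2, m2 = s2, e2, m2, s1, e1, m1
    m2 >>= min(e1 - e2, 31)                          # shift smaller mantissa right d times

    # XOR of the sign bits decides addition vs. subtraction
    m = m1 - m2 if s1 ^ s2 else m1 + m2
    if m == 0:
        return 0                                     # exact cancellation -> +0
    e, s = e1, s1                                    # anchored exponent and sign
    while m >= (1 << 24):                            # carry out: shift right, bump exponent
        m >>= 1; e += 1
    while m < (1 << 23):                             # leading-zero normalization
        m <<= 1; e -= 1
    return (s << 31) | (e << 23) | (m & 0x7FFFFF)

print(hex(fp_add(0x3F800000, 0x40000000)))  # 1.0 + 2.0 -> 0x40400000 (3.0)
```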


The sign of the product, the first part, is determined by the XOR of the two input signs. The exponent of the product, the second part, is calculated by adding the two input exponents. The third part, the significand of the product, is determined by multiplying the two input significands, each with a '1' concatenated to it.

The figure below shows the architecture and flowchart of the single precision floating point multiplier. It can be easily observed from the figure that the 24x24 bit integer multiplier is the main performance bottleneck for high speed and low power operations. In FPGAs, the availability of dedicated 18x18 multipliers instead of dedicated 24x24 bit multiply blocks further complicates this problem.
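The three-part algorithm itself fits in a few lines of software. This Python sketch is ours, truncates instead of rounding, and ignores special values:

```python
def fp_mul(a: int, b: int) -> int:
    """Sketch of the three parts: XOR of signs, sum of biased exponents minus
    one bias, and a 24x24 product of the significands. No rounding, no specials."""
    sa, ea, ma = a >> 31, (a >> 23) & 0xFF, (1 << 23) | (a & 0x7FFFFF)
    sb, eb, mb = b >> 31, (b >> 23) & 0xFF, (1 << 23) | (b & 0x7FFFFF)

    s = sa ^ sb                       # first part: sign
    e = ea + eb - 127                 # second part: exponent (one bias removed)
    p = ma * mb                       # third part: 24x24 -> up to 48-bit product
    if p >= 1 << 47:                  # product in [2, 4): renormalize once
        p >>= 1
        e += 1
    f = (p >> 23) & 0x7FFFFF          # keep the top 24 bits, drop the hidden '1'
    return (s << 31) | (e << 23) | f

print(hex(fp_mul(0x3FC00000, 0x40000000)))  # 1.5 * 2.0 -> 0x40400000 (3.0)
```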

We proposed the idea of a combined floating point multiplier and adder for FPGAs. In this, it is proposed to replace the existing 18x18 bit multipliers in FPGAs with dedicated blocks of 24x24 bit integer multipliers designed with 4x4 bit multipliers. In the designed architecture, the dedicated 24x24 bit multiplication block is fragmented into four parallel 12x12 bit multiplication modules, where AH, AL, BH and BL are each 12 bits. The 12x12 multiplication modules are implemented using small 4x4 bit multipliers. Thus, the whole 24x24 bit multiplication operation is divided into 36 4x4 multiply modules working in parallel. The 12-bit numbers A & B to be multiplied are divided into 4-bit groups A3, A2, A1 and B3, B2, B1 respectively. The flowchart and the architecture for the multiplier block are shown below.

Fig 3. Flowchart for floating point multiplication
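Functionally, the fragmentation into AH, AL, BH and BL amounts to four partial products recombined with shifts and adds. A behavioural Python model (ours; the recombination step is implicit in the architecture figure):

```python
def mul24(a: int, b: int) -> int:
    """Model of the fragmented 24x24 multiply: four 12x12 partial products
    (AH*BH, AH*BL, AL*BH, AL*BL) that run in parallel in the hardware,
    recombined here with shifts and adds."""
    AH, AL = a >> 12, a & 0xFFF       # high and low 12-bit halves of A
    BH, BL = b >> 12, b & 0xFFF       # high and low 12-bit halves of B

    ll = AL * BL                      # each 12x12 module is itself built
    lh = AL * BH                      # from nine 4x4 multipliers, giving
    hl = AH * BL                      # 4 * 9 = 36 modules in total
    hh = AH * BH

    return ll + ((lh + hl) << 12) + (hh << 24)

print(mul24(0xC00000, 0x800000) == 0xC00000 * 0x800000)  # matches integer multiply
```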


The additional advantage of the proposed CIFM is that the floating point multiplication operation can now be performed easily in an FPGA without any resource or performance bottleneck. In single precision floating point multiplication, the mantissas are 23 bits wide; thus a 24x24 bit (23-bit mantissa + 1 hidden bit) multiply operation is required to obtain the intermediate product. With the proposed architecture, the 24x24 bit mantissa multiplication can now be easily performed by passing it to the dedicated 24x24 bit multiply block, which generates the product with its dedicated small 4x4 bit multipliers.


As is evident from the proposed architecture, a high speed, low power dedicated 4x4 bit multiplier will significantly improve the efficiency of the designed architecture. Thus, a dedicated 4x4 bit multiplier that is efficient in terms of area, speed and power is proposed. Figure 5 shows the architecture of the proposed multiplier. For 4x4 bits, 4 partial products are generated and added in parallel. Each two adjacent partial products are subdivided into 2-bit blocks, where a 2-bit sum is generated by employing a 2-bit parallel adder appropriately designed by choosing a combination of half adder-half adder or half adder-full adder (forming blocks 1, 2, 3, 4, working in parallel).

This forms the first level of computation. The partial sums thus generated are added again in blocks 5 & 6 (parallel adders), working in parallel, by appropriately choosing the combination of half adders and full adders. This forms the second level of computation. The partial sums generated in the second level are utilized in the third level (blocks 7 & 8) to arrive at the final product. Hence, there is a significant reduction in power consumption, since the whole computation has been hierarchically divided into levels: power is provided only to the level that is involved in the computation, rendering the remaining two levels switched off (by employing control circuitry). Working in parallel significantly improves the speed of the proposed multiplier.
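Block structure aside, a 4x4 multiplier simply sums four shifted partial products, one per bit of the second operand. A minimal behavioural Python model (ours; in hardware the summation is performed by the half/full-adder levels just described):

```python
def mul4(a: int, b: int) -> int:
    """Behavioural model of a 4x4 bit multiplier: four shifted partial
    products, summed. The sum corresponds to the adder levels of Fig 5."""
    assert 0 <= a < 16 and 0 <= b < 16
    # one 4-bit partial product per bit of b, shifted into position
    partials = [(a << i) if (b >> i) & 1 else 0 for i in range(4)]
    return sum(partials)

print(mul4(15, 15))  # 225, the largest 4x4 product (fits in 8 bits)
```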

The proposed architecture is highly optimized in terms of area, speed and power. It was functionally verified in Verilog HDL and synthesized for a Xilinx FPGA. The designed 4-bit multiplier architecture is shown below.


Fig 5. Designed 4-bit optimized multiplier

The simulation results, RTL schematics of the designed architecture, synthesis report and Verilog code are shown below.


We chose inputs to the various sub-blocks such that all the logic blocks are ensured to function properly. All the internal signals are verified as follows.

We gave various random inputs, even without knowing what the order of the inputs meant, then analyzed the same inputs to determine the expected output, and verified it using simulation as shown below.


BLACKBOX TEST-TOP ARITHMETIC MODULE:


DATA PATH & CONTROL:

BLOCK DIAGRAM FOR DATA PATH & CONTROL


RTL SCHEMATIC FOR DATAPATH & CONTROLLER:

TEST FOR DATAPATH & CONTROLLER:


VARIOUS SIGNALS - DESCRIPTION (ADDER/SUBTRACTOR MODULE):

1. A, B: input 32-bit single precision numbers

2. C: output 32-bit single precision number

3. s1, s2, s3; e1, e2, e3; f1, f2, f3: sign, exponent and fraction parts of the inputs

4. new_f1, new_f2: aligned mantissas

5. de: difference between exponents

6. fr: 25-bit result of addition of mantissas

7. fr_us: unsigned 25-bit result

8. f_fr: normalized 24-bit fraction result

9. er, sr: exponent and sign of result


TEST FOR ADDER MODULE:

VARIOUS SIGNALS - DESCRIPTION (MULTIPLIER MODULE):

1. IN1, IN2: input 32-bit single precision numbers

2. OUT: output 32-bit single precision number

3. SA, SB, EA, EB, MA, MB: sign, exponent and mantissa parts of the inputs

4. PFPM: 48-bit multiplication result

5. SPFPM: shifted result of multiplication

6. EFPM: exponent result (output of the exponent addition module)

7. PFP: 48-bit fraction multiplication result

8. SFP: 1-bit sign of the final result

9. EFP: 8-bit exponent of the final result

10. MFP: 23-bit mantissa of the final result

