
High Level Synthesis: from Algorithm to Digital Circuit - P12


DOCUMENT INFORMATION

Basic information

Title: AutoPilot: A Platform-Based ESL Synthesis System
Authors: Zhiru Zhang, Yiping Fan, Wei Jiang, Guoling Han, Changqi Yang, Jason Cong
Institution: University of California, Los Angeles
Field: Electrical Engineering
Type: Thesis
City: Los Angeles
Pages: 10
File size: 365.87 KB



96 M Meredith

• Synthesize RTL that implements the SystemC semantics that were simulated

• Use the same testbench for high-level simulation and RTL simulation

The design can comprise a single module or multiple cooperating modules. In the case of multiple modules, the high-level SystemC simulation ensures that the modules operate correctly individually and work together properly. This simulation validates the algorithms, the protocol implementations at the interfaces, and the interactions of the modules operating concurrently.

The modules can then be synthesized, and the resulting RTL can be verified using the same testbench that was used at the high level. This is made possible by the mixed-mode scheduling described earlier, in which the algorithm is written as untimed SystemC while the interfaces are specified as cycle-accurate SystemC. Multiple testbench configurations may be constructed to verify various combinations of high-level modules and RTL modules.

[Figure: Single SystemC Testbench connected through a Socket to the C/C++ Algorithm and through a Socket to the RTL produced by Cynthesizer]

Cynthesizer incorporates a complete dependency management and process automation system that automatically generates the needed cosimulation wrappers and testbench infrastructure to automate verification of multiple configurations of high-level and RTL modules, without any need to customize the testbench source code itself.

5.11 Conclusion

This chapter has outlined the synthesizable constructs of C++ and SystemC supported by Forte Design Systems in its Cynthesizer product. It has described specific techniques that can be used to encapsulate synthesizable communication protocols in C++ classes for maximum reuse, and techniques used to automatically produce well-structured RTL for predictable timing closure. Finally, some of the user-visible mechanisms for controlling scheduling and the architecture of loop implementation have been discussed, along with a brief discussion of verification issues and the automation incorporated in the Cynthesizer product.

5 High-Level SystemC Synthesis with Forte’s Cynthesizer 97

Hopefully, this has enabled the reader to understand how SystemC synthesis with Cynthesizer can be used to implement a broad range of functionality at multiple abstraction levels, and how the use of high-level C++ and SystemC constructs raises the level of abstraction in hardware design.

Chapter 6

AutoPilot: A Platform-Based ESL Synthesis System

Zhiru Zhang, Yiping Fan, Wei Jiang, Guoling Han, Changqi Yang, and Jason Cong

Abstract The rapid increase of complexity in System-on-a-Chip design urges the design community to raise the level of abstraction beyond RTL. Automated behavior-level and system-level synthesis are naturally identified as the next steps to replace RTL synthesis and will greatly boost the adoption of electronic system-level (ESL) design. High-level executable specifications, such as C, C++, or SystemC, are also preferred for system-level verification and hardware/software co-design.

In this chapter we present a commercial platform-based ESL synthesis system, named AutoPilot™, offered by AutoESL Design Technologies, Inc. AutoPilot is based on the xPilot system originally developed at UCLA. It automatically generates efficient RTL code from C, C++, or SystemC descriptions for a given system platform and simultaneously optimizes logic, interconnects, performance, and power. Preliminary experiments demonstrate very promising results for a wide range of applications, including hardware synthesis, system-level design exploration, and reconfigurable accelerated computing.

Keywords: ESL, Behavioral synthesis, Scheduling, Resource binding, Interface synthesis

6.1 Introduction

The rapid increase of complexity in System-on-a-Chip (SoC) design urges the design community to raise the level of abstraction beyond RTL. Electronic system-level (ESL) design automation has been widely identified as the next productivity boost for the semiconductor industry. However, the transition to ESL design will not be as widely accepted as the transition to RTL was in the early 1990s without robust synthesis technologies that automatically compile high-level functional descriptions into optimized hardware architectures and implementations.

P. Coussy and A. Morawiec (eds.), High-Level Synthesis.
© Springer Science + Business Media B.V. 2008

100 Z Zhang et al.

Despite the failure of first-generation behavioral synthesis technology in the mid-1990s, we believe that behavior-level and system-level synthesis and optimization are now becoming imperative steps in EDA design flows for the following reasons:

• Embedded processors are in almost every SoC: With the coexistence of microprocessors, DSPs, memories, and custom logic on a single chip, more software elements are involved in designing a modern embedded system. It is natural to use C-based languages to program software for embedded processors. Moreover, automated C-based synthesis allows the designer to quickly experiment with different hardware/software boundaries and explore various area/power/performance tradeoffs using a single functional specification.

• Huge silicon capacity requires a higher level of abstraction: Design abstraction is one of the most effective methods for controlling rising complexity and improving design productivity. For example, a study from NEC [10] shows that a 1M-gate design typically requires about 300K lines of RTL code, clearly beyond what a human designer can handle. However, code density can be improved by more than 7X when moving to the behavior level, which results in a human-manageable 40K lines of behavioral description.

• Verification drives the acceptance of SystemC: Transaction-level modeling (TLM) with SystemC [2] has become a very popular approach to system-level verification [8]. Designers commonly use SystemC TLMs to describe virtual software/hardware platforms, which serve three important purposes: early embedded software development, architectural modeling, and functional verification. The wide availability of SystemC functional models directly drives the need for SystemC-based synthesis solutions, which automatically generate RTL code through a series of formal constructive transformations. This avoids the slow and error-prone manual process and simplifies design verification and debugging.

• Accelerated computing or reconfigurable computing needs C/C++-based compilation/synthesis to FPGAs: Recent advances in FPGAs have made reconfigurable computing platforms feasible for accelerating many high-performance computing (HPC) applications, such as image and video processing, financial analytics, bioinformatics, and scientific computing. Since HDLs are unfamiliar to most application software developers, it is essential to provide a highly automated compilation/synthesis flow from C/C++ to FPGAs.

In this chapter we present a platform-based ESL synthesis system named AutoPilot™, offered by AutoESL Design Technologies, Inc. AutoPilot can automatically generate efficient RTL code from untimed or partially timed C, C++, and SystemC descriptions for the target hardware platform. It performs platform-based behavioral and system synthesis, tightly integrates with a modern leading-edge C/C++ compiler, and embodies a class of novel, near-optimal, and highly scalable synthesis algorithms.

6 AutoPilot: A Platform-Based ESL Synthesis System 101

The synthesis technology was originally developed in the UCLA xPilot system [5] and has been licensed by AutoESL for commercialization. In its current stage, AutoPilot exhibits the following key features and advantages:

• Unified C/C++/SystemC design flow: AutoPilot accepts three kinds of standard C-based design entries: C, C++, and SystemC. It also supports a variety of abstraction models, including the pure untimed functional model, the partially timed transactional model, and the fully timed behavioral or structural model. The broad coverage of languages and abstraction models allows AutoPilot to target a wide range of applications, including hardware synthesis, system-level design exploration, and high-performance reconfigurable computing.

• Utilization of state-of-the-art compiler technologies: AutoPilot incorporates a leading-edge, commercial-strength C/C++ compiler in the synthesis loop. Many state-of-the-art compiler techniques (both intra-procedural and inter-procedural) are used to analyze, transform, and aggressively optimize the input behaviors.

• Platform-based and implementation-aware synthesis: AutoPilot takes advantage of the target platform information to carry out more informed synthesis and optimization. The timing, area, and power of the available computation resources and communication interfaces are all characterized. In addition, AutoPilot is tightly integrated with several downstream RTL synthesis and physical synthesis tools to ensure better quality of results and a higher degree of automation.

• Interconnect-centric and power-aware optimization: AutoPilot can generate an optimized microarchitecture that accounts for the on-chip interconnects at the high level and maximizes both data locality and communication locality to achieve faster timing and power closure. Furthermore, it can carry out aggressive power optimization using fine-grain clock gating and power gating.

The remainder of this chapter is organized as follows: Sect. 6.2 presents an overview of the AutoPilot design flow. Sections 6.3 and 6.4 briefly discuss the system front-end and highlight the synthesis engine, respectively. Preliminary experimental results are reported in Sect. 6.5.

6.2 Overall Design Flow

The overall design flow of the AutoPilot synthesis system is shown in Fig. 6.1. AutoPilot accepts synthesizable C, C++, and/or SystemC as input and performs four major steps to generate cycle-accurate RTL: compilation and elaboration, advanced code transformation, core behavioral and communication synthesis, and microarchitecture generation.

In the first step, the behavioral description is parsed by a GCC-compatible front-end compiler, with extensions to handle bit-accurate integer data types. For SystemC designs, elaboration is invoked to extract processes, ports, channels, and interconnection topologies and to construct a detail-rich system-level synthesis data model.

[Figure: inputs Design Specification (C/C++/SystemC), User Constraints (Timing/Power/Layout Constraints), and Platform Models; AutoPilot™ stages Compilation & Elaboration, Advanced Code Transformation, Behavioral & Communication Synthesis and Optimizations, and Microarchitecture Generation; outputs RTL SystemC & RTL HDLs for ASICs/FPGAs Implementation]

Fig. 6.1 AutoPilot™ design flow

On top of the synthesis data model, AutoPilot applies a set of advanced code transformations and analyses to optimize the input behavior, including traditional compilation techniques such as constant propagation and dead code elimination, and hardware-specific passes such as bitwidth analysis and optimization. The AutoPilot front-end is discussed in Sect. 6.3.

The code transformation phase is followed by the core hardware synthesis phase. AutoPilot performs platform-based synthesis and interconnect-centric optimizations during scheduling and resource binding; these take into account the user-specified frequency/latency/throughput/resource constraints and generate optimized microarchitectures. We discuss the synthesis engine in more detail in Sect. 6.4.

At the back-end, AutoPilot outputs RTL VHDL/Verilog code together with constraint files (e.g., multicycle path constraints, physical location constraints, etc.) to leverage existing logic synthesis and physical design toolsets for final implementation on either ASICs or FPGAs. It is worth noting that RTL SystemC code is also generated, which can be directly compiled and simulated with the original C/SystemC testbench to verify the correctness of the synthesized RTL.

6.3 AutoPilot Front-End

In this section we discuss three major aspects of the AutoPilot front-end: language support, compiler optimizations, and platform modeling.


6.3.1 Language Coverage

6.3.1.1 C/C++ Support

AutoPilot has broad coverage of C and C++ language features. It provides comprehensive support for most commonly used data types, operators, struct/class constructs, and control flow constructs. Due to the fundamental difference between the memory models of software and hardware, AutoPilot currently disallows dynamic pointers, dynamic memory allocation, and recursive function calls.

Designers can fully control the data precision of a C/C++ specification. AutoPilot directly supports single- and double-precision floating-point types. In addition, compared to xPilot, it adds the capability to compile and synthesize bit-accurate fixed-point data types, for which the standard C and C++ languages lack native support.

• Arbitrary-precision integer (APInt) data types: The user can specify an integer type's precision (bit width) as any number of bits up to eight million. For example, int24 declares a 24-bit signed integer value. Constant values are zero- or sign-extended to the indicated bit width if necessary.

• Arbitrary-precision fixed-point (APFixed) data types: AutoPilot provides a synthesizable templatized C++ library, named APFixed, for the designer to describe fixed-point math. The APFixed library implements the common arithmetic routines via operator overloading and supports the standard quantization and saturation modes.

• IEEE-754 standard single- and double-precision floating-point data types are fully supported in AutoPilot for FPGA platforms. Common floating-point math routines (e.g., square root, exponentiation, logarithm, etc.) can also be synthesized.
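To make the bit-width semantics above concrete, here is a small Python model of two's-complement wrap, saturation, zero/sign extension, and fixed-point quantization. It is a sketch under assumptions: the function names are hypothetical and do not reproduce the APFixed library's C++ API, and truncation toward minus infinity and round-half-up merely stand in for "the standard quantization modes", with wrap versus clamp standing in for the overflow (saturation) modes.

```python
import math


def wrap_signed(value, width):
    """Keep the low `width` bits of `value` and read them as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def saturate_signed(value, width):
    """Clamp `value` into the range of a `width`-bit signed integer."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    return max(lo, min(hi, value))


def extend(bits, from_width, to_width, signed):
    """Bit pattern of a `from_width`-bit constant widened to `to_width` bits,
    using sign extension if `signed`, zero extension otherwise."""
    value = wrap_signed(bits, from_width) if signed else bits & ((1 << from_width) - 1)
    return value & ((1 << to_width) - 1)


def to_fixed(x, width, frac_bits, rounding=False, saturate=False):
    """Quantize real `x` to the raw integer of a signed fixed-point number
    with `width` total bits, `frac_bits` of them fractional."""
    scaled = x * (1 << frac_bits)
    # Round half up, or truncate toward minus infinity.
    raw = math.floor(scaled + 0.5) if rounding else math.floor(scaled)
    # Overflow handling: clamp to the range, or wrap around.
    return saturate_signed(raw, width) if saturate else wrap_signed(raw, width)


def from_fixed(raw, frac_bits):
    """Real value represented by a raw fixed-point integer."""
    return raw / (1 << frac_bits)
```

For a 24-bit type such as the `int24` example, 0x800000 wraps to -8388608 but saturates to 8388607, and the 4-bit constant 0b1010 sign-extends to 0xFFFFFA or zero-extends to 0x00000A. In an 8-bit format with 4 fractional bits, 1.3 truncates to 1.25 and rounds to 1.3125, while 9.0 overflows: it wraps to raw -112 (-7.0) or saturates to raw 127 (7.9375).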

6.3.1.2 SystemC Support

AutoPilot fully supports the OSCI synthesizable subset [1] for SystemC synthesis. Designers can use SystemC bit-accurate data types (i.e., sc_int/sc_uint, sc_bigint/sc_biguint, and sc_fixed/sc_ufixed) to define data precision. Multi-module hierarchical designs can be specified and synthesized with the SC_MODULE construct. Within each module, multiple concurrent processes can be declared with the SC_METHOD and SC_CTHREAD constructs.

6.3.2 Advanced Code Transformations

A variety of compiler optimization techniques are applied to the behavioral description code with the objective of reducing code complexity, maximizing data locality, and exposing more parallelism. The following transformations and analyses are particularly instrumental for AutoPilot hardware synthesis:

• Traditional optimizations such as constant propagation, dead code elimination, and common subexpression elimination that avoid functional redundancy

• Strength reductions that replace expensive operations (e.g., multiplications and divisions) with simpler low-cost operations (e.g., shifts, additions, and subtractions)

• Transformations such as if-conversion and tree height reduction that explicitly expose fine-grain operator-level parallelism

• Coarse-grain code restructuring by loop transformations such as loop unrolling, loop flattening, loop fusion, etc.

• Analyses such as bitwidth analysis, alias analysis, and dependence analysis that help reduce data widths and analyze data and control dependences

These transformations are either performed locally within function bodies or applied interprocedurally across the function call hierarchy.
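Three of the techniques listed above can be illustrated on plain integers. The Python sketch below is hypothetical and independent of AutoPilot's compiler: `mul_by_const` performs strength reduction by decomposing a constant multiplier into shifts and adds (a simple binary decomposition, not the fewer-adder canonical-signed-digit form), `balanced_sum` performs tree height reduction on an associative sum, and `signed_bits` with `add_range` is a toy interval-based bitwidth analysis.

```python
def shift_add_terms(c):
    """Shift amounts k such that x * c == sum(x << k), for a constant c >= 0."""
    if c < 0:
        raise ValueError("this sketch handles non-negative constants only")
    return [k for k in range(c.bit_length()) if (c >> k) & 1]


def mul_by_const(x, c):
    """Strength-reduced multiply: one shifted copy of x per set bit of c."""
    return sum(x << k for k in shift_add_terms(c))


def balanced_sum(values):
    """Tree height reduction: add operands pairwise, level by level.
    Returns (sum, adder levels); a left-to-right chain needs len(values) - 1 levels."""
    level = list(values)
    if not level:
        raise ValueError("need at least one operand")
    depth = 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # an odd operand passes through to the next level
        level, depth = nxt, depth + 1
    return level[0], depth


def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    width = 1
    while not (-(1 << (width - 1)) <= lo and hi <= (1 << (width - 1)) - 1):
        width += 1
    return width


def add_range(a, b):
    """Interval of a + b, given the (lo, hi) intervals of both operands."""
    return (a[0] + b[0], a[1] + b[1])
```

Here `mul_by_const(7, 10)` computes (7 << 1) + (7 << 3) = 70, `balanced_sum(range(1, 9))` returns `(36, 3)`, three adder levels instead of the seven of a chain, and adding two int8 operands yields the interval [-256, 254], which `signed_bits` maps to 9 bits rather than the 32 bits of a typical C `int`.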

6.3.3 Platform Modeling

AutoPilot takes full advantage of the target platform information to carry out more informed synthesis and optimization. The platform specification describes the availability and characteristics of the important system building blocks, including the on-chip computation resources and the selected communication interfaces.

Component pre-characterization is part of the modeling process. Specifically, it characterizes the delay, area, and power of each type of hardware resource, such as arithmetic units (e.g., adders and multipliers), memories (e.g., RAMs, ROMs, and register files), steering logic (multiplexors), and interface logic (e.g., FIFOs and bus interface adapters). The delay/area/power characteristic functions are derived by varying the bit widths, the number of input and output ports, pipeline intervals and latencies, etc. To facilitate interconnect-centric synthesis, a heterogeneous resource distribution map and distance-based wire delay lookup tables are also constructed.
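A characterization table of this kind is typically consulted by interpolation. The sketch below assumes, hypothetically, that adder delay is tabulated against bit width and wire delay against Manhattan distance in placement tiles; every number in the tables is a made-up placeholder, not measured data for any process or device, and the function names are illustrative only.

```python
from bisect import bisect_left

# Placeholder characterization tables (illustrative numbers only):
# adder delay in picoseconds versus bit width, and
# wire delay in picoseconds versus Manhattan distance in placement tiles.
ADDER_DELAY_PS = [(8, 500), (16, 800), (32, 1400), (64, 2600)]
WIRE_DELAY_PS = [(0, 0), (1, 60), (2, 130), (4, 300)]


def lookup(table, x):
    """Piecewise-linear interpolation in a sorted (x, y) table, clamped at both ends."""
    xs = [point[0] for point in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, x)  # first sample with xs[i] >= x, so xs[i - 1] < x
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def manhattan(a, b):
    """Distance between two (column, row) tile coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def adder_plus_wire_delay(width, src_tile, dst_tile):
    """Estimated delay of a `width`-bit adder plus the wire to its consumer."""
    return lookup(ADDER_DELAY_PS, width) + lookup(WIRE_DELAY_PS, manhattan(src_tile, dst_tile))
```

Under these placeholder tables, a 24-bit adder whose result travels three tiles is estimated at 1100 + 215 = 1315 ps; a scheduler could compare such an estimate against the clock period when deciding whether two operations may be chained in one cycle.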

AutoPilot greatly extends the platform modeling capabilities of xPilot. It can support advanced ASIC processes (e.g., TSMC 90 and 65 nm technologies), a wide range of FPGA device families (e.g., Xilinx Virtex-4/Virtex-5, Altera Stratix II/Stratix III), and various accelerated computing platforms (e.g., Nallatech [4] and XDI [3] acceleration boards).

6.4 AutoPilot Hardware Synthesis Engine

This section highlights several important features of the AutoPilot synthesis engine, including scheduling, resource binding, pipelining, and interface synthesis.


6.4.1 Scheduling

An efficient and versatile scheduler is implemented in the AutoPilot system to exploit parallelism in the behavior-level design and to determine the time at which different computations and communications are performed. The core scheduling algorithm is based on a mathematical programming formulation. It has significant advantages over prior approaches in two major aspects:

• Versatility: Our scheduler can model a rich set of scheduling constraints (including cycle time constraints, latency constraints, throughput constraints, I/O timing constraints, and resource constraints) in the constraint system, and express different performance metrics (such as worst-case and average-case latency) in the objective function. Moreover, several important synthesis optimizations, such as operation chaining, structural pipelining, behavioral templates, and slack distribution, are all naturally encoded in a single unified mathematical framework.

• Efficiency and scalability: Our scheduler is highly efficient and scalable compared to other constraint-driven approaches. For instance, typical ILP formulations use discrete 0–1 variables to model the assignment of operations to time steps; this requires many variables and complex equations to express a single scheduling constraint, since all feasible time steps must be considered. In our formulation, variables directly represent operation execution times and are independent of the final schedule latency. This leads to a much more compact constraint system, and the mathematical programming model can be solved efficiently in a few seconds for very complex designs, as evidenced by the Xilinx MPEG-4 design (discussed in Sect. 6.5).

The first generation of our scheduler was based on the SDC (system of difference constraints) scheduling algorithm; its technical details are available in [7].
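The core idea of a difference-constraint formulation fits in a few lines. Each operation gets one integer start-time variable, and every constraint has the form t[v] - t[u] >= w: a positive w encodes a data dependence with latency w, and a negative w encodes an upper bound such as a latency limit. For n operations this needs n variables, whereas a time-indexed 0-1 ILP needs one variable per (operation, time step) pair. This is only a hypothetical feasibility sketch: it returns the earliest (ASAP) solution by Bellman-Ford-style longest-path relaxation, whereas the published SDC approach solves a linear program with an objective over constraints of this kind.

```python
def asap_schedule(num_ops, constraints):
    """Earliest non-negative integer start times t satisfying every constraint
    (u, v, w), read as t[v] - t[u] >= w. Returns None if the constraints are
    contradictory (a positive cycle in the constraint graph)."""
    t = [0] * num_ops  # implicit source: t[i] >= 0 for every operation
    for _ in range(num_ops + 1):
        changed = False
        for u, v, w in constraints:
            if t[u] + w > t[v]:  # constraint violated: push v later
                t[v] = t[u] + w
                changed = True
        if not changed:
            return t
    return None  # still changing after n + 1 passes: infeasible


# Hypothetical example: op0 = x * y (2 cycles), op1 = z + w (1 cycle),
# op2 = op0 + op1 (1 cycle), op3 = write of op2's result.
DEPS = [(0, 2, 2), (1, 2, 1), (2, 3, 1)]
```

`asap_schedule(4, DEPS)` gives `[0, 0, 2, 3]`. Adding the latency bound t[3] - t[0] <= 3, written as `(3, 0, -3)`, keeps that schedule; tightening it to `(3, 0, -2)` creates a positive cycle, and the function returns `None`.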

6.4.2 Resource Binding

Resource binding determines the numbers of functional units and registers, and the sharing among compatible operations and data transfers. It has a dramatic impact on the final design quality, because these decisions determine the interconnection network of wires and steering logic.
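As a concrete instance of the register side of binding, the classic left-edge idea can be sketched as follows. This is a generic textbook technique shown for illustration, not AutoPilot's binding algorithm, and the value names and lifetimes are hypothetical. Each value is live over a half-open interval of control steps [start, end), and values whose intervals do not overlap may share a register. Processing values in order of start time and reusing any register that is already free uses exactly as many registers as the maximum number of simultaneously live values.

```python
def left_edge_bind(lifetimes):
    """Assign values to registers. `lifetimes` maps a value name to (start, end),
    meaning the value is live in control steps start <= t < end.
    Returns a dict mapping each value name to a register index."""
    free_at = []  # free_at[reg]: step at which register reg's current value dies
    binding = {}
    for name in sorted(lifetimes, key=lambda n: lifetimes[n][0]):  # left edges first
        start, end = lifetimes[name]
        for reg, ready in enumerate(free_at):
            if ready <= start:  # register is free again: reuse it
                free_at[reg] = end
                binding[name] = reg
                break
        else:  # every register is busy at `start`: allocate a new one
            binding[name] = len(free_at)
            free_at.append(end)
    return binding
```

For lifetimes a: [0, 2), b: [1, 3), c: [2, 4), d: [3, 5), at most two values are live at once, and the sketch binds a and c to register 0 and b and d to register 1.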

AutoPilot can provide optimized binding for various functional units and memory blocks, such as integer and floating-point arithmetic units, transcendental functions, black-box IP blocks, registers, register files, RAMs/ROMs, etc. AutoPilot’s binding algorithm can also generate different microarchitectures. For example, it has an option to generate a distributed register-file microarchitecture (DRFM) to optimize both data locality and communication locality.

DRFM has a semi-regular structure that consists of one or more islands. As illustrated in Fig. 6.2, each DRFM island contains a local register file (LRF), a functional unit pool (FUP), and data-routing logic. The LRF serves as the local storage in an island. Each register file allows a variable number of read ports but only a fixed number (typically one) of write ports. The LRF stores the results produced by the local computation units in the FUP and provides data to both the local FUP and the external islands. By clustering the LRF and FUP into a single island, we are able to maximize both data/computation locality and communication locality. This also helps us avoid, to a large extent, centralized memory structures and global communications, which often become the bottlenecks limiting system efficiency in performance, area, and power. To handle the necessary inter-island communications, we use the data-routing logic to route data from the external islands.

[Figure: Islands A to F; each island comprises Data-Routing Logic, a Local Register File, and a Functional Unit Pool (FUP) with a MUX]

Fig. 6.2 Distributed register-file microarchitecture

DRFM is a semi-regular microarchitecture. The configurations of the LRF, FUP, and data-routing logic are application-specific. One important objective of DRFM-based resource binding is to minimize the number of inter-island connections. This simplifies the data-routing logic in each island and reduces the overall complexity of the resulting datapath.

The technical details of the DRFM-based resource binding algorithm are available in [6].
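The inter-island connection objective can be made concrete with a toy cost function and a greedy improvement loop. This is a hypothetical sketch, not the algorithm of [6]: it assumes each operation is already assigned to an island, counts one connection per distinct ordered pair of islands that exchange data, and moves single operations between islands only when the count drops and an assumed per-island capacity is respected.

```python
def inter_island_links(island_of, edges):
    """Distinct ordered island pairs (a, b), a != b, that exchange at least one value."""
    return len({(island_of[u], island_of[v]) for u, v in edges
                if island_of[u] != island_of[v]})


def greedy_rebind(island_of, edges, num_islands, capacity):
    """Move one operation at a time to another island whenever that lowers the
    link count without exceeding `capacity` operations per island."""
    best = dict(island_of)
    best_cost = inter_island_links(best, edges)
    improved = True
    while improved:  # terminates: the cost strictly decreases on every accepted move
        improved = False
        for op in sorted(best):
            for island in range(num_islands):
                if island == best[op]:
                    continue
                load = sum(1 for o in best if best[o] == island)
                if load >= capacity:
                    continue
                trial = dict(best)
                trial[op] = island
                cost = inter_island_links(trial, edges)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
    return best, best_cost


# Hypothetical dataflow: two multiplies feed an add, whose result feeds another add.
EDGES = [("m1", "a1"), ("m2", "a1"), ("a1", "a2")]
START = {"m1": 0, "m2": 1, "a1": 0, "a2": 1}
```

Starting from `START` (2 links), a capacity of 3 operations per island lets the loop move `a1` next to `a2`, ending with 1 link; with a capacity of 2 both islands are full, so no move is allowed and the assignment stays as it was.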

6.4.3 Pipelining

AutoPilot’s synthesis engine (during scheduling, resource binding, and microarchitecture generation) supports several forms of pipelining to improve system performance.
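Why pipelining helps can be shown with standard loop-pipelining arithmetic. The sketch below is generic and hypothetical, not a description of the specific pipelining forms AutoPilot supports: a loop of N iterations, each taking L cycles, needs N * L cycles when iterations run back to back, but only (N - 1) * II + L cycles when a new iteration starts every II cycles (the initiation interval). It also computes the usual resource-constrained lower bound on II, assuming fully pipelined functional units.

```python
import math


def loop_cycles(trip_count, iteration_latency, ii=None):
    """Cycles to finish a loop of `trip_count` iterations, each taking
    `iteration_latency` cycles. With ii=None iterations run back to back;
    otherwise a new iteration starts every `ii` cycles."""
    if trip_count == 0:
        return 0
    if ii is None:
        return trip_count * iteration_latency
    return (trip_count - 1) * ii + iteration_latency


def resource_min_ii(ops_per_iteration, units):
    """Lower bound on II: each resource type must issue all of its operations
    from one iteration within II cycles (fully pipelined units assumed)."""
    return max(math.ceil(ops_per_iteration[kind] / units[kind])
               for kind in ops_per_iteration)
```

For 100 iterations of a 5-cycle body, this gives 500 cycles unpipelined, 104 with II = 1, and 203 with II = 2; a body with 4 multiplies and 3 adds running on 2 multipliers and 3 adders cannot start iterations more often than every 2 cycles.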

Posted: 03/07/2014, 14:20
