FIGURE 12.7
Reconfigurable socket abstraction based on the “PLBv46 PLBv46 bridge” architecture. The “PLBv46 slave” and “PLBv46 master burst” blocks are standard IP components, and all blocks except the DCR slave block are part of the bridge. Bus macros are implicitly present on all signals crossing the boundary of the reconfigured region.

An alternative is to architect the interface around a bus bridge, with independent busses in the static region and in the reconfigurable region. The design of the socket is based on partitioning the Xilinx “PLBv46 PLBv46 bridge” IP [23], as shown in the block diagram in Figure 12.7. Internally, this core is based around 32-bit fixed-width data FIFOs and a small number of control signals. Most of the bridge is treated as part of the static region, with only a small amount of logic required in the reconfigurable region to complete the bridge. In addition to the bus interface, which is primarily used to interface to the reconfigured region, the socket core also contains a control interface (based on the DCR protocol [7]), which is used to generate an independent reset signal to the reconfigurable region and to force signals driven by the reconfigurable module to stable values during reconfiguration.
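To make the role of the control interface concrete, the sketch below models it as a single control register with one bit that holds the reconfigurable region in reset and one bit that lets the bus macros pass signals from the reconfigurable module. The register layout, bit positions, and function names are hypothetical (the register map of the socket core is not given here); the code only captures the ordering implied above: quiesce the region before reconfiguration, then re-enable it afterward.

```c
#include <stdint.h>

/* Hypothetical bit assignments for the socket's DCR control register. */
#define SOCKET_CTRL_RESET     (1u << 0) /* 1 = hold reconfigurable region in reset */
#define SOCKET_CTRL_BM_ENABLE (1u << 1) /* 1 = bus macros pass module outputs */

/* Value to write before partial reconfiguration starts: signals driven
 * by the reconfigurable module are forced to stable values and the
 * region is held in reset. Unrelated bits are preserved. */
uint32_t socket_ctrl_quiesce(uint32_t ctrl)
{
    ctrl &= ~SOCKET_CTRL_BM_ENABLE;
    ctrl |= SOCKET_CTRL_RESET;
    return ctrl;
}

/* Value to write after the partial bitstream has been loaded:
 * bus macros enabled and reset released. */
uint32_t socket_ctrl_activate(uint32_t ctrl)
{
    ctrl |= SOCKET_CTRL_BM_ENABLE;
    ctrl &= ~SOCKET_CTRL_RESET;
    return ctrl;
}
```

A driver would write socket_ctrl_quiesce(old) to the control register before streaming a partial bitstream to the ICAP, and socket_ctrl_activate(old) once loading completes. Whether releasing reset and enabling the bus macros should instead be two separate, ordered writes depends on the actual core.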
12.5.3 Direct Memory Access Interfaces
The bus interface above is a generic and flexible interface, which can be used to communicate with the reconfigured portion of the system in different ways. For instance, it may be used by the processor both to send data to and to receive data from the reconfigured region, or as a control interface to set parameter values of IP cores executing in the reconfigured region. However, it does have several disadvantages. Primarily, the bandwidth of data to or from the processor is limited because of the overhead of bus arbitration and the fact that the memory range is treated as uncached I/O transactions. Although performance could be improved somewhat for large transactions by using DMA engines, or by treating data transfer regions as cached and manually managing cache coherency, this would significantly increase the complexity of the processor software. Secondly, many FPGA algorithms require access to external memory for buffering data until it can be processed. For instance, in a network router, packet data may need to be stored until a routing decision can be made, and in a streaming video system, several frames of video data may need to be stored to analyze object motion between frames.
Because of these limitations, it is best to consider the bus interface above as primarily an interface used for low-bandwidth control and configuration information. In systems that require higher bandwidth communication, or direct access to external memory, the control interface can be augmented with additional interfaces to memory. Although it may seem straightforward to include a complementary bus bridge that can be driven by the reconfigured region to provide this functionality, this tends not to be the highest bandwidth option, since performance can be limited by the arbitration logic of the PLB bus. This logic is heavily pipelined in order to maximize the bus throughput under a wide variety of usage, typically incurring three cycles of latency before a slave can respond to a bus access.
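To see why a fixed access latency hurts short transfers most, consider a deliberately simple model (an assumption for illustration, not a measured property of the PLB): a transaction carrying N data beats that must first wait L cycles occupies about N + L bus cycles, so the fraction of cycles that carry data is N / (N + L). The sketch evaluates this with the three-cycle latency quoted above; the function name is our own.

```c
#include <assert.h>

/* Fraction of bus cycles that carry data for one transaction, assuming
 * `latency` idle cycles precede `beats` data cycles and transactions do
 * not overlap. Real PLB address pipelining can hide part of the latency,
 * so treat this as a pessimistic estimate. */
double transfer_efficiency(unsigned beats, unsigned latency)
{
    assert(beats + latency > 0); /* avoid dividing by zero */
    return (double)beats / (double)(beats + latency);
}
```

With a latency of 3, a single-beat access reaches 25% efficiency, a 4-beat burst about 57%, and a 16-beat burst about 84%, which is one way to see why bulk data is better served by a path with less per-access overhead.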
One solution is to provide an interface connected directly to the native port interface (NPI) of the Xilinx MPMC IP core, as shown in Figure 12.8.
FIGURE 12.8
Diagram labels: multiported memory controller; arbiter; physical interface; external memory (e.g., DDR/DDR2).
Typically, this interface exhibits both lower latency and higher bandwidth than the PLB bus. Although the MPMC must still arbitrate between different ports attempting to use the memory controller, this arbitration can be performed locally within the memory controller and concurrently with the data being provided. The only disadvantage of connecting directly to the memory controller is that other IP cores in the static region cannot be accessed from the reconfigured region. However, since in the SRP usage model these IP cores are likely being managed by device drivers in the operating system of the processor, it is questionable whether such access should be allowed anyway.
12.5.4 External Interfaces
In addition to communicating with the static region, a reconfigurable module may also communicate with other interfaces external to the FPGA. In order to accomplish this, a reconfigurable region may include external I/O pins and/or high-speed serial transceivers. For the most part, these resources can be treated as any other FPGA primitives and can be placed and routed as usual.
However, there is some complexity with regard to external I/O pins, since in many FPGA designs the input/output buffer (IOB) primitives representing external I/O pins are not explicitly instantiated in a user design but are inferred in the synthesis process. Normally, in a hierarchical design, the netlist can be synthesized using a special option to disable inference of these primitives, since they will be inferred or instantiated during synthesis of the top-level design. However, when building a generic FPGA platform, relying on this may not be desirable, since the reconfigured region may require more control over the configuration of these primitives. In other cases, exactly which IOB primitives are explicitly instantiated in a reconfigurable module, and which ones are not, may not be known when the static design is synthesized and implemented. One way to solve this is to not expose any I/O pins of the reconfigurable region as external signals of the static region, implying that synthesis of the static design will never include IOB primitives for these pins. When a reconfigurable module is synthesized, signals interfacing with the static region are individually tagged with the constraint BUFFER_TYPE set to NONE, indicating that no IOB primitives should be inferred for those signals.
High-speed serial transceivers also add design complexity, since each transceiver is associated with specialized clock resources in the FPGA. These clock resources typically include phase-locked loops for clock synchronization and dedicated clock distribution paths, and may be shared between transceivers. From the perspective of building FPGA platforms, this resource sharing, combined with how transceivers are grouped into configuration frames, may need to be considered during the floorplanning stage in order to gain maximum usage of the available transceivers.
FIGURE 12.9
Static design flow (left) and module design flow (right). Diagram labels: EDK base design (.mhs, EDK platgen); floorplanning; UCF merge; hand design; PR-enabled NGDBuild, Map, and PAR; PR-enabled bitgen; PRMergeDesign + PR-enabled bitgen; EDK genace.tcl; gcc + objcopy (C code); meta-information; files .dts, .ngc, .ucf, .ncd, static.used, static.ace, partial.bit, merged.ace, and configure.elf.
static.used for later use. Since, by default, the interface with the reconfigured region is driven to an idle state, the resulting bitstream can be used in a system without programming the remainder of the FPGA. The device tree for a particular design is generated from the EDK design and, after being converted to a binary device tree blob, can be included in the Linux kernel image or stored as the initial value of a BRAM in the bitstream. Lastly, EDK is used to package the FPGA bitstream with the Linux kernel binary in a bootable image that can be used with Xilinx SystemAce [24] to boot the kernel.
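For reference, a binary device tree blob (a flattened device tree, or FDT) starts with a fixed big-endian header: the magic number 0xd00dfeed, followed by the total size of the blob in bytes. A boot loader can use these two fields as a quick sanity check before handing the blob to the kernel. The sketch below shows that check; it is not part of the EDK flow described above, and the function name is our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu /* first word of every FDT header */

/* FDT header fields are big-endian regardless of host byte order. */
static uint32_t be32_at(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return the blob's declared total size, or 0 if the buffer does not
 * start with an FDT header or is shorter than the size it declares. */
uint32_t fdt_blob_size(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 8 || be32_at(buf) != FDT_MAGIC)
        return 0;
    uint32_t total = be32_at(buf + 4); /* the totalsize field */
    return (total <= len) ? total : 0; /* reject truncated blobs */
}
```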
The right-hand side of Figure 12.9 shows a second pass for the implementation of a reconfigurable module. During this pass, the logic of the reconfigurable module is implemented together with a small portion of the static logic called the “context logic.” The context logic is necessary to provide the context of the reconfigurable module, so that hierarchical names in the design and location constraints for clock signals and bus macros can be preserved. The design constraints for implementation are created by merging the design constraints from the static design with any additional design constraints specific to the reconfigurable module, such as pin location constraints. During this pass, the routing resources listed in the file static.used are excluded from use, since these resources are already used in the static design. The final bitstream for the reconfigurable module is generated by first merging the design databases (contained in ncd files) from both passes, ensuring that the configuration bits used in the static design are programmed correctly. In addition, design rule checks and timing analysis can be applied to the merged design database, to ensure that the individual passes were implemented correctly. From the merged design database, it is possible to generate both a partial bitstream, which can be used after configuration with the static bitstream, and a merged bitstream, which can be used as an initial configuration bitstream with the reconfigurable module already loaded.
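The invariant that the merge step must preserve can be stated at the level of configuration data: every bit used by the static design keeps its static value, and every other bit takes its value from the module pass. The real tools operate on ncd design databases rather than raw configuration words, so the sketch below is only a conceptual model; the word-level ownership mask is a hypothetical stand-in for the information the tools derive from the static implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conceptual merge of configuration words from the two passes.
 * static_mask has a 1 in every bit position owned by the static design
 * (a hypothetical representation; the tools do not expose such a mask). */
void merge_config_words(const uint32_t *static_words,
                        const uint32_t *module_words,
                        const uint32_t *static_mask,
                        uint32_t *merged, size_t nwords)
{
    assert(nwords == 0 ||
           (static_words && module_words && static_mask && merged));
    for (size_t i = 0; i < nwords; i++)
        merged[i] = (static_words[i] & static_mask[i]) |  /* static bits win */
                    (module_words[i] & ~static_mask[i]);  /* rest from module */
}
```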
To enable reconfiguration in a Linux system, the partial bitstream is encapsulated with the Linux code for performing PR and the meta-information about the reconfigurable module, to generate a Linux executable, as described in Section 12.6.

12.6 Managing Partial Reconfiguration in Linux
Two device drivers are used to manage the reconfiguration process. Primarily, the device driver for the ICAP device performs the actual reconfiguration. When a partial bitstream is written to this device (for instance, using the cp command or the write() system call), the bytes are transferred to the ICAP. Since the device driver does not inspect or modify the stream of bytes, the data being written must include the appropriate control words, as expected by the configuration interface [26]. The device driver also includes simple locking of the ICAP resource, in order to prevent different processes from unexpectedly interleaving accesses to the ICAP. Readback is also possible using this device driver, by writing the correct readback request bitstream to the ICAP and subsequently reading data (using the read() system call).

The second device driver used to manage reconfiguration is associated with the reconfigurable socket core. This driver exports a character interface
to which meta-information about a reconfigurable module can be written. A simple way of representing this meta-information is in the form of an array of struct platform_device, a data structure which is used internally by Linux to represent devices. A more complex, but perhaps more robust, representation of meta-information could be an additional device tree blob. This meta-information is parsed and checksummed and, if valid, is used to notify the Linux kernel of the presence of new devices, which can then be bound to other device drivers. An invalid checksum is interpreted as an indication to unbind any previously loaded devices and release ownership of the reconfigured region. Secondarily, this device driver also enables and disables the bus macros between the static region and the reconfigured region, and controls the reset of the reconfigured region. As with the ICAP device driver, the socket device driver includes a simple locking mechanism in order to prevent a process from unexpectedly reconfiguring an active region in use.

FIGURE 12.10
The reconfiguration process. Diagram labels: reconfigure FPGA; notify kernel of devices; load kernel modules; enable bus macros; reset reconfigurable module; processing; unload kernel modules; release devices; disable bus macros.
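The validity check on the meta-information can be sketched in user space. The record layout (payload followed by a little-endian CRC-32) and the choice of CRC-32 with the reflected polynomial 0xEDB88320 are assumptions for illustration, since the text says only that the meta-information is checksummed; the function and enum names are also our own. Note the convention described above: an invalid checksum is not treated as an error but as a request to release the region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)  /* shift out one bit, XOR poly if it was 1 */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

enum socket_action {
    SOCKET_REGISTER_DEVICES, /* valid: notify the kernel of new devices */
    SOCKET_RELEASE_REGION    /* invalid: unbind devices, release region */
};

/* Decide what to do with a record written to the socket's character
 * device. Hypothetical layout: payload, then CRC-32 of the payload
 * stored little-endian in the last four bytes. */
enum socket_action socket_meta_action(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len < 4)
        return SOCKET_RELEASE_REGION;
    size_t n = len - 4;
    uint32_t stored = (uint32_t)buf[n] | ((uint32_t)buf[n + 1] << 8) |
                      ((uint32_t)buf[n + 2] << 16) |
                      ((uint32_t)buf[n + 3] << 24);
    return crc32_ieee(buf, n) == stored ? SOCKET_REGISTER_DEVICES
                                        : SOCKET_RELEASE_REGION;
}
```

Writing a deliberately corrupted record (for example, an empty one) therefore doubles as the "unload" command, which keeps the character interface to a single write operation.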
by loading the appropriate kernel modules, and the Linux kernel binds those device drivers to the reconfigured devices. At this point, application code may use the device drivers to communicate with the reconfigured region.
A similar sequence of steps occurs in reverse order to unbind the device drivers and release the reconfigured region, so that different processing may occur.
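Because bringing a module up and tearing it down are mirror images, the process can be represented as a table of paired setup and teardown steps: run forward to load a module, run backward to release it, and unwind only the completed steps if a setup step fails. This is a generic pattern, not the chapter's implementation; the step pairings (taken from the labels in Figure 12.10) and the counting stubs that stand in for calls into the ICAP and socket drivers are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*step_fn)(void *ctx); /* returns 0 on success */

struct lifecycle_step {
    const char *name;  /* description, e.g. "load kernel modules" */
    step_fn setup;     /* must not be NULL */
    step_fn teardown;  /* may be NULL if nothing needs undoing */
};

/* Run setup steps in order. If step i fails, undo steps i-1 ... 0 in
 * reverse order and return i; return -1 if every step succeeded. */
int lifecycle_up(const struct lifecycle_step *steps, size_t n, void *ctx)
{
    assert(steps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (steps[i].setup(ctx) != 0) {
            for (size_t j = i; j-- > 0;)
                if (steps[j].teardown)
                    steps[j].teardown(ctx);
            return (int)i;
        }
    }
    return -1;
}

/* Release the region: run every teardown step in reverse order. */
void lifecycle_down(const struct lifecycle_step *steps, size_t n, void *ctx)
{
    for (size_t j = n; j-- > 0;)
        if (steps[j].teardown)
            steps[j].teardown(ctx);
}

/* Hypothetical stubs that count calls instead of touching hardware. */
struct counter { int up, down, fail_at; };

static int stub_setup(void *ctx)
{
    struct counter *c = ctx;
    if (c->up == c->fail_at)
        return -1; /* simulate a failing step */
    c->up++;
    return 0;
}

static int stub_teardown(void *ctx)
{
    struct counter *c = ctx;
    c->down++;
    return 0;
}

static const struct lifecycle_step demo_steps[] = {
    { "enable bus macros / disable bus macros", stub_setup, stub_teardown },
    { "notify kernel of devices / release devices", stub_setup, stub_teardown },
    { "load kernel modules / unload kernel modules", stub_setup, stub_teardown },
};
```

With fail_at set to 1, lifecycle_up(demo_steps, 3, &c) returns 1 after undoing only the first step; with fail_at set to -1, all three steps run and lifecycle_down later undoes them in reverse.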
Since the ICAP device and the control interface of the socket are exposed through device drivers, it is relatively straightforward to implement reconfiguration through a regular user process. One possibility for implementing this involves linking the bitstream and meta-information into a single executable along with the code for reconfiguration. The process created when this executable is executed can be controlled through any operating system mechanism (such as POSIX signals) to manage the life cycle of the module loaded in the FPGA. The executable can also be linked together with other application code, resulting in a familiar processor-centric usage model for the FPGA fabric. This approach is similar in spirit, but greatly different in implementation, from that proposed in [18], which performs essentially the same processes using the Linux kernel’s ability to implement new executable formats.

It is important to recognize that although the reconfiguration process is managed by a user process, it must be treated as a privileged operation executed as the root user, since there are many places where both unintended errors and malicious attacks may result in unintended behavior. Some of these places are not specific to the PR process, such as loading kernel modules, whereas others are more subtle vulnerabilities. For instance, as noted before, partial bitstreams have significant constraints on how they are constructed and are specific to a particular implementation of the static system. More directly, it is possible to trigger reconfiguration of the FPGA through the ICAP interface, resulting in the loss of the current state of the system. If the bus macros are enabled during PR, then it is likely that glitching on the interface signals will result in unintended behavior of the static system.
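The code for reconfiguration that is linked into such an executable can be small, because the ICAP driver simply forwards the bytes it is given. The sketch below shows the one piece that is easy to get wrong, writing a buffer completely despite short writes and interruption by signals. The device path /dev/icap0 and the objcopy symbol names in the closing comment are assumptions; the text names neither.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write all of buf to fd, retrying after short writes and EINTR (for
 * example, when the managing process receives a signal). Returns 0 on
 * success, or -1 with errno set. */
int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before any data moved */
            return -1;
        }
        if (n == 0) {          /* no progress; avoid spinning forever */
            errno = EIO;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical use in the reconfiguration executable, assuming the
 * bitstream was embedded with "objcopy -I binary", which for an input
 * file named partial.bit defines _binary_partial_bit_start and
 * _binary_partial_bit_end, and that the ICAP node is /dev/icap0:
 *
 *   extern const unsigned char _binary_partial_bit_start[];
 *   extern const unsigned char _binary_partial_bit_end[];
 *   int fd = open("/dev/icap0", O_WRONLY);
 *   write_all(fd, _binary_partial_bit_start,
 *             (size_t)(_binary_partial_bit_end - _binary_partial_bit_start));
 *   close(fd);
 */
```

Since the driver performs no checking of its own, any validation of the bitstream has to happen before write_all is called.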
One particularly common usage error is simply attempting to load a partial bitstream that does not correspond to the current implementation of the static design. This may happen during development when a modification is made to the static region, but a designer neglects to reimplement a reconfigured module. One way of avoiding such errors is to prepend each partial bitstream with a hash generated from the static design. This hash can also be stored in the static design, possibly in the device tree blob, and checked before the partial bitstream is loaded into the FPGA. If the partial bitstream is not signed properly, then the reconfiguration process can be halted without affecting the operation of the static design. This technique can be simply applied to prevent unintended errors, or adapted using more cryptographically secure techniques to prevent malicious attacks [2,4].
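A minimal sketch of the hash check follows, assuming a hypothetical image layout in which an 8-byte big-endian hash of the static design precedes the partial bitstream. FNV-1a is swapped in here only because it is short; it catches accidental mismatches such as a stale module after a static redesign, but it offers no protection against deliberate tampering, which is what the cryptographic techniques cited above address. All names are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: a simple non-cryptographic hash. */
uint64_t fnv1a64(const uint8_t *data, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ull; /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ull;          /* FNV prime */
    }
    return h;
}

/* Hypothetical image layout: 8-byte big-endian hash, then the partial
 * bitstream. Returns a pointer to the bitstream payload if the embedded
 * hash matches the hash recorded for the running static design, or NULL
 * if reconfiguration should be halted. */
const uint8_t *check_partial_image(const uint8_t *image, size_t len,
                                   uint64_t static_hash,
                                   size_t *payload_len)
{
    assert(image != NULL && payload_len != NULL);
    if (len < 8)
        return NULL;
    uint64_t embedded = 0;
    for (int i = 0; i < 8; i++)
        embedded = (embedded << 8) | image[i];
    if (embedded != static_hash)
        return NULL;                    /* built for another static design */
    *payload_len = len - 8;
    return image + 8;
}
```

In a build flow, one would compute fnv1a64 over some canonical artifact of the static implementation (for example, the static bitstream), prepend the value to every partial bitstream, and record the same value in the device tree for the loader to compare against.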
12.7 Putting It All Together
This section illustrates an SRP design targeted at a variant of the WARP software-defined radio hardware built by Rice University [12]. Since the original hardware is based on an older Virtex 2 Pro FPGA, we present a design based on an updated Virtex 4 FX 100 device in order to better
FIGURE 12.11
Diagram labels: PPC405 (ppc_virtex4 v2.00.b); interrupt controller (xps_intc v1.00a); multiported memory controller (mpmc v3.00b); Ethernet MAC (xps_ll_temac v1.01a); plb; sdma; reconfigurable socket; reconfigured region.
a bridge from wired Ethernet to a two-radio MIMO system. The design uses a processor to manage the packet headers and to perform configuration management of the radios, while packet payloads are communicated directly between the wired and wireless network interfaces using direct memory access to a processor-managed memory buffer. In the reference design, the packet payload buffer is implemented in BRAM and communicated through a PLB bus. In the reconfigurable design, we assume that the packet payload buffer is implemented in external DRAM, which must be accessed from the reconfigurable region through a separate port of the memory controller. As a nonreconfigurable system, this design uses approximately 50% of the device (21294 of 42176 slices).
The design of the static subsystem is shown in Figure 12.11. This design is architected around the PowerPC 405 processor core and was largely generated using the Base System Builder capability in Xilinx EDK. Standard serial port and Ethernet IP cores provide external connectivity. Access to external 64-bit-wide DDR2 SDRAM, including DMA access for the Ethernet core, is provided by the Xilinx MPMC IP core. In this system, the processor, memory bus, and memory controller are designed to be “quasi-synchronous,” meaning that their clocks must be edge-aligned. Based on the speeds of the individual components, a design point was chosen targeting a slow speed grade FPGA (-10), with the memory bus clocked at 83.3 MHz, the memory controller clocked twice as fast (166.6 MHz), and the processor clocked three times as fast (250 MHz).
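The clock plan can be checked with simple arithmetic. The 83.3 MHz bus frequency is 250/3 MHz; consistent with this, the serial port node in the device tree of Figure 12.13 lists clock-frequency = <4f790d5>, which is 83,333,333 Hz. The sketch below tests whether one frequency is an integer multiple of another (necessary for edge alignment but not sufficient, since the clocks must also be derived from a common source) and computes the theoretical peak bandwidth of the 64-bit DDR2 interface. The function names and the tolerance parameter are our own.

```c
#include <assert.h>
#include <stdint.h>

/* Return n if f_fast_hz is (within tol_hz per multiple) n times
 * f_slow_hz, or 0 if the two clocks are not integer related. The
 * tolerance absorbs frequencies such as 83.3 MHz that are not a whole
 * number of Hz. */
uint64_t clock_multiple(uint64_t f_fast_hz, uint64_t f_slow_hz,
                        uint64_t tol_hz)
{
    assert(f_slow_hz != 0);
    if (f_fast_hz < f_slow_hz)
        return 0;
    uint64_t n = f_fast_hz / f_slow_hz;
    uint64_t rem = f_fast_hz % f_slow_hz;
    if (rem <= tol_hz * n)                   /* slightly above n * f_slow */
        return n;
    if (f_slow_hz - rem <= tol_hz * (n + 1)) /* slightly below (n+1) * f_slow */
        return n + 1;
    return 0;
}

/* Theoretical peak of a DDR interface: width in bytes times two
 * transfers per clock cycle. Ignores refresh, bank conflicts, and
 * arbitration between MPMC ports. */
uint64_t ddr_peak_bytes_per_s(unsigned width_bits, uint64_t clock_hz)
{
    return (uint64_t)(width_bits / 8) * 2u * clock_hz;
}
```

With the numbers above, clock_multiple(250000000, 83333333, 1) gives 3 and clock_multiple(166666666, 83333333, 1) gives 2, while a 100 MHz clock against the 83.3 MHz bus gives 0. If the DDR2 memory clock equals the 166.6 MHz controller clock (an assumption), ddr_peak_bytes_per_s(64, 166666666) is about 2.67 GB/s.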
FIGURE 12.12
Placed and routed design of an FPGA processor platform, targeting a Virtex 4 FX 100. Diagram labels: reconfigured region; static region; ICAP interface; control interface bus macros; memory interface bus macros; utilized PowerPC core.
The FPGA layout of the design is shown in Figure 12.12, overlaid with the PR floorplanning constraints. The static region is at the south of the chip and is exactly two configuration frames tall. This layout provides approximately 8600 slices and 128 external I/O pins, which accommodates both the logic requirements of a simple processor design and the I/O pin requirements of a 64-bit DDR2 memory interface. A significantly smaller region would fail to provide enough logic cells for the static design, while a larger region would allocate too many pins to the static region, where they would be difficult to access from the reconfigurable region.
Note that the majority of the routed signals are contained within the floorplanned area for the static region. The routes entering the top region connect primarily to external I/O pins and FPGA resources, such as clock buffers and the ICAP, located in the center column of the FPGA. Some routes into the top region also connect to the PowerPC cores. Although only one PowerPC is actually used in the static design, current versions of the EA PR tools do not allow PowerPC cores to be part of the reconfigured portion of the design. Hence, this design instantiates both PowerPC cores in the static region, in order to enable use of the JTAG chain, which is assumed to connect through both cores.

The device tree for this design is shown in Figure 12.13. Since the targeted board includes Xilinx SystemACE, this is used to configure the FPGA and initialize external memory with the kernel image. The compressed device tree blob is initialized in the BRAM at address 0xfffff800 and decompressed by the Linux bootwrapper executing out of external memory. The root filesystem is stored on an external file server and loaded over the network interface using the NFS protocol.
as processor cores, and where physical interfaces to the rest of the system are highly flexible and incorporate many features that cannot be easily modeled even at the circuit and gate level.
However, using the architectural features of some FPGAs, such as PR, higher-level platforms can be constructed that abstract many of these details and are more appropriate for mapping from a high-level design tool. This chapter has particularly shown how this technique can abstract the complexities associated with including a control processor and operating system as part of an FPGA platform.
        reg = < 41300000 10000 >;
        xlnx,family = "virtex4";
    };
};
plbv46_dcr_bridge_0: plbv46-dcr-bridge@80700000 {
    compatible = "xlnx,plbv46-dcr-bridge-1.00.a";
    dcr-access-method = "mmio";
    dcr-controller;
    dcr-mmio-range = < 80700000 1000 >;
    dcr-mmio-stride = <4>;
};
rs232: serial@84000000 {
    clock-frequency = <4f790d5>;
};
xps_intc_0: interrupt-controller@81800000 {
    #interrupt-cells = <2>;
    compatible = "xlnx,xps-intc-1.00.a";
    interrupt-controller;
    reg = < 81800000 10000 >;
    xlnx,num-intr-inputs = <5>;
};
xps_socket_0: xps-socket@50000000 {
    compatible = "xlnx,xps-socket";
FIGURE 12.13
Device tree
1. B. Blodget, P. James-Roxby, E. Keller, S. McMillan, and P. Sundararajan. A self-reconfiguring platform. In Proceedings of the International Field Programmable Logic and Applications Conference (FPL), Lisbon, Portugal, 2003. Lecture Notes in Computer Science, Vol. 2778, Springer-Verlag, September 2003.
2. J. Castillo, P. Huerta, V. Lopez, and J. Martinez. A secure self-reconfiguring architecture based on open-source hardware. In International Conference on Reconfigurable Computing and FPGAs (ReConFig), Puebla City, Mexico, September 2005.
3. J. Corbet, A. Rubini, and G. Kroah-Hartman. Linux Device Drivers. O’Reilly, Sebastopol, CA, 3rd edition, 2005.
4. R. Fong, S. Harper, and P. Athanas. A versatile framework for FPGA field updates: An application of partial self-reconfiguration. In Proceedings of the IEEE International Workshop on Rapid System Prototyping, San Diego,
7. IBM. Device control register bus architecture specifications, version 3.5, January 2006.
8. IBM. 128-bit processor local bus architecture specifications, version 4.7, May 2007.
9. I. Kuon and J. Rose. Measuring the gap between FPGAs and ASICs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26(2):203–215, February 2007.
10. E. A. Lee and S. Neuendorffer. Actor-oriented models for codesign: Balancing re-use and performance. In S. Shukla and J.-P. Talpin (editors), Formal Methods and Models for System Design: A System Level Perspective, pp. 33–56, Kluwer, Norwell, MA, 2004.
11. M. Majer, J. Teich, A. Ahmadinia, and C. Bobda. The Erlangen slot machine: A dynamically reconfigurable FPGA-based computer. Journal of VLSI Signal Processing Systems, 47(1):15–31, March 2007.
12. P. Murphy, A. Sabharwal, and B. Aazhang. Design of WARP: A flexible wireless open-access research platform. In Proceedings of the European Signal Processing Conference (EUSIPCO), Florence, Italy, 2006.
13. A. Parsons et al. A scalable correlator architecture based on modular FPGA hardware and data packetization. In Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, November 2006.
14. A. Parsons et al. A scalable correlator architecture based on modular FPGA hardware and data packetization. Submitted to IEEE Transactions on Signal Processing, available at http://casper.berkeley.edu/
17. P. Sedcole, B. Blodget, T. Becker, J. Anderson, and P. Lysaght. Modular dynamic reconfiguration in Virtex FPGAs. IEE Proceedings on Computers and Digital Techniques, 153(3):157–164, May 2006.
18. H. K.-H. So and R. W. Brodersen. Improving usability of FPGA-based reconfigurable computers through operating system support. In Proceedings of the International Field Programmable Logic and Applications Conference (FPL), Madrid, Spain, 2006.
19. Sun. OpenSPARC web page, available at http://www.opensparc.net, accessed on March 7, 2008.
20. Triscend. Triscend E5 configurable system-on-chip platform datasheet, July 2001, v1.06.
21. M. Uhm and J. Bezile. Meeting software defined radio cost and power targets: Making SDR feasible. In Military Embedded Systems, pp. 6–8, May 2005.
22. J. Williams and N. Bergmann. Embedded Linux as a platform for dynamically self-reconfiguring systems-on-chip. In Proceedings of the International Multiconference in Computer Science and Computer Engineering (ERSA), Las Vegas, NV, June 2004.
23. Xilinx. PLBv46 to PLBv46 Bridge Data Sheet, ds618 edition, Version 1.00.a, available at http://www.xilinx.com/bvdocs/ipcenter/data_sheet/plbv46_plbv46_bridge.pdf, accessed on March 6, 2008.
24. Xilinx. Embedded System Tools Reference Manual, ug111 v9.2 edition, September 2007.
25. Xilinx. PLBV46 Interface Simplifications, sp026 edition, October 2007.
26. Xilinx. Virtex-4 FPGA Configuration User Guide, ug071 v1.10 edition, April 2008.
27. Xilinx. Virtex-4 FPGA User Guide, ug070 v2.40 edition, April 2008.
28. K. Yaghmour. Building Embedded Linux Systems. O’Reilly, Sebastopol, CA, 2003.