4 Communication and Synchronisation
4.1 Communication
4.1.1 Nature of the Register File
4.1.2 Communication on Software Side
4.2 Synchronisation
4.2.1 Synchronisation Principles
4.2.2 Strange Behaviour of Register Files
4.2.3 Message Transfers
4.2.4 Functionality Adjustment
4.2.5 The Semantic Layer
4.2.6 Functional Synchronisation
5 Design
5.1 Object-Oriented Design
5.1.1 Object-Oriented Layering
5.1.2 Applying Design Pattern
5.1.3 Device Counterparts
5.2.1 Overview
5.2.2 Data Acquisition Example
5.2.3 Conclusion and Open Problems
5.3 State Machine Based Design
5.3.1 Finite State Machines
5.3.2 Hardware/Software-Codesign Perspective
5.3.3 Software Modelling with FSM
6.3.4 Hot-Plugging
6.3.5 Reconfigurable Devices
7 Development
7.1 Software Engineering
7.1.1 Life Cycle and Project Management
7.1.2 Requirements Engineering
7.1.3 Architecture and Design
7.1.4 Implementation and Integration
7.2 Debugging
7.2.1 Observing the System
7.2.2 Software Side Debugging
7.2.3 Hardware Side Debugging
7.3 System Simulation/Emulation and Testing
7.3.1 System Simulation
7.3.2 Stubs, Mocks and other Integrated Simulations
7.3.3 Device Simulation
A Appendix
A.1 Replacement Techniques
A.1.1 Control by Configuration Management
A.1.2 Code Changes on Calling Side
A.1.3 Binding Time Selections
A.1.4 Dynamic Branching during Runtime
What is Hardware-dependent Software (HdS) in comparison to "normal software"? What is the difference? What is specific to a device driver or a Hardware Abstraction Layer (HAL)? First of all, the term "normal software" is itself hard to define, which makes the comparison more difficult. Within this book, Hardware-dependent Software (HdS) is seen as a software system that operates in close relationship with digital hardware, often called a peripheral device or simply a device. HdS interacts with the device, controls it, and steers it. Furthermore, it extends the hardware functionality, restricts it, and hides or encapsulates problems of the hardware/software interaction.
Functionality in software components and, on the opposite side, in hardware operates in its own execution domain. HdS interacts with the execution domains in hardware. The execution domains need to be connected and synchronised, which is the responsibility of HdS. This synchronisation is a major topic, so this book will focus on it.
HdS is more than a Hardware Abstraction Layer (HAL), which just provides a first common interface to the underlying hardware platform: a low-level abstraction that focuses on portability at the level closest to the hardware.
Figure 1.1.: HdS as part of the system, without HAL or as dominating part
Within the microcontroller domain, a whole application can be interpreted as HdS, as in most cases the software just coordinates the interaction of the peripheral devices. Or the control logic of the application is only a fraction in comparison to the part that can be interpreted as HdS. Within that domain, HdS can additionally be seen as a library for the development of embedded applications. So HdS is here the base for higher-level applications. As a component, it provides services similar to a software library, but including hardware interaction (see Figure 1.2).
Regarding the functionality provided, HdS and so-called device drivers are closely related. To distinguish the two, a device driver is here a hardware-dependent software component that is integrated into, or operates in the context of, an operating system (OS). This needs extra consideration in the design, as the OS on the one hand restricts interfaces and
Nevertheless, it is a smooth transition. HdS ranges from a tiny HAL, via more complex modules and device drivers, up to nearly full applications. All are closely related to or interact with hardware devices. This interaction leads to typical structures and often makes the software design look tricky. On the other hand, the exploding demand for embedded systems with real-time, reliability, security, and safety constraints makes the design of HdS a crucial domain. Within this book, means of interaction are analysed and design approaches discussed.
So the whole book is organised in a bottom-up approach. All topics are discussed as generally as possible, to emphasise principles and methodology, and not "and in case of the controller x454B92A, you have to create a function that toggles bit 5". The latter can be found in various books on microcontroller programming, but without a look at the underlying principles and methodologies.
This book starts with a short discussion of the intention of HdS. After that, a brief review of computer architecture principles follows, to establish a common understanding of the underlying hardware architecture.
HdS and hardware devices are elements that execute in parallel. In parallel systems, the means of communication and synchronisation are essential. Hardware devices and software execute according to different paradigms. Both sides need consideration in order to design software components that aim to reach higher levels of abstraction.
HdS can be designed with the usual software engineering methods for parallel systems, based on the communication and interaction model found. A software design is driven by different views on the system. The design of HdS from different perspectives is discussed on the basis of design and modelling with description languages like the Unified Modelling Language (UML).
Specific issues of device management, device drivers, and software development in this special area are discussed later in separate chapters.
Hardware-dependent Software is software that provides services in cooperation with hardware, the device. The main objective is to make hardware services available to other, higher-level software components. Furthermore, Hardware-dependent Software may emulate missing functionality or restrict the access to hardware functionality, so that the hardware and the environment it interacts with are protected against illegal operation requests. Another objective is the management of multiple devices, as hot-plugging techniques and power management let devices disappear during runtime. This chapter briefly introduces these objectives in order to give an overview of where the analysis and design steps of the following chapters are heading.
For now, let's focus on the nature of Hardware-dependent Software (HdS) for a moment. We will return to the hardware part in the section on computer architecture. In this section, the various objectives of HdS are introduced to give a first insight into the complexity of HdS design within this domain.
Hardware-dependent Software is often denoted as low¹-level software. This is true, as it operates close to the hardware of the execution platform and its peripheral components, simply denoted as devices. In embedded systems, the focus of the system is on the interaction with a physical environment via those peripheral components/devices. Hence, the majority of software components are related to hardware. Furthermore, operating systems can be seen as a collection of device drivers, which are, in terms of this book, Hardware-dependent Software components within an operating system environment.
Figure 2.1.: Overview of the interaction
First of all, HdS makes the hardware functionality available to higher-level software: at the beginning with the initialisation and configuration of the device for the expected operations, later with interaction functions. The availability of device functionality is often crucial for the commercial success of the device vendor. If the functionality isn't available with the required performance and quality through the driver (HdS), the device cannot demonstrate its full potential. One key competence is to reveal this potential and grant interoperation with it.
The design of the HdS determines which part of the hardware is available via the interface towards higher software functionality, here denoted as Application. Which level of abstraction is reached at the application interface? Furthermore, as the hardware may interact with other hardware components or the outside world, does the HdS provide appropriate access to them? These questions must be answered by the intended use, and the answers influence the design. For one kind of application, a very low-level channel to parts of the hardware is required; for other applications, a very high-level view is needed.
Figure 2.2.: HdS as adaptor layer between application and hardware
So in some cases, the HdS allows transparent access to parts of the device and at the same time transparent access, with a high level of abstraction, to devices beyond its own device. In any case, HdS is the adaptor between the needs of an application and the capabilities provided by the peripheral device and the associated end point(s) inside the device core.
Figure 2.3.: HdS accesses different endpoints in device core
Software on the CPU and hardware devices execute in parallel, whereas the digital circuit in the device is a massively parallel system itself. The functionality on the software side and on the hardware side needs to synchronise at specific points. Devices provide a wide range of synchronisation means, which must be understood, and the counterpart in software needs to be realised.
Hardware devices and HdS in combination provide a service as a whole. So functionality can be implemented either in hardware or in software (compare the HW/SW-Co-Design approach). Thus HdS not only makes hardware functions available, it can also extend them with additional functionality (compare Figure 2.4 a)). The functionality is emulated in software by using resources of the target platform, like the main CPU and main memory. In extreme cases, the HdS provides the service with just the latter and without any real device. A good example is the RAM-disk. It behaves like a hard-disk but is emulated with the CPU and the main memory by the RAM-disk driver. Another, mid-level approach is to emulate a behaviour at the application interface with the support of hardware of a different kind (compare Figure 2.4 b)). For example, a CD-ROM is emulated with a hard-disk. Read access to the disk image is supported, but writing is not allowed (or only for initialisation) in order to fulfil the semantics of a read-only device.
The partial emulation of devices implies another approach, in which not all functionality of a device is accessible and usable via the HdS (compare the additional hardware ports in Figure 2.4 c)). The non-availability restricts the usage of the device. This is often the case where hardware devices are designed for a superset of possible use cases. For marketing or production reasons, the combination of a specific HdS, mostly in the sense of a device driver, and the common device determines as a whole the set of available services. The variation of the software determines the functionality variant. In summary, HdS provides restricted access, enables the full potential of a device, or extends it by emulating additional functionality.
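The emulation and restriction variants described above can be sketched in C. This is a minimal host-side sketch, not a real driver: the `block_dev` interface, the block geometry, and the names `ramdisk` and `rodisk` are assumptions made only for illustration. The first instance serves all requests from main memory (the RAM-disk case), the second one shares the same image but rejects write requests (the emulated read-only medium).

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16u   /* hypothetical geometry, tiny for illustration */
#define NUM_BLOCKS 8u

/* Hypothetical application-side interface of a block device. */
typedef struct block_dev {
    int (*read)(struct block_dev *dev, unsigned blk, void *buf);
    int (*write)(struct block_dev *dev, unsigned blk, const void *buf);
    unsigned char *store;   /* backing storage: main memory, no real device */
} block_dev;

/* Full emulation: the "disk" is served from main memory by the CPU. */
static int ramdisk_read(block_dev *dev, unsigned blk, void *buf)
{
    if (blk >= NUM_BLOCKS) return -1;
    memcpy(buf, dev->store + (size_t)blk * BLOCK_SIZE, BLOCK_SIZE);
    return 0;
}

static int ramdisk_write(block_dev *dev, unsigned blk, const void *buf)
{
    if (blk >= NUM_BLOCKS) return -1;
    memcpy(dev->store + (size_t)blk * BLOCK_SIZE, buf, BLOCK_SIZE);
    return 0;
}

/* Restriction: write requests are rejected to keep read-only semantics. */
static int rodisk_write(block_dev *dev, unsigned blk, const void *buf)
{
    (void)dev; (void)blk; (void)buf;
    return -1;
}

static unsigned char ramdisk_store[NUM_BLOCKS * BLOCK_SIZE];

block_dev ramdisk = { ramdisk_read, ramdisk_write, ramdisk_store };
/* Emulated read-only medium backed by the same image. */
block_dev rodisk  = { ramdisk_read, rodisk_write,  ramdisk_store };
```

An application that only knows the `block_dev` interface cannot tell whether a real device, plain memory, or a restricted view stands behind it; the variation of the software alone determines the functionality variant.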
Some devices are very similar, especially if they are from the same vendor. Hence, in many cases HdS must handle a set of similar devices. The software part is common to the devices, and the functionality variant as a whole is here determined by the device variant. In all cases, to adapt to the given hardware, it is essential to clearly identify the hardware device by type and revision. The hardware design in turn has to provide means to identify those variations of the device core.
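Identification by type and revision can be sketched as a table lookup. The layout of the ID register (type in the upper 16 bits, revision in the lower 8 bits), the type codes, and the `QUIRK_SLOW_RESET` flag are hypothetical; a real device documents its own identification scheme in its manual.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical ID register layout: bits 31..16 type, bits 7..0 revision. */
#define ID_TYPE(id)     ((uint16_t)((id) >> 16))
#define ID_REVISION(id) ((uint8_t)((id) & 0xFFu))

#define QUIRK_NONE        0u
#define QUIRK_SLOW_RESET  (1u << 0)  /* made-up defect of an early revision */

typedef struct {
    uint16_t type;
    uint8_t  min_rev, max_rev;   /* inclusive revision range */
    unsigned quirks;             /* software work-arounds to enable */
} variant_entry;

static const variant_entry variants[] = {
    { 0x0A01u, 0x00u, 0x01u, QUIRK_SLOW_RESET },
    { 0x0A01u, 0x02u, 0xFFu, QUIRK_NONE },
    { 0x0A02u, 0x00u, 0xFFu, QUIRK_NONE },
};

/* Returns the matching variant, or NULL if the device is not supported. */
const variant_entry *identify_device(uint32_t id_reg)
{
    uint16_t type = ID_TYPE(id_reg);
    uint8_t  rev  = ID_REVISION(id_reg);
    for (size_t i = 0; i < sizeof variants / sizeof variants[0]; ++i) {
        if (variants[i].type == type &&
            rev >= variants[i].min_rev && rev <= variants[i].max_rev)
            return &variants[i];
    }
    return NULL;
}
```

Keeping the variant knowledge in one table means that the common software part stays unchanged when a new device revision appears; only a table entry is added or adjusted.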
An additional objective of HdS is to hide little variations in the hardware or, more precisely, revision variations. A new revision of the device removes defects, and the previous software work-arounds can be removed or disabled. Vice versa, new defects may be
Along with the management of variations of single devices, a computer system may consist of a set of devices of the same kind. The HdS manages multiple devices of similar functionality in combination with multiple using applications (compare Figure 2.5). So the functionality of a specific device has to be linked to a specific application. Device management includes managing devices that are only temporarily available (hot-plugging) or that allow power-saving sleep modes which make them temporarily disappear.
Figure 2.5.: HdS adapts between Applications and Devices
Furthermore, HdS has to protect the hardware and the surrounding system, either in software or in hardware. Obviously, the HdS must not allow the device hardware to be damaged or destroyed by illegal operations of the application. The commands and requested actions need to be filtered according to the current system state and possibly according to the requester of the action. HdS operates in combination with hardware that often controls physical systems that can do harm to the outside environment. So in the sense of safety, it must not be the origin of any hazard, especially if it is part of a safety-critical system, for instance in medical devices or avionic control systems².
And, last but not least, HdS has to be very stable, reliable, and of high software quality. HdS typically has full access to the hardware without any limitations. The HdS is not allowed, either by intention or by failure, to access hardware other than the assigned hardware or to have any side effects on other hardware or software components in the computer system. A device driver as part of an operating system is allowed to execute privileged commands, so it has nearly all rights and hence also all responsibility. So in the sense of security, the driver design and implementation must not allow the security integrity of the operating system to be compromised. Although safety and security are different aspects, they interact and must be handled together in system design [4].
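The filtering of commands by system state and requester can be sketched as a small decision function. The states, commands, and requester roles below are hypothetical and only illustrate the principle; the rule for the fail-safe state follows the footnote on fail-safe systems.

```c
#include <stdbool.h>

/* Hypothetical states, commands, and requester roles. */
typedef enum { ST_UNINIT, ST_READY, ST_FAILSAFE } dev_state;
typedef enum { CMD_INIT, CMD_START_MOTOR, CMD_STOP_MOTOR,
               CMD_LEAVE_FAILSAFE } dev_cmd;
typedef enum { REQ_APPLICATION, REQ_SERVICE } requester;

/* Decide whether a command may be forwarded to the hardware. */
bool command_allowed(dev_state state, dev_cmd cmd, requester who)
{
    switch (state) {
    case ST_UNINIT:
        return cmd == CMD_INIT;
    case ST_READY:
        return cmd == CMD_START_MOTOR || cmd == CMD_STOP_MOTOR;
    case ST_FAILSAFE:
        /* Normal-operation commands are ignored; only a privileged
           requester may leave the fail-safe state. */
        return cmd == CMD_LEAVE_FAILSAFE && who == REQ_SERVICE;
    }
    return false;   /* unknown state: reject by default */
}
```

Rejecting by default keeps the filter on the safe side: a command reaches the device only if the current state and the requester explicitly permit it.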
In conclusion, HdS is not simply an adaptor between the device and a high-level perspective on it. Next to the interaction with complex hardware, it might have to emulate hardware, add missing features, protect the hardware and the system, manage resources and usage, and make everything work in a stable, reliable, safe, and secure manner in a complex environment.
All aspects need consideration during software development. So software engineering methods have to be applied as for "normal" software. This script will provide insights into what HdS is, how it interacts with hardware, and what engineering methods can be applied.
¹ Low in the sense of abstraction level or in the sense of layered software architecture.
² Systems that have been driven to a fail-safe state must ignore all commands for normal operation until leaving this state.
This chapter contains a brief review of computer architecture as it is assumed in this book. Although today's computer architectures are more sophisticated in order to achieve more performance, the fundamentals discussed here are still valid.
All those components interact with each other and influence the design of HdS. Even though they have evolved into highly optimised complex systems, their base functionality has more or less stayed the same.
3.1 Central Processing Unit
The Central Processing Unit (CPU), the processing core, executes the commands of a program. Internal registers hold data that is processed by the internal processing units, like the Arithmetic Logic Unit (ALU), or other specialised units, like floating-point units or units for vector or cryptographic operations. The CPU transfers data via bus-systems from the main memory to the internal registers and vice versa (compare for instance [27]). The transfer is always initiated by the CPU; CPU registers cannot be accessed from the outside, not even in multi-master bus-systems. Hence, transfers from or to CPU registers are only initiated by software running on the CPU.
From the programmer's perspective, the execution of the commands is strictly sequential. Data are processed one after the other, and the result is stored after the calculation completes. Modern out-of-order-execution processors optimise the internal schedule of the commands to gain a high utilisation of each processing unit. The sequential semantics still hold, because the results are re-assembled as if the execution had been in-order. This strict sequential order does not hold for the sequence of bus-operations. It might differ between in-order execution and out-of-order execution even if the same software is executed. To avoid out-of-order transfers on the bus, which might violate the desired communication protocol to I/O, explicit commands, so-called memory barriers, control the (partial) order of bus-operations (compare [23] for a detailed discussion of race-conditions).
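Where a barrier is placed can be sketched as follows. This is a host-side sketch under explicit assumptions: the two registers are simulated by ordinary variables, and the C11 `atomic_thread_fence` is used only as a portable stand-in. Formally, the C11 fence orders memory operations between threads; for real device I/O, the platform's own barrier has to be used, for instance the `eieio` or `sync` instructions on PowerPC, `dmb`/`dsb` on Arm, or macros such as `wmb()` in the Linux kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simulated device registers; on real hardware these would be fixed
   memory-mapped addresses taken from the device manual. */
static volatile uint32_t sim_data_reg;
static volatile uint32_t sim_doorbell_reg;

/* Stand-in for a platform I/O write barrier (assumption of this sketch). */
static inline void io_write_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}

/* The device must see the payload before it sees the doorbell. */
void send_command(uint32_t payload)
{
    sim_data_reg = payload;      /* bus write 1: payload */
    io_write_barrier();          /* keep write 1 ahead of write 2 */
    sim_doorbell_reg = 1u;       /* bus write 2: "go" signal */
}

/* Helpers to inspect the simulated registers. */
uint32_t sim_last_payload(void) { return sim_data_reg; }
uint32_t sim_doorbell(void)     { return sim_doorbell_reg; }
```

Without the barrier, an out-of-order machine would be free to let the doorbell write reach the bus first, so the device could start working on a stale payload.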
Figure 3.2.: Core and Cache structure
Nowadays, CPUs consist of multiple cores. Each core is a CPU itself, with its own register set, instruction unit, ALU, etc. Each core is capable of executing an independent sequential stream of commands. All cores access the same main memory via a common bus-system. As each core is independent but shares the same resources of the computer system, synchronisation on different levels is required to avoid performance loss and race-conditions.
A fast memory close to the CPU registers, the cache, holds copies of main memory data to provide fast access on further read requests. The cache size is limited by the core frequency and the speed of light, so a hierarchy of caches with different speeds and sizes is used. In a multicore architecture, the view of the main memory must be coherent on each core. Cache coherence protocols between the caches synchronise this view. Depending on the cache architecture, the coherence is realised on different levels. However, if data must be read from a main memory location without using a cached value (or must be written to a memory location), the cache must be configured not to cache this location, or an explicit cache control command in software must enforce the transfer. This is often the case for accesses to peripheral devices, as peripheral devices can alter the content independently of the CPU. Otherwise, data is not really read from the peripheral device; instead, only outdated data is read from the cache.
The simple form of a CPU, with just one processing core and only one bus-system with multiple attached peripheral devices and main memory, is still found in many microcontroller systems.
3.2 Bus-System
The bus-system connects the CPU with the main memory and the other devices of the computer architecture, the peripheral devices. In simple architectures, the bus is line-oriented, connecting each device and the memory with the CPU (see Fig. 3.1). In modern computer architectures, multiple bus-systems, possibly of different technologies, span a tree from the CPU towards main memory and the peripheral devices. The root of the tree is connected with a Processor Local Bus (PLB) connecting all cores or caches in a cache-coherent [48] manner (compare Fig. 3.2 and Fig. 3.3).
Figure 3.3.: Tree like structure of bus systems
Often, the communication link towards the main memory branches off from the bus-systems towards the peripheral devices at an early stage of the bus topology, mostly directly at the root node; they form individual trees.
Additional bus-bridges connect the peripheral bus-systems of different technology and protocol within the tree. Those bridges are devices of their own, and the leaves of the tree are peripheral devices. A peripheral device can again be the root node of a further network, for instance the USB host-controller. Examples of this kind of bus topology are the architectures of the embedded PowerPC® processors, for instance the PowerPC440 family [28].
The topology of the bus-system from the cache down to the peripheral devices is hidden from software. The CPU, respectively each core, sees the memory and all devices in a flat address-space. Each peripheral device is embedded in the address-space of the processor as memory-mapped I/O. The CPU is intended to perform transfers between memory and its internal registers. Hence, the root node of the tree understands memory transfers, and each device has to act like memory.
The bus-system performs data transfers with different technologies and protocols. Each attached component must understand and follow the protocol. If the interface of a digital component does not comply with the protocol, an adaptor component adapts it to the protocol. In the case of peripheral devices, the register file adapts the device core to the bus-system, its protocol, and the scattered access in time.
So from the software point of view, memory and devices are accessed in the same way. At first glance, everything looks and feels like memory. As we will see later, the interface of peripheral devices, the register file, just looks like memory from the perspective of the bus protocol. The real behaviour in comparison to memory cells can be totally different.
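In C, memory-mapped I/O is commonly expressed by overlaying a structure on the register file and accessing it through `volatile` members. The following is a minimal sketch; the timer device, its register layout, and the address in the comment are made up, and a plain variable stands in for the device so that the code also runs on a host.

```c
#include <stdint.h>

/* Hypothetical register file layout of a small timer device. */
typedef struct {
    volatile uint32_t ctrl;     /* offset 0x00: control bits */
    volatile uint32_t status;   /* offset 0x04: status bits  */
    volatile uint32_t reload;   /* offset 0x08: reload value */
} timer_regs;

#define TIMER_CTRL_ENABLE (1u << 0)

/* On a target this would be a fixed address from the memory map, e.g.
   #define TIMER0 ((timer_regs *)0x40001000u)   (address is made up).
   For host-side experiments a plain variable stands in for the device. */
timer_regs sim_timer0;
#define TIMER0 (&sim_timer0)

void timer_start(timer_regs *t, uint32_t reload)
{
    t->reload = reload;             /* ordinary store into the register file */
    t->ctrl |= TIMER_CTRL_ENABLE;   /* read-modify-write: one read, one write */
}

int timer_is_enabled(const timer_regs *t)
{
    return (t->ctrl & TIMER_CTRL_ENABLE) != 0;
}
```

Note that `volatile` only prevents the compiler from keeping a register value in a CPU register or from removing accesses. It does not influence the hardware cache; the address range of the device still has to be configured as non-cacheable by the platform.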
Nowadays, bus-systems are based on serial communication links instead of parallel lines. Examples are PCIe® [49], HyperTransport® [14], and QuickPath Interconnect [33]. Nevertheless, the perspective from the CPU is still an access to a location in a memory address-space. The transfer is performed via a serial protocol instead of a parallel one.
Figure 3.5.: Tree-structure of a PCIe system and mapping to a flat memory perspective
3.3 Peripheral Devices
Peripheral devices are digital devices that are part of the computer architecture. Attached as leaves of the bus topology, they are linked with the CPU and accessible from the software. Peripheral devices provide additional computational functionality and/or connect the computer with the outside world.
The outside world is either digital again or analogue. The devices need to interface to the analogue signals in value and time. Analog-to-Digital Converters (ADC) translate those signals to digital signals, and Digital-to-Analog Converters (DAC) translate them back again. Even with those converters, the outside world operates with physical time instead of a discrete time that can be stopped at will. This unstoppable time requires special multi-paradigm debugging techniques, as discussed later in Sec. 7.2.
The inner structure of peripheral devices can be separated into the device core and the Register File (RF). The device core provides, as a processing element implemented in digital hardware, the device functionality. The register file and its attached bus-adaptor interface the digital input-output lines of the device core with the attached bus-system.
Figure 3.7.: Register file as adaptor between bus and device core
3.3.1 Device Core
The device core realises the major functionality of the peripheral device. It can be substructured into a mesh of interconnected digital components providing the functionality. The real internal structure of the device core is in most cases unknown. From the outside, it is a black box. One cannot know whether it is pure hardware with state machines, a complete microcontroller with software (referred to as firmware), or another sophisticated architecture, for instance a Graphics Processing Unit (GPU). Even the interconnection between the components
Figure 3.8.: Mesh of internal components, linked directly or via bus system
However, the device core is a processing element realised with digital hardware that executes massively in parallel. Combinational logic, realised with logic gates, processes digitally encoded values on digital signal lines. Computation speed is limited by the delay of the logic gates and interconnection links of the given technology.
In combination with clocked registers, sequential logic can be built. This ranges from synchronising the results of logic gates in a Register Transfer Level (RTL) design, via pipeline architectures, up to state machines realising sequential control. They form the control path of a digital circuit. The synchronisation of the registers with a clock signal allows step-wise execution or control.
The complexity of device circuits can reach the level of processor architectures that execute firmware. An important example is the General-Purpose Graphics Processing Unit (GPGPU). The real physical architecture is unimportant; from the perspective of the software executing on the main CPU, those complex devices are still peripheral devices (compare Fig. 3.9) that have a dedicated job. They operate fast, optimised, and nearly independently. The fundamental communication scheme is still a memory access in the address range of the device.
Figure 3.9.: GPGPU as device in the CPU perspective (according to [39])
In conclusion, a device is a mesh of encapsulated functional behaviours realised in digital hardware that interact internally fully in parallel. Execution and internal communication are often synchronised by a common clock signal. Internal components communicate with the external world by way of adaptors that transform the signals in the value and time domains. Communication with the computer components is realised via the Register File and its bus-adaptor. How many and which of the device's components are visible via the Register File depends on the device's design.
The two domains, bus and device core, behave differently in the time domain. Data are transferred by the bus-system from time to time. Even with a stream of data, only sections of a few bytes change in the register file at once: in each bus cycle, only as many bits as the data bus is wide can change in the register file. On the device side, each register bit is attached to the device core via a dedicated signal line. Consequently, each register can be read at any time in parallel and can be changed at any time as well. The access is massively parallel and only restricted in time by the device clock¹.
So the register file is a storage for signals from or to the device core. Signal values from the device core are buffered and can be read from the bus side as if they were values in memory cells. Vice versa, values transferred to the register file are stored and applied as logical signals to the device core. As the register file is digital hardware and only has to look like memory by complying with the bus protocol, some registers implement additional behaviour.
Thus, the entire communication between the device core and the software executing on the CPU is carried out via the register file. The behaviour of this adaptor, including its additional behaviour, has to be taken into consideration when designing Hardware-dependent Software. Sometimes the software designer gets the feeling of operating through a key-hole.
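What "additional behaviour" can mean is easiest to see in a small software model of a register file. The sketch below assumes, purely for illustration, one plain data register and one status register with read-to-clear semantics; read-to-clear is a common example of such behaviour, but the concrete registers and names are hypothetical.

```c
#include <stdint.h>

#define REG_DATA   0u   /* plain storage, behaves like a memory cell */
#define REG_STATUS 1u   /* hypothetical read-to-clear event flags    */
#define NUM_REGS   2u

typedef struct { uint32_t cell[NUM_REGS]; } sim_regfile;

/* Bus-side read as the CPU would see it. */
uint32_t rf_bus_read(sim_regfile *rf, unsigned reg)
{
    if (reg >= NUM_REGS) return 0u;
    uint32_t value = rf->cell[reg];
    if (reg == REG_STATUS)
        rf->cell[reg] = 0u;      /* side effect: reading clears the flags */
    return value;
}

/* Bus-side write as the CPU would issue it. */
void rf_bus_write(sim_regfile *rf, unsigned reg, uint32_t value)
{
    if (reg >= NUM_REGS || reg == REG_STATUS)
        return;                  /* status is read-only from the bus side */
    rf->cell[reg] = value;
}

/* Device-core side: signals an event by setting flag bits. */
void rf_core_raise(sim_regfile *rf, uint32_t flags)
{
    rf->cell[REG_STATUS] |= flags;
}
```

Reading `REG_DATA` twice returns the same value, as a memory cell would. Reading `REG_STATUS` twice does not: the first read consumes the event, so software that reads the register "just to look" silently loses information.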
Figure 3.11.: Connection of peripheral bus and device internal bus via bus-bridge
A rare alternative means of communication is to link the bus-system with a device-internal bus-system via a device-internal bus-bridge in order to gain access to an internal shared memory (see Fig. 3.11). The internal memory is mapped into the memory space of the computer system. GPU devices use this technique for fast data transfer between their own main memory and the main memory of the computer system, utilising direct memory access (DMA) controllers.
3.3.3 Device and Computer System Structure
The real internal structure of a device is often unknown. A functional reconstruction is feasible by analysing the device's manual. Or at least, developers derive an equivalent behaviour from the description in the manuals².
be determined
Nevertheless, a device is a mesh of functional components. The functionality of each component of the mesh is encapsulated and either directly accessible or only accessible via other components (compare Fig. 3.12). So from the perspective of the HdS, key-hole operations sometimes need to be performed to steer one component via another component, because it is not directly reachable.
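A typical key-hole operation is indirect access through an index/data register pair. The sketch below models such a device in software; the device structure and all names are hypothetical. Only `index` and the data window are reachable from the bus, while the `hidden` registers stand for a component behind another component.

```c
#include <stdint.h>

#define HIDDEN_REGS 4u

/* Simulated device: only the index register and a data window are
   visible in the register file; `hidden` is reachable only through them. */
typedef struct {
    uint32_t index;
    uint32_t hidden[HIDDEN_REGS];
} sim_device;

/* Bus-visible operations (what the register file offers). */
static void bus_write_index(sim_device *d, uint32_t idx)
{
    d->index = idx % HIDDEN_REGS;
}
static void bus_write_data(sim_device *d, uint32_t v)
{
    d->hidden[d->index] = v;
}
static uint32_t bus_read_data(const sim_device *d)
{
    return d->hidden[d->index];
}

/* HdS functions that hide the two-step key-hole protocol. */
void hidden_write(sim_device *d, uint32_t reg, uint32_t value)
{
    bus_write_index(d, reg);   /* step 1: select the hidden register */
    bus_write_data(d, value);  /* step 2: transfer through the window */
}

uint32_t hidden_read(sim_device *d, uint32_t reg)
{
    bus_write_index(d, reg);
    return bus_read_data(d);
}
```

Because the selection and the transfer are two separate bus accesses, the pair must not be interleaved with another access to the same device; in a real system the HdS would have to protect the sequence, for instance with a lock.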
Figure 3.12.: Hidden components in the device core without direct connection to the register file
Device hardware is normally fixed in its structure. The chip has a fixed structure, and hence the implemented resources are fixed. Dynamic growth or resource allocation typically does not occur. There are two exceptions where the hardware structure is not fixed. The first one is hot-plugging, where a whole device can be added to a bus-system or removed from it. The computer system is able to expand or shrink its device resources in the granularity of full devices. Modern architectures allow the powering of individual devices to be controlled. From the perspective of the software, the effect of turning off the device power is the same as removing the device. The second exception is reconfigurable devices. Those systems are based on programmable logic, a modern FPGA architecture that allows the interconnection and the functionality of logic gates, and hence the functionality provided by the whole chip, to be modified during runtime. Some architectures even allow a partial change of dedicated areas, a partial reconfiguration, during runtime. Within the limits of the FPGA resources, the hardware device is able to expand, to shrink, or to change behaviour at runtime.
Within the device, different clock domains may exist. Typically, all components are clocked by the same clock line. To save energy, or for other design reasons, components inside the chip can operate with different clock signals, mostly derived from the same master clock. Synchronisation means on the communication links must ensure data integrity when crossing the domains.
Within the computer system, each processing element has its own execution pace: the software executing on the CPU and the functionality implemented in the device. Both domains operate asynchronously, and the processing domains are only coupled by the register file. Sequentially executing software on the CPU copies data from memory to the register file in chunks, whereas the device core has fully parallel access to the register file content. Additionally, both processing elements operate with a clocked time, whereas the outside world operates with physical time.
Finally, we have to distinguish the execution paradigms. Hardware executes massively in parallel and software strictly sequentially. Again, both paradigms meet at the register file.
3.4 Conclusion
Figure 3.13.: General interconnection structure of processing elements
A CPU is a set of sequentially executing Processing Elements (PE), the cores. It is connected with the main memory and with a set of massively parallel executing processing elements, the peripheral devices, via a tree-like connection system, the bus-system (compare Fig. 3.13). The software executed on the cores controls the data transfers between internal registers and the main memory or the devices' register files. The devices communicate with their register files fully in parallel.
HdS executes on the cores and communicates with the associated devices. Both execute fully in parallel. So the key questions for the design of HdS are communication and coordination/synchronisation. The register file in the devices plays an important role here. Furthermore, the influence of the opposed execution paradigms needs consideration.
The next chapter discusses principles of synchronisation and communication via the register file. With these foundations, the lowest levels of the HdS are designed. As the CPU initiates nearly all communication, communication initiated by the device needs extra consideration. With two-way communication and synchronisation, the design of HdS can begin.
¹ The access frequency is actually only limited by the technology used, but access faster than the device clock doesn't make sense in RTL designs.
² Most developers wouldn't say that they create any model; nevertheless, even if it is not explicit, there is something like a behavioural model in their minds. Otherwise, they would not be able to design any software for it.
As indicated in the previous chapter, a computer architecture can be reduced to a system of four components: the CPU, the bus-system, the memory, and the peripheral devices (register file and device core). Functionality is executed on the CPU and the device cores. HdS executes as part of the functionality on the CPU. Thus it is a parallel computing system with two kinds of execution units, processor cores and digital circuits¹.
Crucial aspects of parallel systems are the communication and the synchronisation of the parallel executing components. The opposed execution paradigms as well as the behaviour of the bus-system and the register file, as elements of the communication link, have an impact on communication and synchronisation. So the design of HdS starts with the communication between those parallel components, respectively with the adapting register file for each side. Each side has a very different processing architecture, which leads to asymmetric handling of communication and synchronisation. Based on the execution model, the communication model, and the register file behaviour, the synchronisation means are analysed and implications for the software design are discussed here.
Figure 4.1.: Functionality determines HW, SW and Communication in Hardware/Software Codesign, whereas the HW in combination with Communication determines the HdS-Design and hence Functionality
It is assumed that the peripheral device is fixed and that its behaviour at the interface cannot be changed. Even with Hardware/Software Codesign methodology, it is still widely established practice that a peripheral device is designed first and then integrated into a computer system or a microcontroller system as a System-on-a-Chip (SoC).
The section on communication starts with an analysis of the nature of the register file. The information exchange on the lowest level will lead to a layered architecture on the software side that handles the encoding and transfer. General implementation issues will be discussed as well.
The section on synchronisation starts with an overview of synchronisation principles. With the help of these principles, the somewhat strange behaviour of some register file implementations will be explained. Additionally, modern CPU architectures show a strange communication behaviour of their own (out-of-order) that has an impact on synchronisation and will be explained as well. As a result, a layered architecture for the HdS is derived. The layers provide a more abstract interface, respectively interfaces with more semantic meaning.
The low-level synchronisation will be extended to the exchange of messages and to functional synchronisation. Here the two concepts of polling and of message handling by Interrupt Service Routines will be discussed, including communication between the two flows of control.
Communication, low-level synchronisation, and functional synchronisation are the foundations for the design of HdS. Based on the findings in this chapter, the design of the higher functionality, the counterpart of the device core functionality, is described in the following chapter.
4.1 Communication
Functionality executing in parallel is never totally decoupled, and those parallel threads need to communicate. The communication depends on the processing elements they are mapped to and the communication link in between. HdS is mapped to the CPU and the associated functionality is mapped to the device core of the peripheral device.
The crucial component in the communication between the software on the CPU and the device core is the register file, interfacing the two execution and communication paradigms on each side. The typical register file design in combination with the connection to a bus-system leads to a layered architecture that transparently hides the communication on the lowest level.
4.1.1 Nature of the Register File
Inside the peripheral device, the register file (de-)couples the systems of opposed execution paradigms, the massively parallel hardware (device core) and the strictly sequential software (HdS). Its major objective is the handling of the asynchronous information exchange. The software side of the register file is attached to the computer’s peripheral bus system; the other side is directly attached to the device core.
Figure 4.2.: The Register File as coupler between device core and the peripheral bus towards the CPU
of the byte-oriented transfer from the CPU are stored for further processing by the device. Vice versa, values from the device core are buffered until read from the software side.
On the software side, the register file has to behave like byte-oriented memory, as we assume that the CPU does not distinguish between addresses in the memory address space and addresses where peripheral devices are located. This is therefore called ”memory mapped I/O”, as the same machine instructions in the CPU are used to access memory as well as register files.
case, the offset is multiplied with a factor, typically 2 or 4
On the hardware side of the register file, the device core is directly attached with its input and output signal lines. The device core consists of state machines in the control paths and logical operations in the data paths. The state machines process input symbols and present output symbols as a result. All symbols are binary coded. The logical paths have an arbitrary number of input and output signals. All input/output symbols and the input/output data are represented by groups of bits and mapped to bit-fields in the registers. A bit-field contains at minimum one signal, a bit.
The assignment of positions for the bit-fields in the register file is up to the register file developer, as in principle each signal from the device core can be routed to any register. Algorithmic assignment of signals to registers is possible as well and is shown for instance in [76]. The algorithm groups read-only, write-only and read-write registers and automatically avoids splitting bit-fields over multiple offsets.
Figure 4.5.: Mapping of input/output signals to bit-fields in register
The device core has fully parallel access to each register cell of the register file at any time. It can change the whole register file within one clock cycle, or all components of the device core can react in parallel to a change of a single register.
The register file is, as part of the peripheral device, a digital circuit with enhanced functionality in comparison to memory. In particular, access to a register from the software side (bus side) is recognised, as the protocol state machine handling the peripheral bus is part of the peripheral device (compare Fig. 3.7). So the device core is able to react to a register read, or to a register write even if the content of the register does not change. This access observation is not feasible with parallel systems operating on shared memory, not even on dual-ported shared memory. The device core is able to observe the access from the software side and can take automatic actions. For instance, FIFO buffers can shift their contents after the head value has been read by software, without any additional command from the software side. In general, this observation feature is used for implicit synchronisation, as we will see later on in Sec. 4.2.2.
Other functionality is located in the register file as well. For instance, if a state machine of the device core issues a transient signal, which for instance indicates the passing of a dedicated state, that signal can be buffered until it is read from the software side. For conceptual clarity, this behaviour can be allocated to a tiny state machine of the device core, whereas from the practical perspective, the device’s developer would implement the functionality in the register file (flip-flop). Additional special behaviour will be discussed later in Sec. 4.2.2.
In conclusion, the communication between the device core and the ”shared memory” register file is simple, as state machines and logical operators exchange symbols and arbitrary data directly over dedicated lines. This access is fully parallel, whereas the access on the software side to the register file is time-multiplexed and limited to chunks in the width of the data bus. Additionally, the hardware side can observe the access to registers, which provides implicit synchronisation means, as discussed later on in Sec. 4.2 on synchronisation.
4.1.2 Communication on Software Side
The communication on the device core side of the register file is simple, as the access is direct and fully parallel. Furthermore, it is predefined, as in our assumption the whole peripheral device is pre-determined.
On the software side, the HdS executes on a CPU utilising processing resources and local memory in the form of CPU-registers. As the CPU-registers are limited in size and number, data are exported to and imported from the main memory (Zuse-Architecture). As the register file operates like a shared memory between the device core and the HdS, the content of the register file is copied to the CPU-registers for processing as well (compare for instance [18]). The interface of the device core is a set of signals grouped into bit-fields mapped to the register file as described in the previous section. Hence, the organisation of this shared memory is predefined. To communicate with the device core, these signals need to be isolated on the software side, to allow a transparent communication with grouped signals in the interface of the device core.
Figure 4.7.: Adaptor to gain a transparent link to device-core interface
Software engineering suggests organising software in layers to separate concerns. As communication is the encoding and transfer of messages, we introduce two small layers. The first, the Transfer Layer, handles the bit-correct transfer between register file registers and CPU-registers. The data transfer happens in units of bytes, whereas the signals are organised in bit-fields. So a second layer on top, the Manipulation Layer, isolates and manipulates bit-fields and hence the input signals to the device core. This manipulation layer provides the encoding and decoding. Later on, an additional layer will provide a semantically more meaningful interface that includes synchronisation handling (see Sec. 4.2.5).
Both layers are introduced in the following sections. Afterwards the implementation in real layers is discussed. In combination, both layers form an interface for transparent communication with the signals of the device core. Data transfer with the help of a Direct Memory Access Controller (DMA-Controller) will not be discussed here.
The Transfer Layer
The transfer layer shall handle the data exchange between the Register File (RF) and the software executed on the CPU. The task of the transfer layer sounds simple: (atomically) copy data between register file registers and CPU-registers. This is only simple on simple communication paths between the CPU and the device, for example in simple microcontroller systems. If the system is more complex, the transfer layer might need the help of other software components to establish a link to the peripheral device. The layering concept shall hide this additional software functionality from the upper layer.
Figure 4.9.: Transfer operation (transfer mapping) reverses register mapping to address space
In the simple case, this layer is just a set of read and write operations to access the registers of the register file. The target registers are identified by register offsets, typically abstracted in the programming language by symbolic names. The layer performs the needed address translations to access the memory mapped register file, to reverse the mapping of registers to memory addresses (see Fig. 4.9 and Sec. 4.1.1). The needed parameters for the address translation are determined within the initialisation phase of the layer or during the development phase. The transfer is encapsulated for instance in C functions where the calculations are done and the memory is accessed via pointer operations. The transferred data are passed to the layer as function parameters, respectively provided as return values.
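The simple case described above might look as follows in C. This is only a minimal sketch under assumptions: the device with three 8 bit registers, the names `REG_CTRL`, `REG_STATUS`, `REG_DATA`, `transfer_init`, `transfer_read` and `transfer_write` are hypothetical, and the array `fake_register_file` merely stands in for the memory mapped register file so that the sketch can be tried on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Symbolic register offsets of a hypothetical device. */
enum { REG_CTRL = 0, REG_STATUS = 1, REG_DATA = 2, REG_COUNT = 3 };

/* Stand-in for the memory mapped register file (host simulation only). */
static volatile uint8_t fake_register_file[REG_COUNT];

/* Base address, determined once in the initialisation phase of the layer. */
static volatile uint8_t *device_base_address;

void transfer_init(volatile uint8_t *base)
{
    device_base_address = base;
}

/* Read one register: offset -> effective address -> pointer access. */
uint8_t transfer_read(size_t register_offset)
{
    assert(device_base_address != NULL && register_offset < REG_COUNT);
    return *(device_base_address + register_offset);
}

/* Write one register; the value is passed as a function parameter. */
void transfer_write(size_t register_offset, uint8_t value)
{
    assert(device_base_address != NULL && register_offset < REG_COUNT);
    *(device_base_address + register_offset) = value;
}
```

On a real target, `transfer_init` would receive the device base address from the memory map instead of the address of an array; the upper layer only ever sees the symbolic offsets.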
Non-fitting Bus Sizes. In some cases, the width of the registers in the register file does not fit the width of the data bus. For instance, a register with a width of 8 bit, designed for an 8 bit bus, is connected to a 16 bit wide data bus.
One solution for attaching the register file to the bus-system is to use only the lower 8 bits of the 16 bit data bus. The registers of the register file now appear only at even addresses (compare Fig. 4.10). Nothing is connected to the odd addresses. The HdS must correct the register offsets during translation to memory addresses and, vice versa, filter the relevant bits out of the data word in case an access with a fitting data type is not supported. This leads to gaps in the memory map of the register file.
A rationale behind this design might be that an IP for the device including the register file has been designed for a data bus with a different width. From the hardware side, it looks easier to simply attach the device to the non-fitting bus instead of designing a fitting register file. The downsides are strange address definitions, address calculations and maybe performance loss on the software side.
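The offset correction and bit filtering for an 8 bit register file on a 16 bit bus can be sketched as follows. The sketch assumes a stride of two byte addresses per register; `fake_bus_words`, `register_to_byte_offset`, `narrow_read` and `narrow_write` are hypothetical names, and the array only simulates the bus on a host machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { REG_COUNT = 4, ADDRESS_STRIDE = 2 };  /* 16 bit bus, 8 bit registers */

/* Stand-in for the bus: one 16 bit word per register, upper byte unconnected. */
static volatile uint16_t fake_bus_words[REG_COUNT];

/* Corrects the register offset into a byte offset from the device base. */
size_t register_to_byte_offset(size_t register_offset)
{
    return register_offset * ADDRESS_STRIDE;
}

uint8_t narrow_read(size_t register_offset)
{
    assert(register_offset < REG_COUNT);
    size_t word_index = register_to_byte_offset(register_offset) / sizeof(uint16_t);
    /* Filter the relevant lower 8 bits out of the 16 bit data word. */
    return (uint8_t)(fake_bus_words[word_index] & 0x00FFu);
}

void narrow_write(size_t register_offset, uint8_t value)
{
    assert(register_offset < REG_COUNT);
    size_t word_index = register_to_byte_offset(register_offset) / sizeof(uint16_t);
    fake_bus_words[word_index] = value;
}
```

Register 3 thus ends up at byte offset 6, and whatever appears on the unconnected upper data lines is masked away on every read.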
Register Mirrors. Establishing a one-to-one mapping between a register and a memory address requires address decoding in the full size of the address bus. Sometimes, the address bus is only partially decoded. This results in an n-to-1 mapping between addresses and registers. Depending on the decoding, the registers appear as mirrors at consecutive addresses (compare Fig. 4.12) or at addresses with a distance of 2^n (compare Fig. 4.11).
The rationale of the hardware developer is that logic gates are saved for reasons of cost or performance, because each gate increases the decoding delay. The result is only a partial evaluation of the mapping, and hence the register is accessible at more than one address.
For the software, this behaviour has only the drawback that the address translation is not one-to-one. This is a hardware design issue and hence should have no influence on HdS.
Atomic Transfers. It is assumed that a bit-field is read or written from the bus side as a whole in one go, more precisely within one clock cycle. Otherwise synchronisation means might fail or have to be explicitly provided for non-atomic transfers. A counter-example is shown in Fig. 4.13: a state machine consumes symbols that are given to it, respectively stored, in a register. The next symbol will be BC, to steer the state machine into state W. In this example the transfer is distributed over two clock cycles, so that the register temporarily contains the new upper half in combination with the old lower half, as a part of the input symbol is postponed. The state machine evaluates an (invalid) symbol before the symbol BC is valid as a whole in the next cycle. As a result, the state machine is in a valid but wrong state Z instead of W. In other cases, the state machine might end up in an invalid pseudo-state that doesn’t exist by design but only due to the digital coding of states in the hardware.
The CPU executes the transfer operation as an atomic machine instruction that cannot be interrupted. Even if the bus protocol allows data to be transferred in multiple chunks, for instance a 32 bit access on a 16 bit data bus in two packets, the transfer is still atomic from the CPU perspective. Obviously, the latter kind is non-atomic from the perspective of the register file, as it experiences two accesses (compare with the problem discussed above).
Another kind of access that leads to more than one atomic transfer from the perspective of the register file is a non-aligned access. For instance, if a 16 bit access operation on a 16 bit data bus starts at an odd address, two transfers are performed, one affecting just the higher part of the data bus and a second affecting only the lower part of the data bus (or in the opposite order), but both with even addresses (see Fig. 4.14). Those non-aligned operations are atomic from the perspective of the CPU, but again two atomic accesses from the perspective of the register file. Some processors do not allow non-aligned transfers.
Figure 4.14.: Non-Aligned access leads to two transfers
As a consequence, the device/register file designer tries to avoid spreading bit-fields over more than one register, to prevent non-atomic transfers or to avoid the implementation of explicit synchronisation means. On the other side, the software designer needs to be aware of those non-atomic transfers and has to avoid them, as no synchronisation means are provided in those cases.
A more detailed discussion of the rearrangement of bits due to bus-systems and of automatic code generation based on architecture models can be found in [37].
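The effect of a non-atomic transfer can be reproduced with a small host-side model. The sketch below is not taken from the book; it assumes a hypothetical 16 bit symbol register behind an 8 bit data bus, and a device that samples the register once per clock cycle (`clock_tick`). All names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16 bit symbol register fed over an 8 bit data bus. */
static uint16_t symbol_register;

/* Trace of what the device state machine samples at each clock cycle. */
enum { TRACE_LENGTH = 4 };
static uint16_t sampled[TRACE_LENGTH];
static unsigned sample_count;

static void clock_tick(void)
{
    assert(sample_count < TRACE_LENGTH);
    sampled[sample_count++] = symbol_register;
}

/* Non-atomic: two bus cycles, the device samples in between. */
void write_symbol_split(uint16_t symbol)
{
    symbol_register = (uint16_t)((symbol_register & 0x00FFu) | (symbol & 0xFF00u));
    clock_tick();                       /* new upper half, old lower half */
    symbol_register = (uint16_t)((symbol_register & 0xFF00u) | (symbol & 0x00FFu));
    clock_tick();                       /* symbol complete, one cycle late */
}

/* Atomic: the whole symbol arrives within one bus cycle. */
void write_symbol_atomic(uint16_t symbol)
{
    symbol_register = symbol;
    clock_tick();
}
```

With the register preset to 0x0A0D, a split write of 0x0B0C lets the device sample the mixed value 0x0B0D first, which is exactly the kind of unintended symbol that can steer a state machine into a wrong state.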
Implementation Hints. The interface of the transfer layer towards the upper layer consists of the transfer direction, the target register, and the register value that has been read or is to be written. Inside the transfer layer, the needed address resolutions are performed and the value is copied to or from the device register. If no other helper component is involved, this layer has no interface to lower layers.
The simplest and most low-level realisations of the transfer operations are assembler instructions for copying data between memory and CPU-registers. The assembler instructions are embedded as assembler macros for the target machine in a higher programming language like C. For most systems, the performance of a C compiler is good enough to use inlined C functions for the transfers, with pointer arithmetic and pointer operations inside (compare code in Fig. 4.16). The major problem here is to determine the right data type on C level for the access to the device registers with the correct endianness. The interface to the upper layer is usually a function interface.
The location of a device signal in the register file is exactly determined. Where this signal is located in the CPU-register after it has been copied to that register depends on the computer architecture and the bus topology. For example, the byte order differs on little-endian and big-endian architectures. Hence, the mask for the bit manipulation differs for each kind.
Normal memory transactions don’t care about endianness as long as it is the same for read and write and the data is not altered in memory. For data copied from the CPU to the device and read back to the CPU, the mapping between the position in the register file and in the CPU-register needs to be known. For storage in memory, the mapping is irrelevant2. Memory cells do not alter the value either. So register files with memory-cell-like behaviour do not alter the stored value, but the mapping of each register cell to a bit in the CPU-register needs to be known.
Figure 4.15.: Data transfer from memory to CPU-register in little and big endian byte order
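Why the bit mask depends on the byte order can be illustrated with two explicit load routines. This is a sketch for illustration only; `load_le32` and `load_be32` are hypothetical helpers that assemble a 32 bit CPU-register value from four bytes at ascending addresses, independent of the endianness of the machine running the code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interprets 4 bytes at ascending addresses as a little-endian 32 bit value. */
uint32_t load_le32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}

/* Interprets the same 4 bytes as a big-endian 32 bit value. */
uint32_t load_be32(const uint8_t *bytes)
{
    assert(bytes != NULL);
    return ((uint32_t)bytes[0] << 24)
         | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)
         | (uint32_t)bytes[3];
}
```

A signal stored in bit 0 of the first byte ends up in bit 0 of the CPU-register in the little-endian case, but in bit 24 in the big-endian case, so the mask used by the upper layer has to follow the byte order.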
The register addresses are predefined constants, or at minimum the offsets are predefined and the effective address is calculated within the layer. The address translation depends on the computer architecture, i.e. on the mapping of registers to addresses.
// read value of register (with already correct endianness)
reg_value = *(device_base_address + register_offset);
Figure 4.16.: Address and register offset form the effective target address for the read access
Instead of utilising explicit pointer arithmetic, for some target systems a C data structure defines the offsets within the register file (compare code in Fig. 4.18). This incorporates knowledge of the target data type sizes, the connection of the register file to the bus (mapping) and the behaviour of the compiler (packing and placing of data structures at memory addresses). The structure is located in memory by its base address, which is aligned with the device base address. The name in the data structure then determines the effective target address. The C language further allows the definition of bit-fields within the data structure to get below the size of bytes.
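A data structure of this kind might be sketched as follows. The register layout (`control`, `status`, `data`) is hypothetical and not the code of Fig. 4.18; `fake_device_memory` replaces the real device address range so that the sketch can run on a host, and the layout relies on the assumption that the compiler inserts no padding between 8 bit members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register layout of a hypothetical device; member order mirrors the offsets. */
typedef struct {
    volatile uint8_t control;   /* offset 0 */
    volatile uint8_t status;    /* offset 1 */
    volatile uint8_t data;      /* offset 2 */
} device_registers_t;

/* Stand-in for the device address range (host simulation only). */
static uint8_t fake_device_memory[sizeof(device_registers_t)];

/* Places the structure at the device base address. */
device_registers_t *map_device(void *base_address)
{
    assert(base_address != NULL);
    return (device_registers_t *)base_address;
}
```

After `device_registers_t *dev = map_device(fake_device_memory);`, an assignment such as `dev->data = 0x42;` writes to offset 2; the member name alone selects the effective address.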
This technique of utilising the data structure mapping to memory locations is not applicable to the layering concept: no function interface can be designed in which the register of interest is passed as a parameter. Only if the transfer layer is merged with the manipulation layer and later the semantic layer can data structures be utilised. Still, this programming method is often applied in the field of microcontrollers, because the layering concept is not applied there and the device communication is therefore unfortunately entangled in the functional code.
is attached via a bus technology that doesn’t allow a transparent mapping of device registers to memory locations. Access to device registers requires a translation to messages3 by the USB device driver (compare Fig. 4.19). The USB device driver needs to set up the USB host-controller and provides means for communication. The transfer layer uses the services of the USB device driver. Here, the translation is no longer just from a simple register address to a memory address: the device needs to be found on the USB bus first, and a more complex procedure to set up the communication link is needed.
With object-oriented thinking, the transfer layer is realised as a dedicated class that hides away the communication channel to the device adaptor. The channel is established by the constructor and the access functions are methods of this class. Even with a complex path to the device, this class abstracts from the real communication link and, as an adaptor class, covers all translation, for instance of the endianness.
Figure 4.20.: Class uses transfer layer defined by interface
The transfer layer class implements a common interface that represents the upper layer interface, so the upper layer operates on this interface. The real implementation of the transfer layer, and hence the real communication link, can be established either during development time or during runtime by instantiating the appropriate object. In the latter case, the connection with the appropriate transfer mechanism is best established by means of the Dependency Injection pattern.
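The text speaks of classes; in plain C the same idea can be approximated with a table of function pointers. The following sketch is one possible rendering under that assumption, not the book's design: `transfer_channel_t`, `mmio_channel_create` and `device_reset` are hypothetical names, and the reset bit in register 0 is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Common interface of the transfer layer, as seen by the upper layer. */
typedef struct transfer_channel {
    uint8_t (*read)(struct transfer_channel *self, size_t register_offset);
    void (*write)(struct transfer_channel *self, size_t register_offset, uint8_t value);
    void *context;              /* implementation-specific state */
} transfer_channel_t;

/* One possible implementation: a plain memory mapped register file. */
static uint8_t mmio_read(transfer_channel_t *self, size_t register_offset)
{
    return ((volatile uint8_t *)self->context)[register_offset];
}

static void mmio_write(transfer_channel_t *self, size_t register_offset, uint8_t value)
{
    ((volatile uint8_t *)self->context)[register_offset] = value;
}

/* "Constructor": establishes the channel for a given base address. */
transfer_channel_t mmio_channel_create(volatile uint8_t *base_address)
{
    assert(base_address != NULL);
    transfer_channel_t channel = { mmio_read, mmio_write, (void *)base_address };
    return channel;
}

/* Upper layer: receives its channel from outside (dependency injection). */
void device_reset(transfer_channel_t *channel)
{
    channel->write(channel, 0, 0x01);   /* hypothetical reset bit in register 0 */
}
```

Because `device_reset` only depends on the interface, a USB-backed or a logging channel could be injected instead of the memory mapped one without touching the upper layer.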
The Manipulation Layer
Information is encoded and packed in bit-fields in the registers of the register file. The information must be isolated from neighbouring bit-fields of the same register, respectively changed without affecting them, as data bus transfers have a minimum size of a byte, not of a bit. The transfer between CPU-register and registers of the register file is performed by services of the transfer layer, but still in chunks of bytes. The required bit-field operations are conceptually located in this manipulation layer.
Information in a bit-field ranges from a single bit, enumerations encoded in bit-patterns, and values encoded in bit-sets to arbitrary patterns, each with a dedicated meaning. Typically, bit-fields are not spread over more than one register, as otherwise extra synchronisation means need to be implemented on both sides of the register file for atomic transfers (see 4.1.2). So the register content can be transferred4 to the CPU-register, where filter operations separate the bit-field of interest from the rest of the register contents.
For evaluation of a bit-field, the first operation is to isolate the bit-field from the whole register content. The bits are isolated (masking) with a binary AND-operation where the second operand is a pattern (mask) with bits set at the position of the bit-field (compare Fig. 4.22). The isolated content may then be moved to a right-aligned position within the CPU-register, so that the rightmost bit of the bit-field falls upon the LSB. This eases the handling from the perspective of the software developer, as the content is interpreted in the debugger as a number, and small numbers are easier to ”read” and to re-translate to a bit-pattern. In some cases the shift operation is irrelevant, as the comparison ((x AND 11100000) >> 5) != 0 is equivalent to (x AND 11100000) != 0.
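The masking and shifting steps can be written down compactly in C. The sketch assumes 8 bit registers; the descriptor type `bitfield_t` and the functions `bitfield_mask` and `bitfield_extract` are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Describes a bit-field by the position of its lowest bit and its width. */
typedef struct {
    unsigned shift;
    unsigned width;
} bitfield_t;

/* Mask with bits set at the position of the bit-field. */
uint8_t bitfield_mask(bitfield_t field)
{
    assert(field.width >= 1 && field.shift + field.width <= 8);
    return (uint8_t)(((1u << field.width) - 1u) << field.shift);
}

/* Isolate the field (AND) and right-align it (shift). */
uint8_t bitfield_extract(uint8_t register_value, bitfield_t field)
{
    return (uint8_t)((register_value & bitfield_mask(field)) >> field.shift);
}
```

For a three bit field in the upper bits, `bitfield_t mode = { 5, 3 }` yields the mask 11100000 from the text, and extracting it from the register value 10110111 gives the small number 5.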
Manipulation of a bit-field in a register file register often requires a read-modify-write operation. The register content is transferred to the CPU and logical operations are applied in the arithmetic logic unit (ALU) to modify the bits. Bits are cleared by the binary AND-operation, set by the binary OR-operation and inverted by the binary XOR-operation. The result is written back to the register. All operations require a dedicated bit-pattern. For instance, to set a new value of a bit-field, the field is cleared by an AND-operation, the new value is shifted to the correct bit-position and integrated in the bit-field with the binary OR-operation.
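A read-modify-write sequence of this kind might be sketched as follows, again assuming 8 bit registers. `bitfield_write` and `bits_toggle` are hypothetical helpers, and `fake_register` only simulates one register of the register file on a host machine.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for one register of the register file (host simulation only). */
static volatile uint8_t fake_register;

/* Sets a bit-field to new_value without touching neighbouring fields. */
void bitfield_write(volatile uint8_t *reg, unsigned shift, unsigned width,
                    uint8_t new_value)
{
    assert(width >= 1 && shift + width <= 8);
    uint8_t mask = (uint8_t)(((1u << width) - 1u) << shift);

    uint8_t content = *reg;                                        /* read */
    content = (uint8_t)(content & ~mask);                          /* clear the field (AND) */
    content = (uint8_t)(content | ((new_value << shift) & mask));  /* insert the value (OR) */
    *reg = content;                                                /* write */
}

/* Inverts the bits selected by mask (XOR), again as read-modify-write. */
void bits_toggle(volatile uint8_t *reg, uint8_t mask)
{
    *reg = (uint8_t)(*reg ^ mask);
}
```

Note that the sequence consists of two separate bus accesses; if the device changes the register between the read and the write, that change is overwritten.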
Some register files allow, as a special behaviour (others are discussed in Sec. 4.2.2), the manipulation of bit-fields without read-modify-write cycles. The operations for clearing or setting single bits of an associated register are implemented in hardware (set/clear register, compare illustration in Fig. 4.24). So the register content can be manipulated by writing patterns to one of three different locations. The first location copies the given pattern as is to the register, the same as the usual memory-like behaviour. The second location sets each bit of the register whose corresponding bit is set in the pattern. And the third location clears each bit of the register whose corresponding bit is set in the given pattern. The latter two register file behaviours increase the performance, as no read-modify-write operation is executed; instead just a write operation is executed (compare Sec. 4.2.2).
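What the register file does on such writes can be modelled in a few lines. This is a behavioural sketch of the hardware side, not driver code; the three write locations `LOC_DIRECT`, `LOC_SET` and `LOC_CLEAR` and the function `register_file_write` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical write locations of one register with set/clear support. */
typedef enum { LOC_DIRECT, LOC_SET, LOC_CLEAR } write_location_t;

/* Model of the register cell inside the register file. */
static uint8_t register_cell;

/* Models what the register file hardware does on a bus write. */
void register_file_write(write_location_t location, uint8_t pattern)
{
    switch (location) {
    case LOC_DIRECT: register_cell = pattern;                             break;
    case LOC_SET:    register_cell = (uint8_t)(register_cell | pattern);  break;
    case LOC_CLEAR:  register_cell = (uint8_t)(register_cell & ~pattern); break;
    default:         assert(!"unknown write location");                   break;
    }
}
```

From the software side each of the three cases is a single write access, so no other access to the register can slip in between a read and a write.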
The manipulation layer just provides the services for the bit manipulations. It does not provide higher semantic meaning or interfaces to the developer. A semantic interface shall be provided by the next higher layer. As this higher layer just uses bit-manipulation services in combination with the transfer layer, the manipulation layer somehow vanishes and becomes just a service component. This matter is discussed later in Sec. 4.2.5, as the semantic layer has to provide synchronisation means as well.
Figure 4.24.: Principle of the Set- and Clear-registers
Implementation Issues
Typical implementations of both layers in hardware-near programming languages often do not separate the two layers. Typically the transfer layer is realised as a set of pointer operations, especially in the context of microcontrollers. The bit-manipulation is performed directly by logical AND and OR operations in line with the pointer operations. Sometimes, the manipulation is separated out in the form of preprocessor macros. Nevertheless, a realisation of the transfer layer without strict separation behind an appropriate interface makes it difficult to create portable software that is able to use different transfer channels. An additional problem is the usage of stubs or the redirection of the transfers for debugging/logging, if the transfer layer is entangled in the application code.
In an environment where the HdS is created for a closed system or system family, only the top interface for both layers should be explicit. As discussed later, it forms a basis for the semantic layer.
4.2 Synchronisation
The previous section has discussed the binary coded exchange of information between HdS and the device core via the register file. Both sides can independently access the register file and hence form a non-synchronised parallel system. On the software side, data exchange and execution are sequential; on the hardware side, fully parallel. Explicit synchronisation means are needed for data transfer, to adjust functionality, and to synchronise both sides’ interactions.
First, general synchronisation principles are shown. Along with the general interaction, the usage of the synchronisation in our context is depicted. In comparison to data transfer synchronisation, some functional adjustments are protected by the interaction order instead of explicit synchronisation means. The differentiation of state machine triggers leads to a different handling on the software side. Furthermore, the realisation of the communication buffer as a register file and not as simple shared memory allows special support for the synchronisation means. Some implementation examples are shown and the realised synchronisation kind, if any, is discussed.
The synchronisation means should be encapsulated in a software layer which gives the transferred bit-pattern a semantic meaning. Data transfers need synchronisation if they cannot be handled atomically by the transfer layer. So these data sets are packed into messages. The synchronisation of software functions with the hardware functions needs a more detailed discussion of polling and of interrupt means, and is discussed separately in its own section.
4.2.1 Synchronisation Principles
In this section, synchronisation principles will be discussed first. The principles can be used to indicate the readiness of data, respectively that data are valid, or to signal the start or the completion of a task on either side. Completion can also mean here that a dedicated execution step has been completed, but processing goes on. The principles are discussed on shared memory first. The register file is a digital system that only looks like shared memory to the bus system, but extra features for improving the synchronisation can be implemented. Furthermore, the massive direct parallel access from the device core to the register file needs consideration as well. Each principle is briefly discussed in terms of its schema and its usage. All principles are explained using tasks and tokens for signalling. The tasks can be either software tasks or functionality in hardware. The tokens are represented by a single bit or bit pattern that can be changed atomically in the register file.
The Two Token Principle
The Principle. The first synchronisation approach uses two tokens, TA and TB, to synchronise Task A and Task B in Fig. 4.26. Task A sends a synchronisation signal to Task B by setting the token TA at t1. Task B responds that it has recognised the token TA by setting the token TB (at t2). Task A acknowledges by clearing the token TA (at t3), and Task B acknowledges the clearing by clearing token TB as well at t4 (compare Fig. 4.26).
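The four steps t1 to t4 can be expressed as two small step functions, one per task. The sketch below is a host-side model under assumptions: both tokens are plain variables in a hypothetical `handshake_t` structure rather than bits in a register file, and `task_a_step` and `task_b_step` are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Both tokens live in shared storage (here: plain variables). */
typedef struct {
    bool token_a;   /* only changed by Task A */
    bool token_b;   /* only changed by Task B */
} handshake_t;

/* One step of Task A; returns true if the step changed token TA. */
bool task_a_step(handshake_t *h, bool want_to_signal)
{
    assert(h != NULL);
    if (!h->token_a && !h->token_b && want_to_signal) {
        h->token_a = true;      /* t1: signal Task B */
        return true;
    }
    if (h->token_a && h->token_b) {
        h->token_a = false;     /* t3: acknowledge the response */
        return true;
    }
    return false;
}

/* One step of Task B; returns true if the step changed token TB. */
bool task_b_step(handshake_t *h)
{
    assert(h != NULL);
    if (h->token_a && !h->token_b) {
        h->token_b = true;      /* t2: signal recognised */
        return true;
    }
    if (!h->token_a && h->token_b) {
        h->token_b = false;     /* t4: acknowledge the clearing */
        return true;
    }
    return false;
}
```

Each token is written by exactly one task and only read by the other, and a new signal from Task A is only possible once both tokens are cleared again.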