A Simulated Autonomous Car
Iain David Graham Macdonald
Master of Science School of Informatics University of Edinburgh
2011
Abstract
This dissertation describes a simulated autonomous car capable of driving on urban-style roads. The system is built around TORCS, an open source racing car simulator. Two real-time solutions are implemented: a reactive prototype using a neural network, and a more complex deliberative approach using a sense, plan, act architecture. The deliberative system uses vision data fused with simulated laser range data to reliably detect road markings. The detected road markings are then used to plan a parabolic path and compute a safe speed for the vehicle. The vehicle uses a simulated global positioning/inertial measurement sensor to guide it along the desired path, with the throttle, brakes, and steering being controlled using proportional controllers. The vehicle is able to reliably navigate the test track, maintaining a safe road position at speeds of up to 40km/h.
Acknowledgements
I would like to thank all of the lecturers who have taught me over the past year, each of whom contributed to this thesis in some way. Particular thanks must go to my supervisor, Prof Barbara Webb, for agreeing to supervise this project and for her advice and encouragement throughout, and to Prof Bob Fisher for his many useful suggestions.
Declaration
I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated in the text, and that this work has not been submitted for any other degree or professional qualification except as specified.
(Iain David Graham Macdonald)
Table of Contents
Chapter 1 Introduction
1.1 Purpose
1.2 Motivation
1.3 Objectives
1.4 Dissertation Outline
Chapter 2 Background
2.1 Introduction
2.1.1 Motivation
2.1.2 A Brief History of Autonomous Vehicles
2.2 The Urban Challenge
2.2.1 The Challenge
2.3 Boss
2.3.1 Route Planning
2.3.2 Intersection Handling
2.4 Junior
2.4.1 Localisation
2.4.2 Obstacle Detection
2.5 Odin
2.5.1 Path Planning
2.5.2 Architecture
2.6 Discussion
2.7 Conclusion
Chapter 3 Simulation System
3.1 Architecture
3.2 Track Selection
3.3 Car Selection
Chapter 4 Reactive Prototype
4.1.1 Image Processing
4.1.2 Training
4.1.3 Results
4.1.4 Evaluation
Chapter 5 Deliberative Approach
5.1 Summary
5.2 Ground Truth Data
5.3 Sensing
5.3.1 Some Initial Experiments
5.3.2 The MIT Approach
5.3.3 Road Geometry Modelling
5.3.4 Lane Marking Verification
5.3.5 Lane Marking Classification
5.4 Planning
5.4.1 Trajectory Calculation
5.4.2 Speed Selection
5.5 Acting
5.5.1 Speed Control
5.5.2 Steering Control
Chapter 6 Evaluation
6.1 Lane Marking Detection and Classification
6.2 Trajectory Planning
6.2.1 Generation of Trajectory Points
6.2.2 Flat Ground Assumption
6.2.3 Non-continuous Path
6.2.4 Look-ahead Distance
6.3 Physical Performance
6.3.1 Path Following
6.3.2 Speed in Bends
6.3.3 G Force Analysis
6.3.4 Maximum Speed
6.4 Real-time Performance
Chapter 7 Conclusion
7.1 Summary
7.2 Future Work and Conclusion
Bibliography
Chapter 1 Introduction
"The weak point of the modern car is the squidgy organic bit behind the
wheel." Jeremy Clarkson
1.1 Purpose
This dissertation describes a simulated autonomous car capable of navigating on urban-style roads at variable speeds whilst staying in-lane. The simulation uses TORCS, an open source racing car simulator which is known for its accurate vehicle dynamics [23]. Two solutions to the problem are provided: firstly, a reactive approach using a neural network and, secondly, a deliberative approach inspired by the recent DARPA Urban Challenge, with separate sensing, planning, and control stages. In particular, the latter system fuses vision and simulated laser range data to reliably detect road markings to guide the vehicle.
1.2 Motivation
The recent DARPA Grand Challenges and the work of teams such as Tartan Racing [8] and Stanford Racing [9] have demonstrated that the goal of fully autonomous vehicles may be within reach. The development of such vehicles has the potential to save many of the thousands of lives that are lost in collisions every year [1].
Due to the need for autonomous vehicles to interact in an environment populated by human drivers and pedestrians, there is a clear safety risk in the development process. This risk can act as a barrier to development, as any autonomous vehicle must prove a sufficient level of safety before it is able to enter such an environment. One approach to this problem, as seen in the DARPA challenges, is to create controlled environments that are representative of the intended operating environment. Whilst the benefits of this approach are clear, the logistics and the expense of organising these events mean that they are likely to remain rare.
Another approach is to simulate the environment, thereby eliminating the safety risk and reducing costs. Simulations have the additional benefit of being able to test performance under unusual circumstances, and allow algorithms to be optimised.
1.3 Objectives
The goal of this project was to develop a simulated autonomous vehicle capable of driving on urban-style roads. The vehicle must be capable of navigating around a test track in a safe and controlled manner. Specifically, the vehicle must remain in the correct lane and drive at an appropriate speed. Although the environment is simulated, the intention is to approach the project as though the vehicle is real. With that in mind, the vehicle should only make use of information that would be available in the real world, and the system must run in real-time. The project looks to the recent DARPA Urban Challenge for inspiration.
1.4 Dissertation Outline
The remainder of this document is structured as follows: Chapter 2 provides a brief history of autonomous vehicle research and discusses some of the techniques used by state of the art vehicles. Chapter 3 describes the simulator and system architecture. Chapter 4 describes a sub-project which was undertaken to establish the feasibility of the main project. Here, a neural network is used to control a simulated autonomous vehicle. Chapter 5 forms the main body of the dissertation and describes a deliberative approach using image processing, data fusion, planning, and control techniques to solve the problem. Chapter 6 provides the results of experiments performed on the completed system as an evaluation. Finally, Chapter 7 offers a summary of the work undertaken, conclusions, and suggestions for future work.
Chapter 2 Background
2.1 Introduction
This chapter examines the current state of the art in driverless cars. It focuses on the 2007 DARPA Urban Challenge, a competition held to promote research in the field and the main inspiration behind this project. The motivation behind autonomous vehicles is discussed, followed by a retrospective that places the Urban Challenge in context. The challenges posed by the Urban Challenge are described and the vehicles which finished in the top three positions, Boss, Junior, and Odin, are examined. As this project aims to build a complete autonomous system, different aspects of each vehicle are examined, giving a broad overview of the field. Although some of the techniques described do not relate directly to this project, they have helped shape it and represent the aspirations of the project had more time been available.
2.1.1 Motivation
The car has been a significant force for social change, improving the mobility of the population. Access to this mobility will increase the quality of life for certain groups, such as the elderly and disabled, who cannot drive themselves. For others, being released from time spent behind the wheel will simply allow that time to be put to better use [1].
2.1.2 A Brief History of Autonomous Vehicles
This section provides a history of the development of autonomous vehicles from the 1980s to the present.
In the early 1980s, pioneer Ernst Dickmanns began developing what can be considered the first real robot cars. He developed a vision system which used saccadic camera movements to focus attention on the most relevant visual input. Probabilistic techniques such as extended Kalman filters were used to improve robustness in the presence of noise and uncertainty. By 1987 his vehicle was capable of driving at high speeds, albeit on empty streets [5].
In the late 80s, Dickmanns participated in the European Prometheus project (PROgraMme for a European Traffic of Highest Efficiency and Unprecedented Safety). With an estimated investment of 1 billion dollars in today's money, the Prometheus project laid the foundation for most subsequent work in the field. By the mid-90s, the project produced vehicles capable of driving on highways at speeds of 80km/h in busy traffic [6]. Techniques such as tracking other vehicles, convoy driving, and autonomous passing were developed.
Another pioneer in the field was Dean Pomerleau, who developed ALVINN (Autonomous Land Vehicle in a Neural Network) in the early 90s [7]. ALVINN was notable for its ability to learn to drive on new road types with only a few minutes training from a human driver.
After the successes of the 80s and early 90s, progress seems to have plateaued in the late 90s. It was not until DARPA (Defense Advanced Research Projects Agency) launched the first of its Grand Challenges that interest in the field was renewed. In 2004, DARPA offered a $1 million prize to the first autonomous vehicle capable of negotiating a 150 mile course through the Mojave Desert. For the first time, the vehicles were required to be fully autonomous, with no humans allowed in the vehicle during the competition. By this time, GPS systems were widely available, significantly improving navigational abilities. Despite several high profile teams, and general advances in computing technology, the competition proved to be a disappointment, with the most successful team reaching only 7 miles before stopping. The following year, DARPA re-held the competition. This time the outcome was very different, with five vehicles completing the 132 mile course and all but one of the 23 finalists surpassing the seven miles achieved the previous year. The competition was won by Stanley, the entry from the Stanford Racing Team headed by Sebastian Thrun [12].
Buoyed by this success, DARPA announced a new challenge, to be held in 2007, named the Urban Challenge. This would see the competition move from the desert to an urban environment, with the vehicles having to negotiate junctions and manoeuvre in the presence of both autonomous and human-driven vehicles.
2.2 The Urban Challenge
This section describes the Urban Challenge and the main sub-challenges that were set.
The competition took place in 2007 on a closed air force base in California. All the autonomous vehicles and multiple human-driven vehicles were present on the course at the same time. The environment can, therefore, be considered a good approximation of a genuine urban environment, even though the roads were not open to the public.
2.2.1 The Challenge
Each vehicle was required to complete a mission specified by an ordered series of checkpoints in a complex route network. Each vehicle was expected to be able to negotiate all hazards, including both static and dynamic obstacles, re-plan for alternative routes, and obey California traffic laws at all times [4].
More specifically, each vehicle had to demonstrate the following abilities:
Safe and correct check-and-go behaviour at junctions, when avoiding obstacles, and when performing manoeuvres.
Safe vehicle following at normal speeds and in slow moving queues.
Safe road following, only changing lane when safe and legal to do so.
GPS-free navigation (GPS may be used but is not reliable in urban environments).
Manoeuvres such as parking and u-turns.
Each vehicle is supplied with two files: the Route Network Definition File (RNDF) and the Mission Definition File (MDF). The RNDF specifies the layout of the road network and is common to all teams. It specifies accessible road segments, lane widths, and stop sign locations. The MDF contains the specific mission that the vehicle must accomplish, with each vehicle having a unique but equivalent mission.
2.3 Boss
Boss was developed by Carnegie Mellon University and finished in 1st place [8]. This section describes the route planning and intersection handling techniques used by Boss. The intersection handling described below would have been of relevance to the project had time been available to add overtaking functionality.
2.3.1 Route Planning
The RNDF is converted to a connected graph with directional edges representing drivable lanes. Each edge is assigned a weight that represents the cost of driving the corresponding road segment. The cost is calculated using the length and speed limit of the segment, as well as a term that represents the complexity or difficulty of the terrain for Boss to negotiate. Graph search techniques are then used to plan a path from the current location to a goal location.
As Boss navigates the chosen path, new information may become available that requires the costs of road segments to be modified. For example, Boss maintains a map of obstacles it believes to be static. If a static obstacle is determined to entirely block the road, it is necessary to find an alternative route. To do this, Boss significantly increases the cost associated with the road segment and re-calculates a new route to the goal. The increased cost is sufficient to cause an alternative route to be selected. However, it is not desirable to permanently avoid the blocked road and so the cost is exponentially reduced over time to its original value.
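As a concrete illustration, the following Python sketch shows one way such a cost model and search could be implemented. The decay constant, cost terms, and function names are illustrative assumptions rather than details taken from the Boss papers.

import heapq
import math

def edge_cost(length_m, speed_limit_ms, complexity=1.0,
              blockage_penalty=0.0, seconds_since_blockage=0.0,
              decay_rate=0.01):
    """Travel-time cost for one road segment (illustrative).

    A blockage adds a large penalty that decays exponentially back
    towards zero, so the road is eventually considered again.
    """
    base = (length_m / speed_limit_ms) * complexity
    decayed = blockage_penalty * math.exp(-decay_rate * seconds_since_blockage)
    return base + decayed

def plan_route(graph, start, goal):
    """Dijkstra search over a {node: [(neighbour, cost), ...]} graph."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, edge in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(frontier, (cost + edge, neighbour, path + [neighbour]))
    return math.inf, None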
Figure 2-1 Boss at the Urban Challenge
2.3.2 Intersection Handling
A crucial requirement is that the vehicle is capable of negotiating intersections safely and observing correct precedence. Precedence becomes important when the intersection contains more than one stop line (4-way stops are common in the US). The order of precedence is determined by the order in which the vehicles arrive at their respective stop lines. Boss estimates precedence by defining a precedence polygon that starts around three metres prior to the stop line. A vehicle is considered to be in the polygon if its front bumper (or part of it) is within the polygon. The time at which the vehicle is detected as being in the polygon is used to estimate precedence. As vehicles with higher precedence leave their polygons, Boss moves up the precedence order until it determines it has precedence.
Whilst this approach seems straightforward, care must be taken when determining the size of the precedence polygon. Increasing the size of the polygon improves the robustness of the algorithm but risks that two vehicles may be detected as one.
The idea of the precedence polygon is extended to apply to situations where Boss must merge with moving traffic. In this case, yield polygons are calculated based on the time it would take Boss to execute the manoeuvre and the safe inter-vehicle gap. For example, if Boss wishes to cross a lane of traffic coming from the left to join traffic coming from the right, the following times would be considered (a feasibility check based on these times is sketched below):
The time to cross the lane, T_action
The time to accelerate to the appropriate speed for the desired lane, T_accelerate
The minimum safe time gap between vehicles, T_spacing
These times are used to determine the size and location of the yield polygon for both the crossed lane and the destination lane. Any vehicle within a polygon has its velocity tracked and the time at which it will cross Boss's desired path is estimated. Using the estimates for each lane, Boss is able to determine if there is sufficient time to perform the manoeuvre.
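A minimal sketch of this feasibility test, assuming simple constant-velocity arrival estimates for the tracked vehicles (all names and thresholds here are illustrative):

def merge_is_safe(t_action, t_accelerate, t_spacing,
                  crossing_lane_arrivals, destination_lane_arrivals):
    """Return True if Boss has time to cross one lane and join another.

    *_arrivals are estimated times (seconds from now) at which tracked
    vehicles will reach Boss's desired path.
    """
    # Vehicles in the crossed lane must not arrive before we have cleared it.
    needed_to_clear = t_action + t_spacing
    if any(t < needed_to_clear for t in crossing_lane_arrivals):
        return False
    # Vehicles in the destination lane must not arrive before we are up to speed.
    needed_to_join = t_action + t_accelerate + t_spacing
    if any(t < needed_to_join for t in destination_lane_arrivals):
        return False
    return True

# Example: crossing takes 3s, accelerating 4s, with a 2s safety gap.
print(merge_is_safe(3.0, 4.0, 2.0, [8.0], [12.0]))  # True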
2.4 Junior
Junior was developed by Stanford University and finished in 2nd place. This section describes Junior's use of LIDAR and GPS/IMU. See sections 5.3 and 5.5 for how these sensor types are used by this project.
Figure 2-2 Junior at the Urban Challenge and the Velodyne HDL64 LIDAR used by several teams
2.4.1 Localisation
As GPS signals are carried by microwaves, they are absorbed by water, leading to reception problems in bad weather and under foliage. Tall buildings reduce the visible area of the sky and, therefore, the choice of satellites, limiting accuracy. Furthermore, GPS does not provide a means of directly determining the vehicle's orientation. For these reasons, the GPS unit is combined with an inertial measurement unit (IMU), which uses gyroscopes and accelerometers to estimate the velocity and acceleration of the vehicle [2].
Whilst combined GPS/IMU systems can provide sub-metre accuracy, they are still insufficient for safe road following. They provide a pose estimate that is the most probable at the current time and are prone to position jumps. In addition, the data in the RNDF is also GPS based and cannot be guaranteed to be accurate. It is, therefore, necessary to use additional localisation techniques.
Junior uses kerb locations and road markings to localise accurately relative to the RNDF. The detection of kerb locations is described below. Front and side mounted lasers that are angled down are used to measure the infra-red reflectivity of the road. Lane markings can be extracted from this data and compared with lane data in the RNDF.
This fine-grained localisation is used to maintain an internal co-ordinate system that is robust to position jumps.
2.4.2 Obstacle Detection
Five of the six teams that completed the Urban Challenge used a high-definition LIDAR system as their primary sensor. As the name suggests, LIDAR is similar to RADAR, but pulses of laser light are used rather than radio waves. Both Boss and Junior used a system manufactured by Velodyne Inc that was developed for the original Grand Challenge. This roof-mounted system comprises a rotating unit containing 64 separate lasers. Each of the lasers is fixed at a different pitch and therefore scans a different portion of the environment. The result is a highly detailed 3-dimensional map of the environment that can be used to detect kerb-sized objects at 100m [3].
The LIDAR produces a detailed map of the environment in the form of a point-cloud. For obstacle detection, this data must be processed and features of interest extracted. One method of doing this would be to identify points that are the same distance and direction from the vehicle but have different heights. However, the Stanford team found that whilst such a method was suitable for detecting objects with large heights, such as cars and pedestrians, it was not suitable for smaller objects such as kerbs. The problem was setting a threshold that would allow kerbs to be detected without producing a large number of false-positives.
To combat this, Junior uses a novel approach. If the vehicle is on flat ground, each of the LIDAR lasers will scan a circle of known radius around the vehicle. The scans, therefore, generate a series of concentric circles, with each circle a fixed distance apart. On ground that is not flat, the distances between the circles are distorted, much like the contours on a map. By comparing the distances between the contours with the expected value, small objects can be detected with greater sensitivity than using vertical measurements.
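A simplified sketch of this idea, for a single laser ring on flat ground (the geometry and the tolerance value are illustrative assumptions, not Junior's actual parameters):

import math

def expected_ring_radius(sensor_height_m, beam_pitch_deg):
    """Radius of the circle a downward-angled laser traces on flat ground."""
    return sensor_height_m / math.tan(math.radians(beam_pitch_deg))

def flag_obstacles(measured_radii, expected_radius, tolerance=0.15):
    """Flag bearings where the measured ring radius deviates from the
    flat-ground prediction by more than `tolerance` metres.

    measured_radii: {bearing_deg: radius_m} for one laser ring.
    """
    return [bearing for bearing, radius in measured_radii.items()
            if abs(radius - expected_radius) > tolerance]

# A laser pitched 10 degrees down on a 2m-high mount scans at ~11.3m.
r = expected_ring_radius(2.0, 10.0)
scan = {0: r, 10: r - 0.3, 20: r}  # something kerb-like at bearing 10
print(flag_obstacles(scan, r))     # [10]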
One complication with this approach is that of vehicle roll. As the vehicle turns, it has a tendency to tilt outwards, thus reducing the distance between the contours on one side of the vehicle and increasing them on the other. If not compensated for, this roll can be misinterpreted as genuine variation in the ground. Detected obstacles are represented as particles. As more information becomes available, the particles are filtered, allowing the object to be tracked over time.
2.5 Odin
Odin was developed by Virginia Tech as part of team VictorTango and finished in 3rd place [10]. This section describes the path planning and architecture of Odin. See section 5.4 for details of how this project performs path planning.
2.5.1 Path Planning
The RNDF contains a series of waypoints which define the road network. The distance between the waypoints may vary and it is, therefore, necessary to calculate a smooth path from one point to the next. To do this, Odin uses cubic splines. The same technique is used to generate paths through intersections and in unstructured parking zones.
Using splines guarantees a smooth path between points but does not guarantee that the path accurately matches the road. To combat this problem, the curvature of the splines is manually adjusted using the aerial photographs supplied by DARPA.
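For illustration, a cubic spline path through sparse waypoints can be generated in a few lines; this uses SciPy's generic spline routines and made-up waypoints, not VictorTango's actual implementation.

import numpy as np
from scipy.interpolate import CubicSpline

# Sparse RNDF-style waypoints (x, y) in metres.
waypoints = np.array([[0.0, 0.0], [20.0, 5.0], [40.0, 3.0], [60.0, 12.0]])

# Parameterise by cumulative distance along the waypoints.
deltas = np.diff(waypoints, axis=0)
s = np.concatenate([[0.0], np.cumsum(np.hypot(deltas[:, 0], deltas[:, 1]))])

spline_x = CubicSpline(s, waypoints[:, 0])
spline_y = CubicSpline(s, waypoints[:, 1])

# Sample a smooth path every metre.
samples = np.arange(0.0, s[-1], 1.0)
path = np.column_stack([spline_x(samples), spline_y(samples)])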
2.5.2 Architecture
Odin implements a hybrid deliberative-reactive architecture [13]. Such architectures combine the benefits of high-level deliberative planning with low-level reactive simplicity. However, increases in computing power have allowed Odin to add a further deliberative layer to handle low-level motion planning. The reactive driving behaviours are, therefore, sandwiched in a deliberative-reactive-deliberative progression [11].
Figure 2-3 Odin at the Urban Challenge
2.5.2.1 Route Planning
The top-level deliberative component is responsible for route planning. It is invoked on demand when a mission is first loaded or when an existing route is found to be blocked. As with Boss, the road network is searched, using A* graph search, with the aim of finding the route with the shortest time. The time for a route is based on the speed limits and distances, with additional fixed penalties for manoeuvres such as u-turns.
2.5.2.2 Driving Behaviours
The reactive layer comprises a set of independent driving behaviours. Each is dedicated to a specific driving task, such as passing another vehicle or merging with moving traffic. However, not all driving behaviours are applicable all of the time and, therefore, a sub-set is selected based on the current driving context. For example, on a normal section of road the route driver, passing driver, and the blockage driver are applicable, whereas at a junction the precedence, merge, and turn drivers are used. The driving context therefore acts as an arbiter that activates multiple behaviours.
Route Driver - Assumes no other traffic
Passing Driver - Pass other vehicles
Blockage Driver - React to blocked roads
Precedence Driver - Stop sign precedence
Merge Driver - Enters or crosses moving traffic
Left Turn Driver - Yields when turning left across traffic
Zone Driver - Re-route when stuck
The arbiter and each of the driving behaviours are implemented as finite state machines. These are arranged in a hierarchy with the arbiter as the root. The structure of the hierarchy represents a top-down task decomposition rather than any idea of behavioural priority.
As the arbiter is able to select multiple, potentially competing behaviours, an additional mechanism is required to select which commands the vehicle. For this, a form of command fusion is used which allows each behaviour to specify an urgency
parameter. This parameter indicates how strongly the behaviour feels that it should be selected.
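A toy sketch of urgency-based command fusion follows; the behaviour interface and the winner-takes-all rule are assumptions for illustration, since the published description does not specify the exact fusion rule.

from dataclasses import dataclass

@dataclass
class BehaviourOutput:
    name: str
    speed: float       # desired speed, m/s
    steering: float    # desired steering, -1.0 .. 1.0
    urgency: float     # how strongly this behaviour wants control, 0 .. 1

def fuse(commands):
    """Select the command from the most urgent active behaviour."""
    return max(commands, key=lambda c: c.urgency)

active = [
    BehaviourOutput("route", speed=8.0, steering=0.05, urgency=0.3),
    BehaviourOutput("blockage", speed=0.0, steering=0.0, urgency=0.9),
]
print(fuse(active).name)  # "blockage" wins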
2.5.2.3 Low-level Planning and Vehicle Control
The bottom, low-level deliberative layer is concerned with motion planning. Its purpose is to determine a speed and trajectory that will keep Odin in the desired lane whilst avoiding obstacles, or to perform manoeuvres such as parking.
Once the desired path and speed are established, the vehicle needs to be commanded appropriately. To do this, the vehicle's dynamics are modelled using a bicycle model. This simplifies modelling by compressing four wheels into two and has proved to be sufficient for the low speeds experienced in the Urban Challenge [25]. A generic kinematic form of this model is sketched at the end of this section.
The base vehicle chosen for Odin was a hybrid-electric Ford Escape, which has the advantage of an existing built-in drive-by-wire system. Sending the appropriate commands to this system allows the steering, throttle, and gear change to be easily controlled. An additional advantage of hybrid vehicles is that they have sophisticated power generation systems, making it easy to power the computers and sensors.
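The sketch below is a standard textbook kinematic bicycle model, not Odin's actual controller; the wheelbase and time step are placeholder values.

import math

def bicycle_step(x, y, heading, speed, steer_angle, wheelbase, dt):
    """Advance a kinematic bicycle model by one time step.

    x, y: rear-axle position (m); heading, steer_angle: radians;
    speed: m/s; wheelbase: distance between axles (m); dt: seconds.
    """
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    heading += (speed / wheelbase) * math.tan(steer_angle) * dt
    return x, y, heading

# Drive at 5 m/s with a constant 0.1 rad steering angle for one second.
state = (0.0, 0.0, 0.0)
for _ in range(50):
    state = bicycle_step(*state, speed=5.0, steer_angle=0.1, wheelbase=2.7, dt=0.02)
print(state)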
2.6 Discussion
The Urban Challenge has stimulated huge interest in the field of autonomous vehicles, but how realistic a challenge did it represent? Having multiple autonomous vehicles interacting with each other and human-driven vehicles on the scale seen in the Urban Challenge certainly presents a degree of realism not seen before. However, there is a clear gap between the competition format and reality. The challenge did not require vehicles to perceive road signs or traffic lights. Nor were vulnerable road users, such as motorcycles or pedestrians, encountered. These are active areas of research but they were notable by their absence.
The RNDF, together with aerial photography, presented the teams with a rich description of the environment. Despite this, manual modifications were required to ensure correct operation. It is unrealistic to expect this level of detail to be available or maintained on a global basis.
The vehicles must perform well at many different tasks in order to succeed. Some of the problems can be considered solved, whilst others show a trend towards a particular solution. For example, high-level route planning, an important aspect of the challenge, posed little problem, with standard techniques such as A* being used successfully. Likewise, the low speeds encountered in urban driving pose little problem in terms of vehicle control. An autonomous vehicle developed by Stanford, named Shelley, recently competed in an off-road hill climb race, demonstrating the state-of-the-art in vehicle dynamics.
Perception is perhaps the area of most interest in the Urban Challenge. There is a clear trend towards direct sensing technology such as LIDAR and away from vision. This trend is likely to continue, but many vision techniques will remain applicable to images generated by laser sensors.
There is no doubt that the availability of combined GPS and IMU technology has been crucial to the field but, despite these advances, localisation still proves to be a serious problem. In qualifying, Odin experienced a signal jump that caused it to misjudge its position by 10m. A similar, though less severe, problem occurred in the final event. Another competitor, Knight Rider, failed to complete the challenge due to a localisation failure [14]. Accurate localisation is crucial; even an error of 1m could be catastrophic.
2.7 Conclusion
The Urban Challenge and, indeed, the preceding Grand Challenges have been a powerful driving force in the development of autonomous vehicles. Together they mark a significant milestone towards the goal. Progress has, in large part, been due to
Figure 2-4 Shelley, an autonomous Audi TT developed by Stanford
advances in GPS and LIDAR technologies, but limitations still remain. Further improvements in these technologies are required. Solving the technical challenges seems inevitable, but other challenges, such as questions of liability and how to adequately prove safety, lie ahead.
Chapter 3 Simulation System
3.1 Architecture
The Open-source Racing Car Simulator (TORCS) is a racing simulator that has a reputation for having an accurate physics engine [23]. It was developed with the artificial intelligence community in mind, being used as a platform for the development of computer-controlled opponents in racing games. It has also been used as the base platform for the annual WCCI racing challenge [15].
TORCS is implemented in C++ and has an API [29] that provides physical vehicle parameters such as speed, acceleration, wheel rotation rates, and so on, that can be used as sensors (Table 3-1). This information is provided in real-time and is updated at 50Hz. The API also provides a means of commanding the vehicle via the variables shown in Table 3-2. Commanding the vehicle via this interface is analogous to the use of vehicles with built-in drive-by-wire interfaces, such as Odin in the Urban Challenge.
Figure 3-1 Top-level system architecture
The objective of this project is to implement the artificial intelligence vehicle controller (AIC) depicted in Figure 3-1. This controller must be capable of running in real-time. Data is transferred between TORCS and the AIC using sockets, allowing the AIC to run on a separate PC should performance be an issue.
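A skeletal version of such a socket-based control loop might look as follows; the port, message format, and field names are invented for illustration and do not reflect the actual protocol used in this project.

import json
import socket

HOST, PORT = "127.0.0.1", 5001  # hypothetical address of the TORCS-side plugin

def compute_command(sensors):
    # Placeholder controller: hold a target speed with a crude throttle rule.
    throttle = 0.5 if sensors["speed"] < 11.0 else 0.0
    return {"steer": 0.0, "throttle": throttle, "brake": 0.0, "gear": 1}

def control_loop():
    with socket.create_connection((HOST, PORT)) as conn:
        stream = conn.makefile("rw")
        while True:
            line = stream.readline()          # one sensor packet per 20ms tick
            if not line:
                break
            sensors = json.loads(line)        # e.g. {"speed": 9.7, ...}
            command = compute_command(sensors)
            stream.write(json.dumps(command) + "\n")
            stream.flush()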
Rather than using a camera, input images must be captured directly from the simulator's output window. The Windows API provides a means of enumerating all windows currently in use. From this, it is possible to query the title of each window and, therefore, locate the TORCS window. Once identified, a further Windows API call can be used to perform a fast copy of image data from that window into local process memory.
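The implementation described here uses the Windows API directly from C++; the Python sketch below mirrors the same find-the-window-then-grab idea using the pywin32 and Pillow packages, purely as an illustration.

import win32gui                      # pip install pywin32
from PIL import ImageGrab            # pip install Pillow

def capture_window(title_substring="TORCS"):
    """Grab the screen region of the first window whose title matches."""
    matches = []

    def enum_handler(hwnd, _):
        if title_substring in win32gui.GetWindowText(hwnd):
            matches.append(hwnd)

    win32gui.EnumWindows(enum_handler, None)
    if not matches:
        raise RuntimeError("TORCS window not found")
    left, top, right, bottom = win32gui.GetWindowRect(matches[0])
    return ImageGrab.grab(bbox=(left, top, right, bottom))

frame = capture_window()
print(frame.size)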
Table 3-1 TORCS Data and Potential Application
Wheel rotation rates - Odometry
Vehicle's position in the world (x,y,z) - Global positioning system (GPS)
Vehicle acceleration (x,y,z) - Inertial measurement unit (IMU)
Track geometry - Light detection and ranging (LIDAR)
Table 3-2 Vehicle Command Parameters
Steering angle - Float / -1.0 … 1.0 - -1.0 indicates full left-lock, 0.0 straight ahead, and 1.0 full right-lock
Throttle - Float / 0.0 … 1.0 - 0.0 indicates no throttle, 1.0 indicates full throttle
Brake - Float / 0.0 … 1.0 - 0.0 indicates no brake force, 1.0 indicates full brake force
Gear - Integer / 0 … 5 - 0 indicates neutral, 1…5 indicates desired gear
3.2 Track Selection
Figure 3-2 shows the track selected. It is 2.59km long and there are long gentle bends, sharp hair-pin style bends, and tight bends in opposite directions close together. The figure also shows the lane markings, which consist (for the most part)
of a continuous white border line on the left and right marking the edge of the road, and a dashed white line separating the two driving lanes. In lane detection systems, a common problem is that of shadows and poor quality road markings. The simulator image shown exhibits both these features to some extent.
Figure 3-2 Selected track (left) and example screenshot from camera
3.3 Car Selection
As TORCS is a racing simulator, it provides a selection of cars to choose from, the majority of which are dedicated track racing cars. Most of the competitors in the DARPA Urban Challenge used SUV-style vehicles, and the rules stated that the vehicle must be road-legal and of proven safety record [4]. Of the cars provided by TORCS, the one that best matched these requirements was a Peugeot 406. This car was selected on the grounds that it is a typical saloon style car, common on the roads, and has good low speed handling characteristics due to being front-wheel drive. Figure 3-3 shows an image of the selected vehicle.
Figure 3-3 Peugeot 406
Chapter 4 Reactive Prototype
It was necessary to perform a feasibility study to ensure that it was possible to capture the simulator output, process the data, and send control commands back to the simulator, and to assess whether a typical laptop had sufficient processing power. For the purposes of the prototype, it was important to select a technique that was direct, allowing the main parts of the system to be put together relatively quickly. I chose to base the prototype on Dean Pomerleau's ALVINN [7]. This uses a neural network to directly convert an input image of the road into a steering angle for the vehicle. Thus, the control of the vehicle is directly reactive to the current road scene. The feed-forward network is organised as three layers comprising 800 input nodes (conceptually an image grid), 4 hidden nodes, and 31 output nodes. The output o_j of each node j is a function of its weighted inputs x_i:

o_j = f( Σ_i w_ij x_i )

where f is the node's activation function. The network weights w_ij are trained using the back-propagation algorithm [24]. The general training process is described in more detail below. Figure 4-1 illustrates the network's structure.
Figure 4-1 Neural network structure
The input image is at a lower resolution than typically used for modern lane-tracking systems, and certainly lower than the captured image.
Therefore, the image must be down-sampled, in a process described in section 4.1.1, prior to being passed to the neural network.
Each of the 31 output nodes represents a specific steering angle, with sharp-left corresponding to node 0, straight-ahead corresponding to node 15, and sharp-right to node 30. Each output node returns a value between 0.0 and 1.0, indicating the degree to which the network believes that to be the correct steering angle. The output, therefore, represents a distribution of probable steering angles. This distribution is then converted to a single floating point value by computing its centre of mass and rescaling to the range -1.0…1.0 for compatibility with the TORCS API.
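A small sketch of this conversion, assuming a list of 31 activations (the variable names are illustrative):

def distribution_to_steering(outputs):
    """Convert 31 output-node activations into one steering value.

    Node 0 is sharp-left, node 15 straight ahead, node 30 sharp-right.
    The centre of mass of the activations is rescaled to -1.0 .. 1.0.
    """
    total = sum(outputs)
    if total == 0.0:
        return 0.0                      # no opinion: steer straight
    centre = sum(i * v for i, v in enumerate(outputs)) / total
    return (centre - 15.0) / 15.0       # map node index 0..30 to -1..1

# A peak around node 20 gives a gentle right turn.
acts = [0.0] * 31
acts[19], acts[20], acts[21] = 0.5, 1.0, 0.5
print(round(distribution_to_steering(acts), 3))  # ~0.333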
4.1.1 Image Processing
The captured image goes through the following steps to convert it into a format suitable for input to the neural network. The effect of each step is illustrated in Figure 4-2.
Cropping: The captured image is in RGB format. The horizon lies approximately half-way down the image and, for the purposes of this project, is assumed to be fixed. The image is, therefore, cropped, discarding the upper half.
Intensity conversion: The cropped image is converted from RGB format to grey-scale using the standard conversion formula:

Y = 0.299R + 0.587G + 0.114B

Binarisation: A manually selected threshold is used to convert the image from grey to binary. The result is an image that has the road markings highlighted against the black background of the road. Edge features to the sides of the road are also highlighted, but this does not pose a problem.
Down-sampling: At this point, the image resolution is slightly less than that of the cropped image, due to shrinkage during filtering. This is still too high to use as input to the neural network. The next step is, therefore, to reduce the resolution by simply averaging the intensities over blocks of pixels.
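The block-averaging step can be expressed compactly with NumPy; the block size below is a placeholder, since the exact figures did not survive in the source text.

import numpy as np

def block_average(image, block=8):
    """Down-sample a 2-D grey-scale image by averaging block x block tiles."""
    h, w = image.shape
    h, w = h - h % block, w - w % block   # trim to a whole number of blocks
    tiles = image[:h, :w].reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3))

img = np.random.rand(160, 320)
small = block_average(img, block=8)
print(small.shape)  # (20, 40)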
Figure 4-2 Image processing steps. Top left, cropped RGB image. Top right, grey scale image. Middle left, smoothed image. Middle right, edge enhancement. Bottom left, binarisation. Bottom right, resolution reduction.
4.1.2 Training
In order for the neural network to operate, it must first be trained. The training data required consists of a set of tuples, each containing an input image and the corresponding desired output steering angle. ALVINN relied upon a human driver to train the network over a period of a few minutes driving on any new road type. I chose to use a computer controlled 'expert' driver to train the network.
This expert driver is used to determine the correct steering angle to be associated with a given input image as follows: the TORCS API makes it easy to determine the exact position of the vehicle relative to the centre-line of the road. This information can be used to make the vehicle follow a given lane using a proportional steering control method. Thus, the steering angle can be captured at the same time as the image, forming a training pair.
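A proportional lane-following rule of the kind described needs only a couple of lines; the gain value here is a made-up placeholder, not the one used in the project.

def expert_steering(lateral_offset_m, gain=0.25):
    """Steer proportionally to the offset from the desired lane centre.

    Positive offset means the car is right of centre, so steer left
    (negative). Output is clamped to the TORCS range -1.0 .. 1.0.
    """
    command = -gain * lateral_offset_m
    return max(-1.0, min(1.0, command))

print(expert_steering(0.8))   # -0.2: gentle correction to the left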
The speed of the expert driver was fixed at 15km/h, this being slow enough that the most severe bends can be safely negotiated.
It would be possible to generate a batch of training data by periodically capturing pairs as the expert driver navigates the track. Once captured, this data could be partitioned into separate training and test sets. However, I chose to train the network in an online manner.
As the vehicle travels around the track, an image is captured and the network generates what it believes to be the correct output steering angle. This output is compared with the correct steering angle as determined by the expert driver. If the two steering angles differ by more than a specified threshold, the expert driver wins the right to control the vehicle. When this happens, the captured image and the correct steering angle are combined into a training pair and added to the current set of training pairs. Conversely, if the steering angles are in reasonable agreement, then the AIC retains control. Thus, training data is generated only in situations where it is required, and this constitutes a supervised learning approach.
The threshold used to determine which driver is in control is initially set to a very small value, so that any deviation between the two steering angles results in the expert driver gaining control. The threshold is relaxed gradually over time. This allows tight control at the outset but also allows more 'wiggle room' as the network becomes more competent, allowing slight deviations from the desired path to go unchecked.
During each image processing cycle, 25ms is allocated to training the network incrementally, using all the training data accumulated so far. The network, therefore, continually improves over time, with training taking place whether the expert or the network is in control.
When using the expert driver, we must convert from its exact steering angle to an output distribution that is compatible with the desired network output. To do this, a Gaussian distribution is created with its mean at the steering angle and a variance of 0.07, as illustrated in Figure 4-3.
Figure 4-3 Illustration of the relationship between the input image and output steering distribution (not to scale)
4.1.3 Results
This online supervised learning approach is highly effective. Within only a few tens of metres, the network is able to take control on the initial straight section. As the first lap progresses, the network is able to maintain control through the more gentle bends. The more severe bends, and bends in opposite directions in quick succession, are the last to be mastered by the network. Often, by the end of the 2nd lap the network is fully trained and the 3rd lap is completed under full autonomous control. Table 4-1 gives the percentage of autonomous control over three test runs of 4 laps each, and Figure 4-4 shows how the capability of the network improves over the 3 laps.
Figure 4-4: Expert versus autonomous control over 3 laps. Red dots indicate areas where the expert driver was in control. The final lap (right) is completely autonomous.
The vehicle starts at the blue circle and travels in a clockwise direction. The red markers indicate where the expert driver has control. The left image shows the 1st lap, with the vehicle starting out under expert control. The neural network quickly takes control and only needs occasional assistance during the first straight. Heavy
assistance is required during the bends on the first lap. The middle image shows the 2nd lap – the expert driver is only required in four locations. An interesting point is that during the 2nd lap, the network requires more assistance to straighten up when exiting a bend than it does on entry to the bend. The right image shows the 3rd lap, which is completed under full autonomous control.
Table 4-1 Percentage of Autonomous Control
The system ran on a standard Windows laptop with an AMD Turion64 processor – a five year old system at the time of writing. The combined processor load of running TORCS and the AI controller was 100%.
An important point about this system is the direct coupling between the frame rate and the steering command rate. Each steering command only applies for the instant that it was generated. If the image processing were interrupted for any reason, the vehicle would immediately lose control.
4.1.4 Evaluation
The development of this system was, in itself, a substantial amount of work (approximately 6 weeks) but was necessary to demonstrate that TORCS could be integrated successfully with an independent vision and control system. However, the basic architecture with regards to image capture, image processing, and inter-process communication would be re-usable. Indeed, the rest of this project would not have been achievable in the available time had this prototype not been developed. Despite the prototype being a success, it highlighted the limitations of the laptop used and, as a result, a new high-performance laptop was used for the remainder of the project.
Chapter 5 Deliberative Approach
Whilst the neural network prototype successfully controls the vehicle, it operates in a reactive way; the steering angle is a direct function of the input image. Furthermore, this function is essentially hidden and does not lend itself to analysis. What features, for example, is the network responding to? In order to deal with more complex driving situations, the entrants to the Urban Challenge required a higher level of scene understanding.
The main body of this project is, therefore, concerned with controlling the vehicle in a deliberative manner. Starting with a captured image, the road markings are explicitly detected and modelled, prior knowledge of the road width is used to classify road markings, a trajectory for the vehicle is computed, and the vehicle controls both its speed and position to follow the desired path. This project, therefore, takes a sense, plan, act approach to vehicle control.
5.1 Summary
This chapter forms the main body of the dissertation. It starts with a description of how some ground truth data was generated for test purposes in Section 5.2. Section 5.3 covers the sensing aspects of the system. It provides a short description of some initial investigations that, whilst useful, were not taken further, before describing the main image processing steps and LIDAR simulation. Section 5.4 describes the techniques used to convert the perceived environment into a path for the vehicle to follow. Finally, section 5.5 describes how the vehicle is controlled. Figure 5-1 gives an overview of the main steps in the system.
Figure 5-1: Main processing steps of the system Sensing steps are shown in blue, planning steps in red, and control steps in purple
5.2 Ground Truth Data
In order to perform experiments and evaluate different approaches, it was necessary
to generate a test set of image pairs comprising an original captured image and the corresponding ‘ground truth’ To do this, a set of 17 images was captured from various points along the track These images were chosen to be representative of the track and, therefore, included straight sections and bends of various degrees
Once the images were captured, they were converted to binary images using a manually selected threshold such that the lane markings were fully present – it is not desirable to lose any of the lane-marking information The resulting images contained a substantial amount of noise which was removed manually
The result is a set of image pairs: the original captured image along with a binary image containing only the lane markings. Figure 5-2 shows an example of such a ground truth pair.
Figure 5-2 Example of a captured image and the corresponding ground truth
5.3 Sensing
This section describes the development of the sensing system and culminates with the fusing of vision and LIDAR data into a virtual lane marking sensor.
5.3.1 Some Initial Experiments
At the project outset, I had no specific technique in mind for detecting the road markings. Therefore, I performed some experiments to explore different options.
5.3.1.1 Simple Thresholding
It is important to try the simple approaches before looking for more sophisticated techniques. Although I did not expect simple thresholding to be a reliable means of distinguishing the lane markings from the background, I decided to start with this approach. A side-effect of this is that it provides a baseline for evaluating other methods.
Using the ground-truth test set, the original image is converted to a grey-scale image. This is then thresholded and the resulting binary image compared with the ground-truth. By doing this, it is possible to obtain a measure of the signal to noise ratio of the binary image. The signal to noise ratio is defined as:

SNR = Σ_p s(p) / Σ_p n(p)

where s(p) = 1 for set pixels that correspond to a marking in the ground truth, and n(p) = 1 for set pixels that do not.
Thus, for each pixel set in the binary image, we determine if it corresponds to a genuine road marking in the ground-truth or whether it is a false positive (noise). By repeating the process, the threshold with the highest SNR can be determined.
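Computed over NumPy boolean masks, the measure reduces to a few lines (a sketch under the definition above):

import numpy as np

def snr(binary, ground_truth):
    """Signal-to-noise ratio of a thresholded image against ground truth.

    binary, ground_truth: 2-D boolean arrays of the same shape.
    """
    signal = np.logical_and(binary, ground_truth).sum()
    noise = np.logical_and(binary, ~ground_truth).sum()
    return signal / noise if noise else float("inf")

def best_threshold(grey, ground_truth):
    """Sweep all 8-bit thresholds and return the one maximising SNR."""
    return max(range(256), key=lambda t: snr(grey > t, ground_truth))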
Figure 5-3 Effect of different thresholds on SNR. Top left, captured image. Right, SNR against threshold. Bottom left, binary image with highest SNR.
Figure 5-3 shows the SNR obtained using different thresholds for a single image. There is a clear peak at a threshold of 150 (for this particular image). Comparing against the ground truth image in Figure 5-2, we can see that, although the signal to noise ratio has been maximised, there are significant sections of the markings missing. This is, in part, caused by the shadows cast over the road. This problem is typical in road marking detection systems and vision systems in general.
It is clear that using a simple threshold is not appropriate, given that substantial portions of the lane markings are absent even when we are in a position to choose the best threshold.
The experiment was repeated using each pair in the test set. The peak SNR value occurred at an average threshold of 136. This threshold was used to generate the SNR values shown in Figure 5-17.
5.3.1.2 Inverse Perspective Mapping
A common approach to lane marking detection is to perform an inverse perspective mapping (IPM) to remove the foreshortening effect due to perspective [16][17]. The result of IPM is an image of the road as though viewed directly from above. The technique works by projecting the perspective image onto the ground-plane, which is assumed to be both flat and horizontal. A description of the technique can be found in [16]. In order to apply IPM, characteristics of the camera, such as height and field of view, must be known. Typically, this information is obtained using a semi-automated calibration process which involves placing a chessboard pattern of known dimensions in front of the camera. Many vision software libraries include routines to facilitate this process. However, as this project uses a simulated camera, the calibration approach is not applicable. Instead, I obtained an approximation of the camera characteristics by working through the simulator source code. It would have been possible to obtain precise information, as this is necessarily encoded within the simulator, but I was reluctant to spend too much time on this in the initial stages of the project. My initial evaluation of this approach involved applying IPM to the ground-truth images and simply evaluating the results by eye.
Figure 5-4: Effect of applying IPM to ground truth images
Examples of applying IPM to ground-truth images are shown in Figure 5-4. The benefits of IPM are clear, with the edges of the road now appearing parallel. Using this approach would facilitate applying constraints when searching for the lane markings – particularly searching for sets of parallel lines rather than individual markings. However, despite the clear visual benefits of IPM, it is not without its problems. As IPM relies on the flat ground-plane assumption, the image can become distorted when the assumption does not hold. Furthermore, pixels in the perspective image that are distant from the camera are mapped to multiple pixels in the IPM image. This produces a block-like effect that becomes more apparent the further the pixel is from the camera [16]. Figure 5-5 shows the effect of applying IPM to more severe bends, where the markings in the perspective image are thin.
Figure 5-5: Left image shows non-parallel lines. Right image illustrates block effect for distant pixels.
In this figure, both of the IPM images show distortion of the road geometry as the distance from the camera increases. In particular, the lane markings cease to be parallel, and the block effect of mapping a single pixel in the perspective image to multiple pixels in the IPM image can be seen (despite anti-aliasing being used).
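For reference, a minimal IPM can be expressed as a planar homography; the sketch below uses OpenCV with made-up calibration points, since the camera parameters used in this project were approximated from the simulator source rather than calibrated.

import cv2
import numpy as np

# Four points on the road in the perspective image (pixels) and their
# corresponding positions in the top-down view. These are placeholders;
# real values come from the camera's height, pitch, and field of view.
src = np.float32([[420, 700], [860, 700], [560, 420], [720, 420]])
dst = np.float32([[300, 700], [500, 700], [300, 100], [500, 100]])

H = cv2.getPerspectiveTransform(src, dst)

def to_birds_eye(image, size=(800, 720)):
    """Warp a perspective road image into an approximate top-down view."""
    return cv2.warpPerspective(image, H, size)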
5.3.1.3 RANSAC Curve Fitting
TORCS represents bends as circular arc segments, although more complex curves can be made by joining segments of different radii together. As this project is concerned with modelling road geometry, I experimented with fitting circular arcs to the IPM images.
Figure 5-6: Result of using RANSAC to fit circles to the IPM image
Figure 5-6 shows the result of fitting circular arcs to an IPM image using the RANSAC algorithm [21]. The thickness of the lane markings causes many circles to pass the acceptance test. This suggests that the approach is unlikely to be reliable at determining the radius of a given bend. There were two further problems with this approach. Firstly, no circles were matched to the centre-line and, secondly, small modifications to the RANSAC parameters seemed to make the difference between many circles being detected and none being detected. Given these problems, I decided that this approach was unlikely to succeed and did not investigate it further. However, with hindsight, there are several things that could have improved the situation, for example, thinning the markings prior to the application of RANSAC and using a parabolic model rather than the somewhat restrictive circular approach. Nonetheless, the time spent understanding these techniques would prove useful elsewhere in the project; both inverse perspective mapping and curve fitting are used in section 5.3.3 on road geometry modelling.
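For completeness, a bare-bones RANSAC circle fit over marking pixels might look as follows; the sample count and inlier tolerance are arbitrary choices for illustration, not the parameters used in the experiment.

import random
import numpy as np

def circle_from_3_points(p1, p2, p3):
    """Centre and radius of the circle through three 2-D points."""
    ax, ay = p1; bx, by = p2; cx, cy = p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-9:
        return None                     # collinear points
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return (ux, uy), float(np.hypot(ax - ux, ay - uy))

def ransac_circle(points, iterations=500, tolerance=2.0):
    """Fit one circle to marking pixels, maximising the inlier count."""
    best, best_inliers = None, 0
    pts = np.asarray(points, dtype=float)
    for _ in range(iterations):
        candidate = circle_from_3_points(*random.sample(list(pts), 3))
        if candidate is None:
            continue
        (cx, cy), r = candidate
        dists = np.abs(np.hypot(pts[:, 0] - cx, pts[:, 1] - cy) - r)
        inliers = int((dists < tolerance).sum())
        if inliers > best_inliers:
            best, best_inliers = candidate, inliers
    return best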
5.3.2 The MIT Approach
Whilst many lane marking detection systems employ the inverse perspective mapping approach as the first processing step, not all do so. In particular, the approach taken by the MIT team, described in Albert Huang's PhD thesis [19], works around the foreshortening effect by applying filters of different sizes directly to the perspective image.
This approach was of particular interest to me for two reasons. Firstly, their technique proved to be successful in the competitive environment of the Urban Challenge and, secondly, they describe a way in which data from LIDAR sensors can be fused with camera data.
5.3.2.1 Image Capture & Pre-processing
In contrast to the neural network approach, where the input image is used to directly determine the steering angle, the approach taken here involves separating the tasks of scene understanding and vehicle control. More specifically, the lane markings are extracted and used to form a model of the road geometry, and subsequently to plan a path for the vehicle to follow. As we are concerned with detecting specific features and their location in the distance, it makes good sense to increase the resolution of the input image. However, as resolution increases, so does the cost of processing the data. I chose to increase the resolution of the simulator output. The image is captured and cropped, again assuming that the horizon is fixed halfway down.
The image is then converted from RGB to grey-scale in the normal manner. In addition to this, a separate binary image is created, which is used for a verification step described in section 5.3.4. As this binary image is not used directly for feature detection, the choice of threshold need not be too fine-tuned and is selected to provide a reasonable separation of the lane-markings from the road surface.
5.3.2.2 Matched Filters
Huang observes that, as lane markings are typically of a standard width and the rate of foreshortening in a perspective image can be determined [19], it is possible to locate the lane markings by searching for features of a size dependent on the distance from the camera. As with inverse perspective mapping, this relies on the flat ground-plane assumption. The technique, therefore, searches for features of a size that is a function of the marking width and the scanline being searched.
However, this makes the further assumption that a single horizontal scanline represents a line in the world that is a constant distance from the camera. This is not the case, and it is possible that extending the function to include the position within the scanline may improve the algorithm. However, this was not investigated.
In order to detect a feature of a specified size, the filter shown in Figure 5-7 is scaled such that the portion above zero is the same length (in pixels) as the feature to be detected.
Figure 5-7 Feature detection filter template. The filter is scaled to match the desired feature size.
Figure 5-8 illustrates the principle behind matched filters. A single scanline containing two different sized features is convolved with a filter whose size matches one feature but not the other. When the filter exactly matches the feature size, the result is a clear local maximum. When the match is not exact, the result is a truncated peak. We can, therefore, locate features by searching for definite local maxima.
Figure 5-8 Principle behind matched filters. Left, single scanline with two different sized features. Right, the result of filtering. The filter does not match the first feature but matches the second exactly.
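A 1-D version of this test is easy to reproduce; the filter here is a simple box with negative side-lobes, which captures the shape of Figure 5-7 only approximately.

import numpy as np

def matched_filter(width):
    """Box filter of `width` positive taps flanked by negative side-lobes,
    normalised to sum to zero so flat regions give no response."""
    side = max(width // 2, 1)
    kernel = np.concatenate([-np.ones(side), np.ones(width), -np.ones(side)])
    return kernel - kernel.mean()

scanline = np.zeros(100)
scanline[20:24] = 1.0    # 4-pixel feature
scanline[60:68] = 1.0    # 8-pixel feature

response = np.convolve(scanline, matched_filter(8), mode="same")
print(int(np.argmax(response)))  # peaks near 64, the centre of the 8-pixel mark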