14
Novel Framework of Robot Force Control Using
Reinforcement Learning
Byungchan Kim1 and Shinsuk Park2
1Center for Cognitive Robotics Research, Korea Institute of Science and Technology
2Department of Mechanical Engineering, Korea University
Korea
1 Introduction
Over the past decades, robotic technologies have advanced remarkably and have proven successful, especially in the field of manufacturing, where conventional position-controlled robots perform simple repetitive tasks in static environments. In recent years, there have been increasing needs for robot systems in many areas that involve physical contact with human-populated environments, yet conventional robotic systems have been ineffective in such contact tasks. Contrary to robots, humans cope with dynamic environments by the aid of excellent adaptation and learning ability. In this sense, a robot force control strategy inspired by human motor control would be a promising approach.
There have been several studies on biologically-inspired motor learning. Cohen et al. suggested an impedance learning strategy for a contact task using an associative search network and applied this approach to a wall-following task (Cohen et al., 1991). Another study investigated a motor learning method for a musculoskeletal arm model in free-space motion using reinforcement learning (Izawa et al., 2002). These studies, however, are limited to rather simple problems. In other studies, artificial neural network models were used for impedance learning in contact tasks (Jung et al., 2001; Tsuji et al., 2004). One of the noticeable works, by Tsuji et al., suggested an on-line virtual impedance learning method that exploits visual information. Despite its usefulness, however, neural network-based learning involves a heavy computational load and may easily lead to local optimum solutions.
The purpose of this study is to present a novel framework of force control for robotic contact tasks. To develop appropriate motor skills for various contact tasks, this study employs the following methodologies. First, our robot control strategy employs impedance control based on a human motor control theory, the equilibrium point control model. The equilibrium point control model suggests that the central nervous system utilizes the spring-like property of the neuromuscular system in coordinating multi-DOF human limb movements (Flash, 1987). Under the equilibrium point control scheme, force can be controlled separately by a series of equilibrium points and modulated stiffness (or, more generally, impedance) at the joints, so the control scheme can be simplified considerably. Second, as the learning framework, reinforcement learning (RL) is employed to optimize the performance of the contact task. RL can handle an optimization problem in an
unknown environment by making sequential decision policies that maximize an external reward (Sutton et al., 1998). While RL is widely used in machine learning, it is not computationally efficient, since it is basically a Monte-Carlo-based estimation method with a heavy calculation burden and large sample variance. To enhance the learning performance, two approaches are usually employed to determine the policy gradient. One approach provides a baseline for the gradient estimator to reduce variance (Peters et al., 2006), and the other suggests a Bayesian update rule for estimating the gradient (Engel et al., 2003). This study employs the former approach in constructing the RL algorithm.
In this work, an episodic natural actor-critic algorithm based on the RLS filter was implemented as the RL algorithm. The episodic Natural Actor-Critic method proposed by Peters et al. is known to be effective for high-dimensional continuous state/action problems and can provide a solution close to the optimum (Peters et al., 2005). An RLS filter is used with the Natural Actor-Critic algorithm to further reduce the computational burden, as in the work of Park et al. (Park et al., 2005). Finally, different task goals or performance indices are selected depending on the characteristics of each task. In this work, the performance indices of two contact tasks were chosen to be optimized: point-to-point movement in an unknown force field, and catching a flying ball. The performance of the tasks was tested through dynamic simulations.
This paper is organized as follows. Section 2 introduces the impedance control method based on the equilibrium point control model. In Section 3, we describe the details of motor skill learning based on reinforcement learning. Finally, simulation results and a discussion of the results are presented.
2 Impedance control based on equilibrium point control model
Mechanical impedance of a robot arm plays an important role in the dynamic interaction between the robot arm and its environment in contact. Impedance control is a widely-adopted control method for executing robotic contact tasks by regulating the mechanical impedance that characterizes the dynamic behavior of the robot at the port of interaction with its environment. The impedance control law may be described as follows (Asada et al., 1986):

\tau_{actuator} = J^T \left[ K_C (x_d - x) - B_C \dot{x} \right] \qquad (1)

where \tau_{actuator} represents the joint torque exerted by the actuators, J is the manipulator Jacobian, and the current and desired positions of the end-effector are denoted by vectors x and x_d, respectively. Matrices K_C and B_C are stiffness and damping matrices in Cartesian space. This form of impedance
control is analogous to the equilibrium point control, which suggests that the resulting torque of the muscles is given by the deviation of the instantaneous hand position from its corresponding equilibrium position. The equilibrium point control model proposes that the muscles and neural control circuits have "spring-like" properties: the central nervous system may generate a series of equilibrium points for a limb, and the "spring-like" properties of the neuromuscular system will tend to drive the motion along a trajectory that follows these intermediate equilibrium postures (Park et al., 2004; Hogan, 1985). Fig. 1 illustrates the concept of the equilibrium point control. Impedance control is an extension of the equilibrium point control in the context of robotics, where robotic control is achieved by imposing an end-effector dynamic behavior described by mechanical impedance.
Figure 1 Conceptual model of equilibrium point control hypothesis
For impedance control of a two-link manipulator, stiffness matrix K_C is formed as follows:

K_C = V \Sigma V^T, \qquad \Sigma = \mathrm{diag}(\lambda_1, \lambda_2) \qquad (3)

In equation (3), the orthogonal matrix V is composed of the eigenvectors of stiffness matrix K_C, and the diagonal elements of diagonal matrix Σ consist of the eigenvalues of stiffness matrix K_C. Stiffness matrix K_C can be graphically represented by the stiffness ellipse (Lipkin et al., 1992). As shown in Fig. 2, the eigenvectors and eigenvalues of the stiffness matrix correspond to the directions and lengths of the principal axes of the stiffness ellipse, respectively. The characteristics of stiffness matrix K_C are determined by three parameters of its corresponding stiffness ellipse: the magnitude (the area of the ellipse: πλ1λ2), the shape (the length ratio of the major and minor axes: λ1/λ2), and the orientation (the directions of the principal axes: θ). By regulating these three parameters, all the elements of stiffness matrix K_C can be determined.
In this study, the stiffness matrix in Cartesian space is assumed to be symmetric and positive definite. This provides a sufficient condition for static stability of the manipulator when it interacts with a passive environment (Kazerooni et al., 1986). It is also assumed that damping matrix B_C is approximately proportional to stiffness matrix K_C. The ratio B_C/K_C is chosen to be a constant of 0.05, as in the work of Won for human arm movement (Won, 1993).
Figure 2 A graphical representation of the end-effector's stiffness in Cartesian space. The lengths λ1 and λ2 of the principal axes and the relative angle θ represent the magnitude and the orientation of the end-effector's stiffness, respectively
For trajectory planning, it is assumed that the trajectory of the equilibrium point for the end-effector, also called the virtual trajectory, has a minimum-jerk velocity profile for smooth movement of the robot arm (Flash et al., 1985). The virtual trajectory is calculated from the start point x_i to the final point x_f as follows:

x_v(t) = x_i + (x_f - x_i)\left[ 10\left(\tfrac{t}{t_f}\right)^3 - 15\left(\tfrac{t}{t_f}\right)^4 + 6\left(\tfrac{t}{t_f}\right)^5 \right] \qquad (4)

where t is the current time and t_f is the duration of the movement.
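As an illustration, the virtual trajectory of equation (4) can be computed with a few lines of code. This is a minimal sketch under the stated minimum-jerk assumption; the function name and array conventions are illustrative, not part of the original work.

```python
import numpy as np

def virtual_trajectory(x_i, x_f, t, t_f):
    """Minimum-jerk equilibrium-point (virtual) trajectory, equation (4).

    x_i, x_f : start and goal positions of the end-effector (2-vectors)
    t, t_f   : current time and movement duration
    """
    s = np.clip(t / t_f, 0.0, 1.0)               # normalized time in [0, 1]
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5     # minimum-jerk position profile
    return np.asarray(x_i) + (np.asarray(x_f) - np.asarray(x_i)) * blend
```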
3 Motor Skill Learning Strategy
A two-link robotic manipulator for two-dimensional contact tasks was modeled as shown in Fig. 3. The robotic manipulator is controlled using the impedance control method based on the equilibrium point control hypothesis, as described in Section 2. The stiffness and damping of the manipulator are modulated during a contact task, while the trajectory of the equilibrium point is given for the task. The manipulator learns the impedance modulation strategy for a specific task through reinforcement learning. The state vector is composed of the joint angles and velocities at the shoulder and elbow joints. The action vector changes the three parameters of the stiffness ellipse: the magnitude (the area of the ellipse), the shape (the length ratio of the major and minor axes), and the orientation (the direction of the principal axes). This section describes the learning method based on reinforcement learning for controlling the task impedance of the two-link manipulator in performing two-dimensional contact tasks.
Figure 3 Two-link robotic manipulator (L1 and L2: Length of the link, M1 and M2: Mass of
the link, I1 and I2: Inertia of the link)
3.1 Reinforcement Learning
The main components of RL are the agent, which is the decision maker, and its interaction with the external environment. In the interaction process, the agent selects action a_t and receives the environmental state s_t and a scalar-valued reward r_t as a result of the action at discrete time t. The reward r_t is a function that indicates the performance of the action. The agent strives to maximize reward r_t by modulating the policy π(s_t, a_t) that chooses the action for a given state s_t. The RL algorithm aims to maximize the total sum of future rewards, or the expected return, rather than the present reward. A discounted sum of rewards during one episode (the sequence of steps to achieve the goal from the start state) is widely used as the expected return:
V^\pi(s) = E\left[ \sum_{k=0}^{T-1} \gamma^k r_{t+k} \,\middle|\, s_t = s \right] \qquad (5)

Here, γ is the discounting factor (0≤γ≤1), and V^π(s) is the value function that represents the expected sum of rewards. The update rule of value function V^π(s) is given as follows:

V^\pi(s_t) \leftarrow V^\pi(s_t) + \alpha \left[ r_t + \gamma V^\pi(s_{t+1}) - V^\pi(s_t) \right] \qquad (6)
In equation (6), the term r_t + γV^π(s_{t+1}) − V^π(s_t) is called the Temporal Difference (TD) error. The TD error indicates whether action a_t at state s_t is good or not. This update rule is repeated to decrease the TD error so that value function V^π(s_t) converges.
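For illustration, a tabular TD(0) update implementing equation (6) is sketched below; the learning-rate value and the dictionary-based value table are assumptions of this sketch, not details from the original chapter.

```python
def td0_update(V, s_t, r_t, s_next, alpha=0.1, gamma=0.99):
    """One temporal-difference update of value table V (a dict keyed by state)."""
    td_error = r_t + gamma * V.get(s_next, 0.0) - V.get(s_t, 0.0)  # TD error
    V[s_t] = V.get(s_t, 0.0) + alpha * td_error                    # equation (6)
    return td_error
```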
3.2 RLS-based episodic Natural Actor-Critic algorithm
For many robotic problems, RL schemes are required to deal with continuous state/action spaces, since learning methods based on discrete spaces are not applicable to high-dimensional systems. High-dimensional continuous state/action problems, however, are more complicated to solve than discrete state/action problems. While the Natural Actor-Critic (NAC) algorithm is known to be an effective approach for solving continuous state/action problems, it imposes a high computational burden in calculating inverse matrices. To relieve this burden, Park et al. suggested a modified NAC algorithm combined with an RLS filter. The RLS filter is used for adaptive signal filtering due to its fast convergence speed and low computational burden, while its possibility of divergence is known to be rather low (Xu et al., 2002). Since this filter is designed for an infinitely repeated task with no final state (a non-episodic task), the approach is unable to deal with episodic tasks.
This work develops a novel NAC algorithm combined with an RLS filter for episodic tasks. We named this algorithm the "RLS-based eNAC (episodic Natural Actor-Critic) algorithm." The RLS-based eNAC algorithm has two separate memory structures: the actor structure and the critic structure. The actor structure determines the policies that select actions at each state, and the critic structure evaluates whether the action selected by the actor structure is good or not.
In the actor structure, the policy at state s_t in episode e is parameterized as π(a_t|s_t) = p(a_t|s_t, ψ_e), and the policy parameter vector ψ_e is iteratively updated after finishing one episode by the following update rule:

\psi_{e+1} \leftarrow \psi_e + \alpha \nabla_\psi J(\psi_e) \qquad (7)

where ∇_ψ J(ψ_e) represents the gradient of the objective function J(ψ_e). Peters et al. derived the gradient of the objective function based on the natural gradient method originally proposed by Amari (Amari, 1998). They suggested a simpler update rule by introducing the natural gradient vector w_e as follows:

\psi_{e+1} \leftarrow \psi_e + \alpha \nabla_\psi J(\psi_e) \approx \psi_e + \alpha w_e \qquad (8)

where α denotes the learning rate (0≤α≤1).
In the critic structure, the least-squares (LS) TD-Q(1) algorithm is used to minimize the TD error, which represents the deviation between the expected return and the current prediction value (Boyan, 1999). By considering the Bellman equation in deriving the LSTD-Q(1) algorithm, the action-value function Q^π(s, a) can be formulated as follows (Sutton et al., 1998):

Q^\pi(s_t, a_t) = E\left[ r_t + \gamma V^\pi(s_{t+1}) \right] \qquad (10)

where V(s_{t+1}) approximates value function V^π(s_{t+1}) as the iterative learning is repeated. Peters et al. showed that the advantage function can be approximated using the compatible function approximation A^\pi(s, a) = \nabla_\psi \log \pi(a|s)^T w_e (Peters et al., 2003). With this assumption, we can rearrange the discounted summation of (10) for one episode trajectory with N states as below:

\sum_{t=0}^{N} \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T w_e + V(s_0) = \sum_{t=0}^{N} \gamma^t r_t + \gamma^{N+1} V(s_{N+1}) \qquad (11)

where the term γ^{N+1} V(s_{N+1}) is negligible for its small effect. By letting V(s_0) be the product of a 1-by-1 critic parameter vector v and a 1-by-1 feature vector [1], we can formulate the following:

\sum_{t=0}^{N} \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T w_e + v = \sum_{t=0}^{N} \gamma^t r_t \qquad (12)
Here, the natural gradient vector w_e is used for updating the policy parameter vector ψ_e (equation (8)), and the value function parameter v_e is used for approximating the value function V^π(s_t). By defining the episode feature vector \phi = \left[ \sum_t \gamma^t \nabla_\psi \log \pi(a_t|s_t)^T, \; 1 \right]^T and the discounted return \bar{r} = \sum_t \gamma^t r_t, we can set G_e = \sum \phi \phi^T and b_e = \sum \phi \bar{r}. Matrix G_e for episode e is updated to yield the solution vector χ_e = [w_e^T, v_e]^T using the following update rule:

\chi_e = (G_e + \delta I)^{-1} b_e \qquad (14)

where δ is a positive scalar constant and I is an identity matrix. By adding the term δI in (14), matrix G_e becomes diagonally dominant and non-singular, and thus invertible (Strang, 1988). As the update progresses, the effect of this perturbation diminishes. Equation (14) is the conventional form of the critic update rule of the eNAC algorithm.
The main difference between the conventional eNAC algorithm and the RLS-based eNAC algorithm is the way matrix G_e and solution vector χ_e are updated. The RLS-based eNAC algorithm employs the update rule of the RLS filter. The key feature of the RLS algorithm is to exploit G_{e-1}^{-1}, which already exists, for the estimation of G_e^{-1}. Rather than calculating P_e (the inverse matrix of G_e) using the conventional method for matrix inversion, we used the RLS filter-based update rule as follows (Moon et al., 2000):

k_e = \frac{P_{e-1}\,\phi_e}{\beta + \phi_e^T P_{e-1}\,\phi_e}, \qquad
\chi_e = \chi_{e-1} + k_e \left( \bar{r}_e - \phi_e^T \chi_{e-1} \right), \qquad
P_e = \frac{1}{\beta}\left( P_{e-1} - k_e\,\phi_e^T P_{e-1} \right) \qquad (15)

where the forgetting factor β (0≤β≤1) is used to accumulate the past information in a discounted manner. By using the RLS filter-based update rule, the inverse matrix can be calculated without too much computational burden. This is repeated until the solution vector χ_e converges. It should be noted, however, that matrix G_e should be obtained using (14) for the first episode (episode 1).
The entire procedure of the RLS-based episodic Natural Actor-Critic algorithm is summarized in Table 1.

for each episode e,
    for each step,
        Take action a_t from stochastic policy π,
        then observe next state s_{t+1} and reward r_t.
    Update critic structure:
        Update P_e and solution vector χ_e = [w_e^T, v_e]^T following the rule in (15).
    Update actor structure:
        Update policy parameter vector following the rule in (8).
repeat until convergence

Table 1 RLS-based episodic Natural Actor-Critic algorithm
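A per-episode update implementing Table 1 can be sketched as follows. This is a minimal illustration assuming the reconstructed equations (8), (12) and (15); the caller supplies the per-step score vectors ∇_ψ log π(a_t|s_t), and the function signature and array shapes are assumptions of this sketch.

```python
import numpy as np

def enac_episode_update(psi, P, chi, grads, rewards,
                        alpha=0.05, beta=0.99, gamma=0.99):
    """One RLS-based eNAC update after an episode.

    psi     : policy parameter vector, shape (n,)
    P       : running estimate of G^{-1}, shape (n+1, n+1)
    chi     : running solution [w; v], shape (n+1,)
    grads   : per-step score vectors grad_psi log pi(a_t|s_t), shape (T, n)
    rewards : per-step rewards r_t, length T
    """
    disc = gamma ** np.arange(len(rewards))
    phi = np.append(disc @ np.asarray(grads), 1.0)   # episode features, eq. (12)
    R = float(disc @ np.asarray(rewards))            # discounted return
    # Critic: RLS update of P ~ G^{-1} and chi = [w; v], eq. (15).
    k = P @ phi / (beta + phi @ P @ phi)
    chi = chi + k * (R - phi @ chi)
    P = (P - np.outer(k, phi @ P)) / beta
    # Actor: natural-gradient step psi <- psi + alpha * w, eq. (8).
    psi = psi + alpha * chi[:-1]
    return psi, P, chi
```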
3.3 Stochastic Action Selection
As discussed in Section 2, the characteristics of the stiffness ellipse can be changed by modulating three parameters: the magnitude, shape, and orientation. For performing contact tasks, policy π is designed to plan the change rates of the magnitude, shape, and orientation of the stiffness ellipse at each state by taking actions a_t = [a_mag, a_shape, a_orient]^T. Through the sequence of an episode, policy π determines those actions. The goal of the learning algorithm is to find the trajectory of the optimal stiffness ellipse during the episode. Fig. 4 illustrates the change of the stiffness ellipse corresponding to each action.
Policy π takes the form of a Gaussian density function. In the critic structure, the compatible function approximation \nabla_\psi \log \pi(a_t|s_t)^T w is derived from the stochastic policy π based on the algorithm suggested by Williams (Williams, 1992). Policy π for each component of action vector a_t can be described as follows:

\pi(a_t|s_t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(a_t - \mu)^2}{2\sigma^2} \right) \qquad (16)
Since the stochastic action selection depends on the conditions of the Gaussian density function, the action policy can be designed by controlling the mean μ and the variance σ of equation (16). These variables are defined as:

\mu = \xi_\mu \, \omega^T s_t, \qquad
\sigma = \xi_\sigma \left( \frac{1}{1 + \exp(-\eta)} + 0.001 \right) \qquad (17)
As can be seen in the equation, mean μ is a linear combination of the vector ω and the state vector s_t with mean scaling factor ξ_μ. Variance σ takes the form of a sigmoid function with the positive scalar parameter η and variance scaling factor ξ_σ. As a two-link manipulator model is used in this study, the components of state vector s_t include the joint angles and velocities of the shoulder and elbow joints (s_t = [\theta_1, \theta_2, \dot{\theta}_1, \dot{\theta}_2, 1]^T, where the fifth component of 1 is a bias factor). Therefore, the natural gradient vector w (and thus the policy parameter vector ψ) is composed of 18 components as follows:

\psi = \left[ \omega_{mag}^T, \eta_{mag}, \omega_{shape}^T, \eta_{shape}, \omega_{orient}^T, \eta_{orient} \right]^T

where ω_mag, ω_shape, and ω_orient are 5-by-1 vectors and η_mag, η_shape, and η_orient are the parameters corresponding to the three components of action vector a_t = [a_mag, a_shape, a_orient]^T.
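A sketch of the stochastic selection of one action component under equations (16)-(17); the scaling constants and function signature are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_action(s_t, omega, eta, xi_mu=1.0, xi_sigma=1.0, rng=np.random):
    """Draw one action component (e.g., a_mag) from the Gaussian policy.

    s_t   : state vector [theta1, theta2, dtheta1, dtheta2, 1] (bias last)
    omega : 5-vector of mean weights for this action component
    eta   : scalar shaping the exploration variance
    """
    mu = xi_mu * float(np.dot(omega, s_t))                    # mean, linear in state
    sigma = xi_sigma * (1.0 / (1.0 + np.exp(-eta)) + 0.001)   # sigmoid variance
    return rng.normal(mu, sigma)
```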
4 Contact Task Applications
In this section, the method developed in the previous sections is applied to two contact tasks: point-to-point movement in an unknown force field, and catching a flying ball. The two-link manipulator developed in Section 3 was used to perform the tasks in two-dimensional space. The dynamic simulator was constructed with MSC.ADAMS 2005, and the control algorithm was implemented in Matlab/Simulink (Mathworks, Inc.).
Figure 4 Some variation examples of the stiffness ellipse. (a) The magnitude (area of the ellipse) is changed. (b) The shape (length ratio of the major and minor axes) is changed. (c) The orientation (direction of the major axis) is changed. (solid line: stiffness ellipse at time t, dashed line: stiffness ellipse at time t+1)
4.1 Point-to-Point Movement in an Unknown Force Field
In this task, the two-link robotic manipulator makes a point-to-point movement in an unknown force field. The velocity-dependent force is given as:

F_{viscous} = -B \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} \qquad (18)

where B is a constant viscosity matrix. The movement of the endpoint of the manipulator is impeded by this force, proportional to its velocity, as it moves to the goal position in the force field. This condition is identical to that of the biomechanics study of Kang et al. (Kang et al., 2007). In their study, the subject moves one's hand from a point to another point in a velocity-dependent force field, the same as in equation (18), where the force field was applied by a special apparatus (KINARM, BKIN Technology).
For this task, the performance indices are chosen as follows:
1. The root mean square of the difference between the desired position (virtual trajectory) and the actual position of the endpoint (Δ_rms).
2. The magnitude of the time rate of the torque vector of the two arm joints, \dot{\tau} = \sqrt{\dot{\tau}_1^2 + \dot{\tau}_2^2}. The torque rate represents power consumption, which can also be interpreted as the metabolic cost of human arm movement (Franklin et al., 2004; Uno et al., 1989).
By combining the two performance indices, two different rewards are formulated for one episode as follows:

r_1 = -\kappa_1 \int \Delta_{rms}\, dt, \qquad
r_2 = -\left( w_1 \kappa_1 \int \Delta_{rms}\, dt + w_2 \kappa_2 \int \dot{\tau}\, dt \right)

where w_1 and w_2 are weighting factors, and κ_1 and κ_2 are constants. The reward is a weighted linear combination of the time integrals of the two performance indices.
The learning parameters were chosen as follows: α = 0.05, β = 0.99, γ = 0.99. The change limits for the action are set as [-10, 10] degrees for the orientation, [-2, 2] for the major/minor ratio, and [-200π, 200π] for the area of the stiffness ellipse. The initial ellipse before learning was set to be circular with an area of 2500π.
The same physical properties as in (Kang et al., 2007) were chosen for the dynamic simulations (Table 2).
Table 2 Physical properties of two link arm model
Fig. 5 shows the change of the stiffness ellipse trajectory before and after learning. Before learning, the endpoint of the manipulator was not even able to reach the goal position (Fig. 5 (a)). Figs. 5 (b) and (c) compare the stiffness ellipse trajectories after learning using the two different rewards (reward1 and reward2). As can be seen in the figures, for both rewards the major axis of the stiffness ellipse was directed toward the goal position to overcome the resistance of the viscous force field.
Fig. 6 compares the effects of the two rewards on the changes of the two performance indices (Δ_rms and \dot{\tau}) as learning iterates. While the choice of reward does not affect the time integral of Δ_rms, the time integral of \dot{\tau} was suppressed considerably by using reward2 in learning. The results of the dynamic simulations are comparable with the biomechanics study of Kang et al. The results of their study suggest that the human actively modulates the major axis toward the direction of the external force against the motion, which is in accordance with our results.
Figure 5 Stiffness ellipse trajectories (dotted line: virtual trajectory, solid line: actual trajectory). (a) Before learning. (b) After learning (reward1). (c) After learning (reward2)
Figure 6 Learning effects on the performance indices (average of 10 learning trials). (a) position error (b) torque rate
4.2 Catching a Flying Ball
In this task, the two-link robotic arm catches a flying ball, as illustrated in Fig. 7. The simulation was performed using the physical properties of the arm listed in Table 2. The main issues in a ball-catching task are how to detect the ball trajectory and how to reduce the impulsive force between the ball and the end-effector. This work focuses on the latter and assumes that the ball trajectory is known in advance.
Figure 7 Catching a flying ball
When a human catches a ball, one moves one's arm backward to reduce the impulsive contact force. With the human ball-catching in mind, the task is modeled as follows. A ball is thrown toward the end-effector of the robot arm; the time for the ball to reach the end-effector is approximately 0.8 s. After the ball is thrown, the arm starts to move following the parabolic orbit of the flying ball. While the end-effector is moving, the ball is caught, and the two then move to the goal position together. The robot is set to catch the ball when the end-effector is moving at its highest speed, to reduce the impulsive contact force between the ball and the end-effector. The impulsive force can also be reduced by modulating the stiffness ellipse during the contact.
The learning parameters were chosen as follows: α = 0.05, β = 0.99, γ = 0.99. The change limits for the action are set as [-10, 10] degrees for the orientation, [-2, 2] for the major/minor ratio, and [-200π, 200π] for the area of the stiffness ellipse. The initial ellipse before learning was set to be circular with an area of 10000π.
For this task, the contact force is chosen as the performance index, and the reward is formulated as:

r = -\kappa \int \left| F_{contact} \right| dt

where κ is a constant. Fig. 8 illustrates the change of the stiffness during contact after learning. As can be seen in the figure, the stiffness is tuned soft in the direction of the ball trajectory, while the stiffness normal to the trajectory is much higher. Fig. 9 shows the change of the impulse as learning continues. As can be seen in the figure, the impulse was reduced considerably after learning.
Figure 8 Stiffness ellipse during contact after learning
Figure 9 Change of the impulse as learning continues
5 Conclusions
Safety in robotic contact tasks has become one of the most important issues as robots spread into dynamic, human-populated environments. The determination of the impedance control parameters for a specific contact task is the key to enhancing robot performance. This study proposes a novel motor learning framework for determining the impedance parameters required for various contact tasks. As the learning framework, we employed reinforcement learning to optimize the performance of the contact task. We have demonstrated that the proposed framework enhances contact tasks, such as door-opening, point-to-point movement, and ball-catching.
In our future work we will extend our method to teach a service robot that is required to perform more realistic tasks in three-dimensional space. Also, we are currently investigating a learning method to develop motor schemata that combine the internal models of contact tasks with the actor-critic algorithm developed in this study.
6 References
Boyan J (1999) Least-squares temporal difference learning, Proceeding of the 16th
International Conference on Machine Learning, pp 49-56
Cohen, M & Flash, T (1991) Learning impedance parameters for robot control using
associative search network, IEEE Transactions on Robotics and Automation, Vol 7,
Issue 3, pp 382-390, ISSN 1042-296X
Engel, Y.; Mannor, S & Meir, R (2003) Bayes meets Bellman: the Gaussian process approach
to temporal difference learning, Proceeding of the 20th International Conference on Machine Learning, pp 154-161
Flash, T & Hogan, N (1985) The coordination of arm movements: an experimentally
confirmed mathematical model, Journal of Neuroscience, Vol 5, No 7, pp 1688-1703,
ISSN 1529-2401
Flash, T (1987) The control of hand equilibrium trajectories in multi-joint arm movement,
Biological Cybernetics, Vol 57, No 4-5, pp 257-274, ISSN 1432-0770
Franklin, D W.; So, U.; Kawato, M & Milner, T E (2004) Impedance control balances
stability with metabolically costly muscle activation, Journal of Neurophysiology, Vol
92, pp 3097-3105, ISSN 0022-3077
Hogan, N (1985) Impedance control: An approach to manipulation: part I theory, part II
implementation, part III application, ASME Journal of Dynamic System, Measurement, and Control, Vol 107, No 1, pp 1-24, ISSN 0022-0434
Izawa, J.; Kondo, T & Ito, K (2002) Biological robot arm motion through reinforcement
learning, Proceedings of the IEEE International Conference on Robotics and Automation,
Vol 4, pp 3398-3403, ISBN 0-7803-7272-7
Jung, S.; Yim, S B & Hsia, T C (2001) Experimental studies of neural network impedance
force control of robot manipulator, Proceedings of the IEEE International Conference on Robotics and Automation, Vol 4, pp 3453-3458, ISBN 0-7803-6576-3
Kang, B.; Kim, B.; Park, S & Kim, H (2007) Modeling of artificial neural network for the
prediction of the multi-joint stiffness in dynamic condition, Proceeding of IEEE/RSJ International Conference on Intelligent Robots and Systems, pp 1840-1845, ISBN 978-1-
4244-0912-9, San Diego, CA, USA
Kazerooni, H.; Houpt, P K & Sheridan, T B (1986) The fundamental concepts of robust
compliant motion for robot manipulators, Proceedings of the IEEE International Conference on Robotics and Automation, Vol 3, pp 418-427, MN, USA
Lipkin, H & Patterson, T (1992) Generalized center of compliance and stiffness, Proceedings
of the IEEE International Conference on Robotics and Automation, Vol 2, pp 1251-1256,
ISBN 0-8186-2720-4, Nice, France
Moon, T K & Stirling, W C (2000) Mathematical Methods and Algorithm for Signal Processing,
Prentice Hall, Upper Saddle River, NJ, ISBN 0-201-36186-8
Park, J.; Kim, J & Kang, D (2005) An rls-based natural actor-critic algorithm for locomotion
of a two-linked robot arm, Proceeding of International Conference on Computational Intelligence and Security, Part I, LNAI, Vol 3801, pp 65-72, ISSN 1611-3349
Park, S & Sheridan, T B (2004) Enhanced human-machine interface in braking, IEEE
Transactions on Systems, Man, and Cybernetics, - Part A: Systems and Humans, Vol 34,
No 5, pp 615-629, ISSN 1083-4427
Peters, J.; Vijayakumar, S & Schaal, S (2005) Natural actor-critic, Proceeding of the 16th
European Conference on Machine Learning, LNCS, Vol 3720, pp.280-291, ISSN
1611-3349
Peters J & Schaal, S (2006) Policy gradient methods for robotics, Proceeding of IEEE/RSJ
International Conference on Intelligent Robots and Systems, pp 2219-2225, ISBN
1-4244-0259-X, Beijing, China
Strang, G (1988) Linear Algebra and Its Applications, Harcourt Brace & Company
Sutton, R S & Barto, A G (1998) Reinforcement Learning: An Introduction, MIT Press, ISBN
0-262-19398-1
Tsuji, T.; Terauchi, M & Tanaka, Y (2004) Online learning of virtual impedance parameters
in non-contact impedance control using neural networks, IEEE Transactions on Systems, Man, and Cybernetics, - Part B: Cybernetics, Vol 34, Issue 5, pp 2112-2118,
ISSN 1083-4419
Uno, Y.; Suzuki, R & Kawato, M (1989) Formation and control of optimal trajectory in
human multi-joint arm movement: minimum torque change model, Biological Cybernetics, Vol 61, No 2, pp 89-101, ISSN 1432-0770
Williams, R J (1992) Simple statistical gradient-following algorithms for connectionist
reinforcement learning, Machine Learning, Vol 8, No 3-4, pp 229-256 ISSN
1573-0565
Won, J (1993) The Control of Constrained and Partially Constrained Arm Movement, S M
Thesis, Department of Mechanical Engineering, MIT, Cambridge, MA, 1993
Xu, X.; He, H & Hu, D (2002) Efficient reinforcement learning using recursive least-squares
methods, Journal of Artificial Intelligence Research, Vol 16, pp 259-292
15
Link Mass Optimization Using Genetic Algorithms for Industrial Robot Manipulators

1 Introduction

Having optimum mass and minimum joint torques are ways of improving the energy efficiency of industrial robot manipulators. The inverse of the inertia matrix can be used to locally minimize the joint torques (Nedungadi & Kazerouinian, 1989). This approach is similar to global kinetic energy minimization.
Several optimization techniques, such as genetic algorithms (Painton & Campbell, 1995; Chen & Zalzasa, 1997; Choi et al., 1999; Pires & Machado, 1999; Garg & Kumar, 2002; Kucuk & Bingul, 2006; Qudeiri et al., 2007), neural networks (Sexton & Gupta, 2000; Tang & Wang, 2002) and minimax algorithms (Pin & Culioli, 1992; Stocco et al., 1998), have been studied in the robotics literature. Genetic algorithms (GAs) are superior to other optimization techniques in that they search over an entire population instead of a single point, use the objective function instead of derivatives, and deal with parameter codings instead of the parameters themselves. GA has recently found increasing use in several engineering applications such as machine learning, pattern recognition and robot motion planning. It is an adaptive heuristic search algorithm based on the evolutionary ideas of natural selection and genetics, and it provides a robust search procedure for solving difficult problems. In this work, GA is applied to optimize the link masses of a three-link robot manipulator to obtain minimum energy. The rest of the chapter is composed of the following sections. In Section II, genetic algorithms are explained in detail. The dynamic equations and the trajectory generation of robot manipulators are presented in Section III and Section IV, respectively. The problem definition and formulation are described in Section V. In the following section, the rigid-body dynamics of a cylindrical robot manipulator is given as an example. Finally, the contribution of this study is presented in Section VII.
2 Genetic algorithms
GAs were introduced by John Holland at the University of Michigan in the 1970s. The improvements in computational technology have made them attractive for optimization problems. A genetic algorithm is a non-traditional search method used in computing to find exact or approximate solutions to optimization and search problems, based on the evolutionary ideas of natural selection and genetics. The basic concept of GA is to simulate the processes in natural systems necessary for evolution, specifically those that follow the principle first laid down by Charles Darwin of survival of the fittest. As such, GAs represent an intelligent exploitation of a random search within a defined search space to solve a problem. The obtained optima are an end product containing the best elements of previous generations, where the attributes of a stronger individual tend to be carried forward into the following generation. The rule is survival of the fittest. The three basic features of GAs are as follows:
i. Description of the objective function.
ii. Description and implementation of the genetic representation.
iii. Description and implementation of the genetic operators such as reproduction, crossover and mutation.
If these basic features are chosen properly, the genetic algorithm will work quite well for optimization applications. GA optimization possesses some unique features that separate it from other optimization techniques:
i. It requires no knowledge or gradient information about the search space.
ii. It is capable of scanning a vast solution set in a short time.
iii. It searches over an entire population instead of a single point.
iv. It allows a number of solutions to be examined in a single design cycle.
v. It deals with parameter codings instead of the parameters themselves.
vi. Discontinuities in the search space have little effect on overall optimization performance.
vii. It is resistant to becoming trapped in local optima.
These features make GA a robust and useful optimization technique compared with other search techniques (Garg & Kumar, 2002). However, there are some disadvantages to using genetic algorithms:
i. Finding the exact global optimum in the search space is not certain.
ii. Large numbers of fitness function evaluations are required.
iii. Configuration is not straightforward and is problem dependent.
The representation or coding of the variables being optimized has a large impact on search performance, as the optimization is performed on this representation of the variables. The two most common representations, binary and real-number codings, differ mainly in how the recombination and mutation operators are performed. The most suitable choice of representation depends on the type of application. In GAs, a set of solutions represented by chromosomes is created randomly. The chromosomes used here are binary coded. Each zero and one in a chromosome corresponds to a gene. A typical chromosome can be given as
10000010
An initial population of random chromosomes is generated at the beginning. The size of the initial population may vary according to the difficulty of the problem under consideration; a small initial population may, for example, be composed of eight such chromosomes. A different solution to the problem is obtained by decoding each chromosome.
At the reproduction stage, chromosomes are chosen based on natural selection. The chromosomes in the new population are selected according to their fitness values, defined with respect to some specific criterion, using methods such as roulette-wheel selection, rank selection or steady-state selection. The fittest chromosomes have a higher probability of reproducing one or more offspring in the next generation, in proportion to the value of their fitness.
At the crossover stage, two members of the population exchange their genes. Crossover can be implemented in many ways, such as having a single crossover point or many crossover points chosen randomly. A simple crossover can be implemented as follows. In the first step, the newly reproduced members in the mating pool are mated randomly. In the second step, two new members are generated by swapping all characteristics after a randomly selected crossover point. A good value for the crossover rate is 0.7. A simple crossover structure is shown below. Two chromosomes are selected according to their fitness values, and the crossover point (marked "*") is selected randomly:
101*10011
111*00110
After the crossover process is applied, all of the bits after the crossover point are swapped. Hence, the new chromosomes take the form
101*00110
111*10011
At the mutation stage, the value of a particular gene in a selected chromosome is changed from 1 to 0 or vice versa. The probability of mutation is generally kept very small, so these changes are made rarely. In general, the scheme of a genetic algorithm can be summarized as follows (a minimal code sketch is given after the list):
i. Create an initial population.
ii. Check each chromosome to observe how well it solves the problem, and evaluate the fitness of each chromosome based on the objective function.
iii. Choose two chromosomes from the current population using a selection method such as roulette-wheel selection; the chance of being selected is in proportion to the fitness value.
iv. If a probability of crossover is attained, cross over the bits from each chosen chromosome at a randomly chosen point, according to the crossover rate.
v. If a probability of mutation is attained, implement a mutation operation according to the mutation rate.
vi. Continue until a maximum number of generations has been produced.
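The scheme above can be condensed into a short sketch. This is a minimal illustration, not the chapter's implementation: the chromosome length, population size, rates, and the user-supplied `fitness` function are assumptions; roulette-wheel selection here requires non-negative fitness values, so a minimization objective such as the torque cost would be passed in inverted form.

```python
import random

def run_ga(fitness, n_bits=24, pop_size=20, p_cross=0.7, p_mut=0.01, n_gen=50):
    """Minimal binary-coded GA: roulette selection, 1-point crossover, bit mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = [fitness(c) for c in pop]
        new_pop = []
        while len(new_pop) < pop_size:
            # Roulette-wheel selection: probability proportional to fitness.
            p1, p2 = random.choices(pop, weights=scores, k=2)
            c1, c2 = p1[:], p2[:]
            if random.random() < p_cross:          # single-point crossover
                pt = random.randrange(1, n_bits)
                c1, c2 = p1[:pt] + p2[pt:], p2[:pt] + p1[pt:]
            for c in (c1, c2):                     # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < p_mut:
                        c[i] ^= 1
            new_pop += [c1, c2]
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)                   # fittest chromosome found
```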
3 Dynamic Equations
A variety of approaches have been developed to derive the manipulator dynamics equations (Hollerbach, 1980; Luh et al., 1980; Paul, 1981; Kane & Levinson, 1983; Lee et al., 1983). The most popular among them are the Lagrange-Euler (Paul, 1981) and Newton-Euler methods (Luh et al., 1980). The energy-based Lagrange-Euler (LE) method is used to derive the manipulator dynamics in this chapter. To obtain the dynamic equations using the Lagrange-Euler method, one should define the homogeneous transformation matrix for each joint. Using the D-H (Denavit & Hartenberg, 1955) parameters, the homogeneous transformation matrix for a single joint is
T_i^{i-1} =
\begin{bmatrix}
\cos\theta_i & -\sin\theta_i & 0 & a_{i-1} \\
\sin\theta_i \cos\alpha_{i-1} & \cos\theta_i \cos\alpha_{i-1} & -\sin\alpha_{i-1} & -d_i \sin\alpha_{i-1} \\
\sin\theta_i \sin\alpha_{i-1} & \cos\theta_i \sin\alpha_{i-1} & \cos\alpha_{i-1} & d_i \cos\alpha_{i-1} \\
0 & 0 & 0 & 1
\end{bmatrix} \qquad (1)
where a_{i-1}, α_{i-1}, d_i, θ_i, c_i and s_i are the link length, link twist, link offset, joint angle, cosθ_i and sinθ_i, respectively. In this way, the successive transformations from the base toward the end-effector are obtained by multiplying all of the matrices defined for each joint.
The difference between the kinetic and potential energy gives the Lagrangian function

L(q, \dot{q}) = K(q, \dot{q}) - P(q) \qquad (2)

where q and \dot{q} represent the joint positions and velocities, respectively. Note that q_i is the joint angle \theta_i for revolute joints, or the distance d_i for prismatic joints. While the potential energy P depends on position only, the kinetic energy K depends on both the position and the velocity of the manipulator. The total kinetic energy of the manipulator is defined as

K(q, \dot{q}) = \sum_{i=1}^{n} \frac{1}{2} \left[ m_i v_i^T v_i + \omega_i^T I_i \omega_i \right] \qquad (3)
where m_i is the mass of link i and I_i denotes the 3x3 inertia tensor for the center of link i with respect to the base coordinate frame. I_i can be expressed as
I_i = R_i^0 \, I_m \, (R_i^0)^T \qquad (4)

where R_i^0 represents the rotation matrix and I_m stands for the inertia tensor of the rigid object about its center of mass. The terms v_i and \omega_i refer to the linear and angular velocities of the center of mass of link i with respect to the base coordinate frame, respectively, and can be written as

v_i = A_i(q)\,\dot{q}, \qquad \omega_i = B_i(q)\,\dot{q} \qquad (5)

where A_i(q) and B_i(q) are obtained from the Jacobian matrix J_i(q). If v_i and \omega_i in equation 5 are substituted into equation 3, the total kinetic energy is obtained as follows:
K(q, \dot{q}) = \frac{1}{2} \dot{q}^T M(q) \dot{q} \qquad (6)

where M(q) denotes the mass matrix, given by

M(q) = \sum_{i=1}^{n} \left[ m_i A_i(q)^T A_i(q) + B_i(q)^T I_i B_i(q) \right] \qquad (7)
The total potential energy of the manipulator is

P(q) = -\sum_{i=1}^{n} m_i g^T h_i(q) \qquad (9)
where g and h_i(q) denote the gravitational acceleration vector and the center of mass of link i relative to the base coordinate frame, respectively. As a result, the Lagrangian function can
be obtained by combining equations 7 and 9 as follows:

L(q, \dot{q}) = \frac{1}{2} \dot{q}^T M(q) \dot{q} + \sum_{i=1}^{n} m_i g^T h_i(q) \qquad (10)
The equations of motion can be derived by substituting the Lagrangian function in equation 10 into the Euler-Lagrange equations:

\frac{d}{dt} \frac{\partial L(q, \dot{q})}{\partial \dot{q}_i} - \frac{\partial L(q, \dot{q})}{\partial q_i} = \tau_i \qquad (11)

which yields

\tau_i = \sum_{j=1}^{n} M_{ij}(q)\, \ddot{q}_j + \sum_{k=1}^{n} \sum_{j=1}^{n} c_{kj}^i(q)\, \dot{q}_k \dot{q}_j + G_i(q) \qquad (12)

where τ is the n×1 generalized torque vector applied at the joints, and q, \dot{q} and \ddot{q} are the n×1 joint position, velocity and acceleration vectors, respectively. M(q) is the n×n mass matrix, c_{kj}^i(q) are the Coriolis and centrifugal coefficients obtained from the partial derivatives of M(q) with respect to q, and G_i(q) is the gravity loading obtained from the partial derivative of the potential energy with respect to q_i. The friction term is omitted in equation 12. Detailed information about the Lagrangian dynamic formulation can be found in the text (Schilling, 1990).
4 Trajectory Generation
In general, smooth motion between the initial and final positions is desired for the end-effector of a robot manipulator, since jerky motions can cause vibration in the manipulator. Joint and Cartesian trajectories are two common ways to generate smooth motion in robot manipulators. In a joint trajectory, the initial and final positions of the end-effector are converted into joint angles by using the inverse kinematics equations, and a time-dependent smooth function is computed for each joint. All of the robot joints pass through the initial and final points at the same time. Several smooth functions can be obtained by interpolating the joint values. A 5th-order polynomial is defined here under the boundary conditions of the joints (position, velocity, and acceleration):

x(0) = \theta_i, \quad \dot{x}(0) = \dot{\theta}_i, \quad \ddot{x}(0) = \ddot{\theta}_i
x(t_f) = \theta_f, \quad \dot{x}(t_f) = \dot{\theta}_f, \quad \ddot{x}(t_f) = \ddot{\theta}_f

These boundary conditions uniquely specify a particular 5th-order polynomial:

x(t) = s_0 + s_1 t + s_2 t^2 + s_3 t^3 + s_4 t^4 + s_5 t^5 \qquad (14)

The desired velocity and acceleration are calculated, respectively, as

\dot{x}(t) = s_1 + 2 s_2 t + 3 s_3 t^2 + 4 s_4 t^3 + 5 s_5 t^4
\ddot{x}(t) = 2 s_2 + 6 s_3 t + 12 s_4 t^2 + 20 s_5 t^3
where s_0, s_1, ..., s_5 are the coefficients of the polynomial, given by

s_0 = \theta_i, \quad s_1 = \dot{\theta}_i, \quad s_2 = \frac{\ddot{\theta}_i}{2}
s_3 = \frac{20(\theta_f - \theta_i) - (8\dot{\theta}_f + 12\dot{\theta}_i) t_f - (3\ddot{\theta}_i - \ddot{\theta}_f) t_f^2}{2 t_f^3}
s_4 = \frac{30(\theta_i - \theta_f) + (14\dot{\theta}_f + 16\dot{\theta}_i) t_f + (3\ddot{\theta}_i - 2\ddot{\theta}_f) t_f^2}{2 t_f^4}
s_5 = \frac{12(\theta_f - \theta_i) - 6(\dot{\theta}_f + \dot{\theta}_i) t_f - (\ddot{\theta}_i - \ddot{\theta}_f) t_f^2}{2 t_f^5}
where θ_i, θ_f, \dot{\theta}_i, \dot{\theta}_f, \ddot{\theta}_i, \ddot{\theta}_f denote the initial and final positions, velocities and accelerations, respectively, and t_f is the time at the final position. Fig. 1 illustrates the position, velocity and acceleration profiles for a single-link robot manipulator. Note that the manipulator is assumed to be motionless at the initial and final positions.
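The coefficient formulas above amount to solving a 6x6 linear system set up from the boundary conditions; a minimal sketch follows (the solver-based form is an implementation choice of this sketch, not the chapter's code):

```python
import numpy as np

def quintic_coeffs(th_i, th_f, t_f, v_i=0.0, v_f=0.0, a_i=0.0, a_f=0.0):
    """Coefficients s0..s5 of x(t) = s0 + s1*t + ... + s5*t^5 from the six
    boundary conditions on position, velocity and acceleration."""
    A = np.array([
        [1, 0,    0,       0,        0,         0],
        [0, 1,    0,       0,        0,         0],
        [0, 0,    2,       0,        0,         0],
        [1, t_f,  t_f**2,  t_f**3,   t_f**4,    t_f**5],
        [0, 1,    2*t_f,   3*t_f**2, 4*t_f**3,  5*t_f**4],
        [0, 0,    2,       6*t_f,    12*t_f**2, 20*t_f**3],
    ], dtype=float)
    b = np.array([th_i, v_i, a_i, th_f, v_f, a_f], dtype=float)
    return np.linalg.solve(A, b)   # [s0, s1, s2, s3, s4, s5]
```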
5 Problem Formulation
If the manipulator moves freely in the workspace, the dynamic equation of motion for an n-DOF robot manipulator is given by equation 12. This equation can be written in the simple matrix form

M(q)\ddot{q} + C(q, \dot{q})\dot{q} + G(q) = \tau \qquad (24)

Note that the inertia matrix M(q) is a symmetric positive definite matrix. The local optimization of the joint torque weighted by the inertia yields resolutions with global characteristics; that is, it solves the local minimization problem (Lui & Wang, 2004). The formulation of the problem is

Minimize over (m_1, m_2, m_3): \quad \tau^T M^{-1} \tau \qquad (25)

subject to

\dot{J}(q)\dot{q} + J(q)\ddot{q} - \ddot{r} = 0 \qquad (26)

where f(q) is the position vector of the end-effector and J(q) is the Jacobian matrix, defined as

J(q) = \frac{\partial f(q)}{\partial q} \qquad (27)

The objective function in equation 25 also minimizes the kinetic energy.
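Numerically, the objective of equation (25) is evaluated per trajectory sample; a minimal sketch, assuming the mass matrix M and torque vector tau come from the dynamic model of Section 3:

```python
import numpy as np

def torque_objective(M, tau):
    """Inertia-weighted torque cost tau^T M^{-1} tau for one sample.

    M   : n-by-n mass matrix at the sample's joint position
    tau : n-vector of joint torques at the same sample
    """
    # Solve M x = tau rather than forming M^{-1} explicitly (better conditioned).
    return float(tau @ np.linalg.solve(M, tau))
```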
6 Example
A three-link robot manipulator comprising two revolute joints and a prismatic joint is chosen as an example to examine the optimum energy performance. The transformation matrices are obtained using the well-known D-H method (Denavit & Hartenberg, 1955). For simplification, each link of the robot manipulator is modelled as a homogeneous cylindrical or prismatic beam of mass m located at the centre of the link. The kinematics and dynamic representations for the three-link robot manipulator are shown in Fig. 2.
Figure 2 Kinematics and dynamic representations for the three-link robot manipulator
The kinematic and dynamic link parameters for the three-link robot are given in Table 1, where m_i and L_{Ci} stand for the link masses and the centers of mass of the links, respectively. The inertia tensor I_m includes only the diagonal elements I_xx, I_yy and I_zz, the principal moments of inertia about the x, y and z axes. Since the mass distribution of each rigid link is symmetric relative to the body-attached coordinate frame, the cross products of inertia are taken as zero.
The transformation matrices for the robot manipulator illustrated in Fig. 2 are obtained using the parameters in Table 1. Substituting them into the Lagrangian formulation of Section 3 yields the 3x3 mass matrix M(q), whose elements are functions of the link masses m_1, m_2, m_3, the link lengths, the principal moments of inertia I_{zz1}, I_{zz2}, I_{zz3}, and the joint variables, and the generalized torque vector applied at the joints,

\tau = [\tau_1, \tau_2, \tau_3]^T

whose components τ_1, τ_2 and τ_3 follow from equation 12. As a result, the objective function \tau^T M^{-1} \tau is obtained as an explicit function of the link masses and of the joint positions, velocities and accelerations.
6.1 Optimization with Genetic Algorithms
The displacement of the end-effector with respect to time is illustrated in Fig. 3. The end-effector starts from the initial position p_xi=44, p_yi=0, p_zi=18 at t=0 seconds and reaches the final position p_xf=13.2721, p_yf=12.7279, p_zf=2 at t=3 seconds in Cartesian space. When the end-effector performs this motion, each joint has the position, velocity and acceleration profiles obtained from the 5th-order polynomial, as shown in Fig. 4.
Ten samples, obtained from Fig. 4 at time intervals between 0 and 3 seconds, are used for the optimization problem and are given in Table 2. Pos.-i, Vel.-i and Acc.-i represent the position, velocity and acceleration of the ith joint of the robot manipulator, respectively (1≤ i ≤3). The letter "S" represents the sample.
Figure 3 The displacement of the end-effector with respect to the time
Figure 4 The position, velocity and acceleration profiles for the first, second and third joints
Table 2 Position, velocity and acceleration samples of the first, second and third joints
In the robot design optimization problem, the link masses m_1, m_2 and m_3 are limited to an upper bound of 10 and a lower bound of 0. The objective function depends not only on the design variables but also on the joint variables (q_1, q_2, q_3), which have lower and upper bounds (0<q_1<360, 0<q_2<135 and 0<q_3<16), on the joint velocities (\dot{q}_1, \dot{q}_2, \dot{q}_3) and on the joint accelerations (\ddot{q}_1, \ddot{q}_2, \ddot{q}_3). The initial and final velocities of each joint are defined as zero. In order to optimize the link masses, the objective function should be as small as possible at all working positions, velocities and accelerations. The following relationship was adopted to specify the corresponding fitness function:

F = \tau^T M^{-1} \tau\,(1) + \tau^T M^{-1} \tau\,(2) + \cdots + \tau^T M^{-1} \tau\,(9) + \tau^T M^{-1} \tau\,(10)

where the argument (k) denotes evaluation at the kth sample of Table 2.
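The fitness then accumulates this cost over the ten sampled points; a sketch using the `torque_objective` function from the previous section, where `dynamics` is a hypothetical user-supplied callable returning (M, tau) for given masses and a joint-state sample:

```python
def ga_fitness(masses, samples, dynamics):
    """Total inertia-weighted torque cost over the sampled joint states.

    masses   : candidate (m1, m2, m3) decoded from a chromosome
    samples  : iterable of (q, dq, ddq) tuples, e.g. the ten rows of Table 2
    dynamics : callable returning (M, tau) for given masses and sample
    """
    total = 0.0
    for q, dq, ddq in samples:
        M, tau = dynamics(masses, q, dq, ddq)
        total += torque_objective(M, tau)   # tau^T M^{-1} tau per sample
    return total                            # GA minimizes this quantity
```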
In the GA solution approach, the influences of different population sizes and mutation rates were examined to find the best GA parameters for the mass optimization problem of minimizing the total kinetic energy. The population sizes and mutation rates were varied between 20-60 and 0.005-0.1, where the number of iterations was taken as 50 and 100, respectively. The GA parameters used in this study are summarized as follows:
Population size: 20 to 60
Mutation rate: 0.005, 0.01, 0.1
Number of iterations: 50, 100
All results related to these GA parameters are given in Table 3. As can be seen from Table 3, a population size of 20, a mutation rate of 0.01 and an iteration number of 50 produce the minimum kinetic energy of 0.892×10^-14, where the optimum link masses are m_1=9.257, m_2=1.642 and m_3=4.222.
7 Conclusion
In this chapter, the link masses of robot manipulators were optimized using GAs to obtain the best energy performance. First, fundamental knowledge about genetic algorithms, the dynamic equations of robot manipulators and trajectory generation was presented in detail. Second, GA was applied to find the optimum link masses for the three-link serial robot manipulator. Finally, the influences of different population sizes and mutation rates were examined to find the best GA parameters for the mass optimization. The optimum link masses obtained at minimum torque requirements show that the GA optimization technique is very consistent in robotic design applications. The mathematically simple and easy coding features of GA also provide a convenient solution approach for robotic optimization problems.
8 References
Denavit, J.; Hartenberg, R S (1955) A kinematic notation for lower-pair mechanisms based
on matrices, Journal of Applied Mechanics, pp 215-221, vol 1, 1955
Hollerbach, J M (1980) A Recursive Lagrangian Formulation of Manipulator Dynamics and
A Comparative Study of Dynamic Formulation Complexity, IEEE Trans. Systems, Man, and Cybernetics, 1980
Luh, J Y S.; Walker, M W & Paul, R P C (1980) On-Line Computational Scheme for
Mechanical Manipulators, Trans ASME, J Dynamic Systems, Measurement and Control, 1980
Paul, R P (1981) Robot Manipulators: Mathematics, Programming, and Control, MIT Press,
Cambridge Mass., 1981
Kane, T R.; Levinson, D A (1983) The Use of Kane’s Dynamic Equations in Robotics, Int J
of Robotics Research, vol 2, no 3, pp 3-20, 1983
Lee, C S G.; Lee, B H & Nigam, N (1983) Development of the Generalized D’Alembert
Equations of Motion for Robot Manipulators, Proc of 22nd Conf on Decision and Control, 1983
Nedungadi, A.; Kazerouinian, K (1989) A local solution with global characteristics for joint
torque optimization of redundant manipulator, Journal of Robotic Systems, vol 6,
1989
Schilling, R (1990) Fundamentals of Robotics Analysis and Control, Prentice Hall, 1990,
New Jersey
Delingette, H.; Herbert, M & Ikeuchi, K (1992) Trajectory generation with curvature
constraint based on energy minimization, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 1992
cooperating manipulation, Proceedings of the 23rd Annual Pittsburgh Modeling and
Pin, F.; Culioli, J (1992) Optimal positioning of combined mobile platform-manipulator
systems for material handling tasks,Journal of Intelligent Robotic System, Theory and
Painton, L.; Campbell, J (1995) Genetic algorithms in optimization of system reliability,
Hirakawa, A.;Kawamura, A (1996) Proposal of trajectory generation for redundant
manipulators using variational approach applied to minimization of consumed electrical energy,Proceedings of the Fourth International Workshop on Advanced Motion
Chen, M.; Zalzasa, A (1997) A genetic approach to motion planning of redundant mobile
manipulator systems considering safety and configuration, Journal of Robotic
Stocco, L.; Salcudean, S E & Sassani, F (1998) Fast constraint global minimax optimization
of robot parameters,Robotica,pp 595-605, vol 16, 1998
Choi, S.; Choi, Y & Kim, J (1999).Optimal working trajectory generation for a biped robot
using genetic algorithm, Proceedings of IEEE/RSJ International Conference on
Pires, E.; Machado, J (1999) Trajectory planner for manipulators using genetic algorithms,
1999
Sexton, R.; Gupta, J (2000) Comparative evaluation of genetic algorithm and back
propagation for training neural networks, Information Sciences, pp 45-59, 2000
Garg, D.; Kumar, M (2002) Optimization techniques applied to multiple manipulators for
path planning and torque minimization, Journal for Engineering Applications of Artificial Intelligence, pp 241-252, vol 15, 2002
Trang 32Tang, W.; Wang, J (2002) Two recurrent neural networks for local joint torque optimization
of kinematically redundant manipulators,IEEE Trans Syst Man Cyber, Vol 32(5),
2002
Lui, S.; Wang, J (2004) A dual neural network for bi-criteria torque optimization of
redundant robot manipulators, Lecture Notes in Computer Science, vol 3316, pp 1142-1147, 2004
Kucuk S.; Bingul Z (2006) Link mass optimization of serial robot manipulators using
genetic algorithms, Lecture Notes in Computer Science, Knowledge-Based Intelligent
Qudeiri, J A.; Yamamoto, H & Ramli, R (2007) Optimization of operation sequence in
CNC machine tools using genetic algorithm, Journal of Advance Mechanical Design,
16
FPGA-Realization of a Motion Control IC for
Robot Manipulator
Ying-Shieh Kung and Chia-Sheng Chen
Southern Taiwan University
Taiwan
1 Introduction
Robotic control is currently an exciting and highly challenging research focus Several solutions for implementing the control architecture for robots have been proposed (Kabuka et al., 1988; Yasuda, 2000; Li et al., 2003; Oh et al., 2003) Kabuka et al (1998) apply two high-performance floating-point signal processors and a set of dedicated motion controllers to build
a control system for a six-joint robots arm Yasuda (2000) adopts a PC-based microcomputer and several PIC microcomputers to construct a distributed motion controller for mobile robots
Li et al (2003) utilize an FPGA (Field Programmable Gate Array) to implement autonomous fuzzy behavior control on mobile robot Oh et al (2003) present a DSP (Digital Signal Processor) and a FPGA to design the overall hardware system in controlling the motion of biped robots However, these methods can only adopt PC-based microcomputer or the DSP chip to realize the software part or adopt the FPGA chip to implement the hardware part of the robotic control system They do not provide an overall hardware/software solution by a single chip in implementing the motion control architecture of robot system
For the progress of VLSI technology, the Field programmable gate arrays (FPGAs) have been widely investigated due to their programmable hard-wired feature, fast time-to-market, shorter design cycle, embedding processor, low power consumption and higher density for implementing digital system FPGA provides a compromise between the special-purpose ASIC (application specified integrated circuit) hardware and general-purpose processors (Wei
et al., 2005) Hence, many practical applications in motor control (Zhou et al., 2004; Yang et al., 2006; Monmasson & Cirstea, 2007) and multi-axis motion control (Shao & Sun, 2005) have been studied, using the FPGA to realize the hardware component of the overall system
The novel FPGA (Field Programmable Gate Array) technology is able to combine an embedded processor IP (Intellectual Property) and an application IP to be an SoPC (System-on-a-Programmable-Chip) developing environment (Xu et al., 2003; Altera, 2004; Hall & Hamblen, 2004), allowing a user to design an SoPC module by mixing hardware and software The circuits required with fast processing but simple computation are suitable to be implemented by hardware in FPGA, and the highly complicated control algorithm with heavy computation can be realized by software in FPGA Results that the software/hardware co-design function increase the programmable, flexibility of the designed digital system and reduce the development time Additionally, software/hardware parallel processing enhances the controller performance Our previous works (Kung & Shu, 2005; Kung et al., 2006; Kung &
Trang 34Tsai, 2007) have successfully applied the novel FPGA technology to the servo system of PMSM drive, robot manipulator and X-Y table
To exploit these advantages, this study presents a fully digital motion control IC for a five-axis robot manipulator based on FPGA technology, as shown in Fig. 1(a). Firstly, the mathematical model and the servo controller of the robot manipulator are described. Secondly, the inverse kinematics and the motion trajectory are formulated. Thirdly, the circuit designed to implement the functions of the motion control IC is introduced. The proposed motion control IC has two IPs. One IP performs the motion trajectory functions for the robot manipulator; the other IP performs the inverse kinematics and the five-axis position controllers. The former is implemented by software using the Nios II embedded processor, owing to its complicated processing but low sampling-frequency requirement (motion trajectory: 100 Hz). The latter is realized by hardware in the FPGA, owing to its high sampling-frequency requirements (position loop: 762 Hz, speed loop: 1525 Hz, PWM circuit: 4-8 MHz) but simple processing. To reduce the usage of logic gates in the FPGA, an FSM (Finite State Machine) method is presented to model the overall computation of the inverse kinematics, the five-axis position controllers and the five-axis speed controllers; VHDL (VHSIC Hardware Description Language) is then adopted to describe the FSM circuit. Hence, all the functionality required to build a fully digital motion controller for a five-axis robot manipulator can be integrated in one FPGA chip. As a result, hardware/software co-design technology makes the motion controller of the robot manipulator more compact, flexible and cost-effective, with better performance. The FPGA chip employed herein is an Altera Stratix II EP2S60F672C5ES (Altera, 2008), which has 48,352 ALUTs (Adaptive Look-Up Tables), a maximum of 718 user I/O pins and a total of 2,544,192 RAM bits. A Nios II embedded processor, which has a 32-bit configurable CPU core, 16 Mbytes of Flash memory, 1 Mbyte of SRAM and 16 Mbytes of SDRAM, is used to construct the SoPC development environment. Finally, an experimental system consisting of an FPGA experimental board, five sets of inverters and a Movemaster RV-M1 micro robot produced by Mitsubishi is set up to verify the correctness and effectiveness of the proposed FPGA-based motion control IC.
2 Architecture of the motion control system
The typical architecture of a conventional motion control system for a robot manipulator is shown in Fig. 1(b); it consists of a central controller, five sets of servo drivers and one robot manipulator. The central controller, which usually adopts a floating-point processor, performs the functions of motion trajectory generation, inverse kinematics, and data communication with the servo drivers and the external device. Each servo driver uses a fixed-point processor, some specific ICs and an inverter to perform the position control of one axis of the robot manipulator and the data communication with the central controller. The communication medium between them may be an analog signal, a bus signal or a serial asynchronous signal. Once the central controller receives a motion command from the external device, it executes the computation of the motion trajectory and the inverse kinematics, generates five position commands and sends those signals to the five servo drivers. Each servo driver, having received its position command from the central controller, executes its position controller to control the servo motor of the manipulator. As a result, the motion trajectory of the robot manipulator follows the prescribed motion command. However, the motion control system in Fig. 1(b) has some drawbacks, such as large volume, susceptibility to noise, high cost and inflexibility. In addition, the data communication and handshake protocol between the central controller and the servo drivers slow down the execution speed of the system. To overcome these drawbacks,
based on FPGA technology, the central controller and the controller parts of the five servo drivers in Fig. 1(b) are integrated into a single motion control IC in this study, as shown in Fig. 1(a). The proposed motion control IC has two IPs: one IP performs the motion trajectory functions by software, and the other IP performs the inverse kinematics and the five-axis position controllers by hardware. Hence, hardware/software co-design technology makes the motion controller of the robot manipulator more compact, flexible and cost-effective, with better performance. The detailed design methodology is described in the following sections, and the contributions of this study are summarized in the conclusion.
Figure 1 (a) The proposed FPGA-based motion control IC for the five-axis articulated robot manipulator; (b) the typical architecture of a conventional motion control system, with a central controller issuing the 1st to 5th axis position commands to five servo drivers (controller, inverter and QEP encoder feedback per axis), each driving one servo motor (J1- to J5-axis) of the five-axis articulated robot manipulator
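Before detailing the model, the multi-rate structure described above (motion trajectory at 100 Hz in software; position and speed loops at 762 Hz and 1525 Hz in hardware) can be pictured with a small scheduler model. The following C sketch is illustrative only: in the actual IC the fast loops are parallel FPGA circuits rather than sequential function calls, and all names here (base_tick, position_loop, speed_loop, trajectory_and_ik) are hypothetical.

#include <stdio.h>

#define AXES 5

/* Stubs standing in for the real controllers; in the proposed IC the
 * fast loops are parallel FPGA circuits and the trajectory/inverse-
 * kinematics task runs in software on the Nios II processor. */
static void trajectory_and_ik(void) { /* ~100 Hz software task */ }
static void position_loop(int axis) { (void)axis; /* ~762 Hz  */ }
static void speed_loop(int axis)    { (void)axis; /* 1525 Hz  */ }

/* Multi-rate model of Fig. 1(a): a 1525 Hz base tick runs the speed
 * loops every tick, the position loops every 2nd tick (~762 Hz) and
 * the trajectory generation roughly every 15th tick (~100 Hz). */
static void base_tick(unsigned long n)
{
    if (n % 15 == 0)
        trajectory_and_ik();
    for (int i = 0; i < AXES; i++) {
        if (n % 2 == 0)
            position_loop(i);
        speed_loop(i);
    }
}

int main(void)
{
    for (unsigned long n = 0; n < 1525; n++)  /* simulate one second */
        base_tick(n);
    puts("one second of control ticks simulated");
    return 0;
}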
3 System description and design method of robot manipulator
3.1 Mathematical model of a robot manipulator with actuator
The dynamic equation of the n-link manipulator is given by (Lewis et al., 1993):

    M(θ)θ̈ + V(θ, θ̇)θ̇ + G(θ) + F(θ̇) = τ                                  (1)

where M(θ) denotes the inertia matrix; V(θ, θ̇) denotes the Coriolis/centripetal matrix; G(θ) denotes the gravity vector; and F(θ̇) denotes the friction vector. The terms θ, θ̇, θ̈ and τ represent the n-vectors of the position, the velocity, the acceleration and the generalized forces, respectively, all defined on R^n. The dynamic model of the ith-axis DC motor drive system for the robot manipulator is expressed as

    v_i = L_ai (di_i/dt) + R_ai i_i + K_bi θ̇_Mi                            (2)

    τ_Mi = K_Mi i_i                                                        (3)

    J_Mi θ̈_Mi + B_Mi θ̇_Mi + F_Mi(θ̇_Mi) + r_i τ_i = τ_Mi                   (4)

where L_ai, R_ai, i_i, v_i, K_bi and K_Mi denote the inductance, resistance, current, voltage, voltage constant and current constant, respectively. The terms J_Mi, B_Mi and τ_Mi denote the inertia, viscous-friction coefficient and generated torque of the motor, and r_i τ_i denotes the load torque of the ith arm link of the robot manipulator reflected through the gear, respectively.
Considering the motor friction F_Mi(θ̇_Mi) and assuming that the inductance term in (2) is small and can be ignored, (2) to (4) can be arranged and simplified into the following equation:

    J_Mi θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + F_Mi + r_i τ_i = (K_Mi / R_ai) v_i    (5)
Therefore, the dynamics of the DC motors that drive the arm links are given by the n decoupled equations

    J_M θ̈_M + B θ̇_M + F_M + R τ = K v                                     (6)

with

    J_M = diag{J_Mi},   B = diag{B_Mi + K_bi K_Mi / R_ai},   R = diag{r_i},
    K = diag{K_Mi / R_ai},   v = vec{v_i},   F_M = vec{F_Mi(θ̇_Mi)}         (7)
The dynamic equations including both the robot manipulator and the DC motors can be obtained by combining (1) and (6). The gear ratio of the coupling from motor i to arm link i is given by r_i, which is defined as

    r_i = θ_i / θ_Mi                                                       (8)
Hence, substituting (8) into (6), and then into (1), it can be obtained that

    (J_M + R M(θ) R) θ̈_M + (B + R V(θ, θ̇) R) θ̇_M + R G(θ) + R F(θ̇) + F_M = K v    (9)

The gear ratio r_i of a commercial robot manipulator is usually small (< 1/100) to increase the torque value; therefore, the formula can be simplified. From (9), the dynamic equation combining the motor and the arm link at the ith axis is expressed as

    (J_Mi + r_i² m_ii(θ)) θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + r_i F_i + F_Mi = (K_Mi / R_ai) v_i    (10)

where m_ii(θ) denotes the ith diagonal element of the inertia matrix M(θ), and the coupling torque neglected in (10) owing to the small gear ratio is

    τ_di = r_i ( Σ_{j≠i} m_ij(θ) θ̈_j + Σ_{j=1}^{n} Σ_{k=1}^{n} c_jki(θ) θ̇_j θ̇_k + G_i(θ) )    (11)

where c_jki(θ) are the Christoffel symbols associated with M(θ).
Figure 2 Control block diagram of the five-axis motion and servo controller: for each axis (1st to 5th), the position-loop P controller and the speed-loop digital PI controller (gains K_p, K_i) drive the DC motor and arm-link dynamics (armature K_Mi/(L_ai s + R_ai), mechanical 1/(J_Mi s + B_Mi), and an integrator producing the motor position θ_Mi), following the position commands θ*_1 to θ*_5
Neglecting the coupling torque of (11), the per-axis dynamics can be written compactly as

    J_i θ̈_Mi + B_i θ̇_Mi + r_i F_i = (K_Mi / R_ai) v_i                      (12)

where J_i = J_Mi + r_i² m_ii(θ) and B_i = B_Mi + K_bi K_Mi / R_ai. Substituting (7) and (8) into (12) yields

    J_Mi θ̈_Mi + (B_Mi + K_bi K_Mi / R_ai) θ̇_Mi + F_Mi = (K_Mi / R_ai) v_i    (13)
If the gear ratio is small, the control block diagram combining the arm link, the motor actuator, the position-loop P controller, the speed-loop PI controller, the inverse kinematics and the trajectory planning for the servo and motion controller of a five-axis robot manipulator can be represented as in Fig. 2.
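As a sanity check on the simplified model, the following C sketch integrates the per-axis equation (13) with a forward-Euler step. It is an illustrative simulation, not part of the chapter's FPGA design, and all parameter values are placeholders rather than the RV-M1's actual constants.

#include <stdio.h>

/* One-axis simplified DC-motor/arm model of (13):
 *   J_M*acc + (B_M + Kb*KM/Ra)*vel + F_M = (KM/Ra)*v
 * Parameter values below are placeholders. */
typedef struct {
    double Jm, Bm, Kb, Km, Ra, Fm; /* inertia, damping, constants, Coulomb friction */
    double th, w;                  /* motor position (rad) and speed (rad/s)        */
} axis_t;

static void axis_step(axis_t *a, double v, double dt)
{
    double fric   = (a->w > 0) - (a->w < 0);            /* sign(w)       */
    double torque = (a->Km / a->Ra) * v
                  - (a->Bm + a->Kb * a->Km / a->Ra) * a->w
                  - a->Fm * fric;
    double acc    = torque / a->Jm;                     /* from (13)     */
    a->w  += acc * dt;                                  /* forward Euler */
    a->th += a->w * dt;
}

int main(void)
{
    axis_t ax = { .Jm = 1e-4, .Bm = 1e-4, .Kb = 0.05,
                  .Km = 0.05, .Ra = 2.0, .Fm = 0.0 };
    for (int n = 0; n < 1000; n++)                      /* 1 s at 1 kHz, 1 V step */
        axis_step(&ax, 1.0, 1e-3);
    printf("speed after 1 s: %.3f rad/s\n", ax.w);
    return 0;
}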
In Fig. 2, the digital PI controllers in the speed loop are formulated as follows:

    v_p(n) = k_p e(n)                                                      (14)

    v_I(n) = v_I(n−1) + k_i e(n)                                           (15)

    v(n) = v_p(n) + v_I(n)                                                 (16)

where k_p and k_i are the P-controller gain and the I-controller gain, e(n) is the error between the command and the sensor value, and v(n) is the PI controller output.
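Equations (14)-(16) map directly to a few lines of fixed-point code. The sketch below is a minimal C model of the speed-loop PI controller; the Q15 gain scaling and the anti-windup clamp are illustrative assumptions that the equations themselves do not include. In the chapter's design this arithmetic is realized in FPGA hardware, with the FSM sequencing the multiplications so that multiplier resources can be shared.

#include <stdint.h>

/* Digital PI speed controller of (14)-(16):
 *   v_p(n) = kp*e(n);  v_I(n) = v_I(n-1) + ki*e(n);  v(n) = v_p(n) + v_I(n)
 * Gains are in Q15 fixed point, as is common in FPGA realizations. */
typedef struct {
    int32_t kp_q15, ki_q15;  /* P and I gains, Q15 fixed point */
    int32_t vI;              /* integral term v_I(n-1)         */
} pi_t;

static int32_t pi_update(pi_t *c, int32_t cmd, int32_t meas)
{
    int32_t e  = cmd - meas;                 /* e(n)        */
    int32_t vp = (c->kp_q15 * e) >> 15;      /* (14)        */
    c->vI     += (c->ki_q15 * e) >> 15;      /* (15)        */

    /* anti-windup clamp on the integrator (illustrative limits) */
    if (c->vI >  32767) c->vI =  32767;
    if (c->vI < -32768) c->vI = -32768;

    return vp + c->vI;                       /* (16): v(n)  */
}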
3.2 Computation of the inverse kinematics for the robot manipulator
Figure 3 shows the link coordinate system of the five-axis articulated robot manipulator using the Denavit-Hartenberg convention, and Table 1 lists the values of the kinematic parameters. The inverse kinematics of the articulated robot manipulator transforms the coordinates of the robot manipulator from the Cartesian space R3 (x, y, z) to the joint space R5 (θ1, θ2, θ3, θ4, θ5); the procedure is described in detail by Schilling (1998). The computation begins with the base joint angle

    θ1 = atan2(y, x)

and proceeds through intermediate wrist-position terms formed from z, d1, d5, a1 and a5, where a1, a5, d1 and d5 are Denavit-Hartenberg parameters and the atan2 function refers to Table 2.
Figure 3 Link coordinates of a five-axis robot manipulator
Table 1 Denavit-Hartenberg parameters of the robot manipulator (joints 1-5: joint angles θ1-θ5, link offsets d_i, link lengths a_i and link twists α_i of 0 and ±π/2; in particular d5 = 72 mm and a3 = 250 mm)
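For reference, the Table 1 parameters enter the kinematics through the standard Denavit-Hartenberg link transform A_i = Rot_z(θ_i)·Trans_z(d_i)·Trans_x(a_i)·Rot_x(α_i). The C sketch below is a generic utility, not code from the chapter, and takes the four parameters symbolically rather than assuming the table's values.

#include <math.h>

/* Standard Denavit-Hartenberg link transform, row-major 4x4:
 *   A = Rotz(theta) * Transz(d) * Transx(a) * Rotx(alpha) */
void dh_transform(double theta, double d, double a, double alpha,
                  double A[4][4])
{
    double ct = cos(theta), st = sin(theta);
    double ca = cos(alpha), sa = sin(alpha);

    A[0][0] = ct; A[0][1] = -st * ca; A[0][2] =  st * sa; A[0][3] = a * ct;
    A[1][0] = st; A[1][1] =  ct * ca; A[1][2] = -ct * sa; A[1][3] = a * st;
    A[2][0] = 0;  A[2][1] =  sa;      A[2][2] =  ca;      A[2][3] = d;
    A[3][0] = 0;  A[3][1] =  0;       A[3][2] =  0;       A[3][3] = 1;
}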
Case     Quadrants   atan2(y, x)
x > 0    1, 4        arctan(y/x)
x = 0    1, 4        [sgn(y)] π/2
x < 0    2, 3        arctan(y/x) + [sgn(y)] π
Table 2 Four-quadrant arctan function (Schilling, 1998)
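Table 2 can be coded directly. The C sketch below mirrors its three cases; note that the standard library's atan2() already implements this behavior, so the function exists only to make the table concrete (sgn(0) is taken as +1 here, an assumption the table leaves open).

#include <math.h>
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Four-quadrant arctangent following Table 2. */
double atan2_table(double y, double x)
{
    double sgn_y = (y >= 0.0) ? 1.0 : -1.0;

    if (x > 0.0)                      /* quadrants 1, 4 */
        return atan(y / x);
    else if (x == 0.0)                /* on the y-axis  */
        return sgn_y * M_PI / 2.0;
    else                              /* quadrants 2, 3 */
        return atan(y / x) + sgn_y * M_PI;
}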
3.3 Planning of the motion trajectory
3.3.1 The computation of the point-to-point motion trajectory
To achieve smooth motion when manipulating the robot, a point-to-point motion scheme with a constant acceleration/deceleration trapezoid velocity profile is applied, as shown in Fig. 2. In this scheme, the design parameters at each axis of the robot are the overall displacement Δθ*_i (degree), the maximum velocity W_i (degree/s), the acceleration/deceleration period T_acc and the position-loop sampling interval t_d. The instantaneous position command q at each sampling interval, based on the trapezoid velocity profile, can be computed as follows:
Step 1: Compute the overall running time. First, compute the maximum running time without considering the acceleration and deceleration:

    T_1 = max(Δθ*_1/W_1, Δθ*_2/W_2, Δθ*_3/W_3, Δθ*_4/W_4, Δθ*_5/W_5)

This T_1 must exceed the acceleration time T_acc. The acceleration and deceleration design is then considered, and the overall running time is

    T = T_1 + T_acc
Step 2: Adjust the overall running time to meet the condition of a multiple of the sampling interval:

    N = [T / t_d],   T_L = N t_d

where N represents the interpolation number and [.] refers to the Gauss function, with integer output value.

Step 3: Recalculate the maximum velocity at each axis from the adjusted running time T_L:

    [W_1, W_2, W_3, W_4, W_5] = [Δθ*_1, Δθ*_2, Δθ*_3, Δθ*_4, Δθ*_5] / (T_L − T_acc)
Step 4: Calculate the acceleration/deceleration value:

    A_i = W_i / T_acc
Step 5: Calculate the position command, q = [q_1, q_2, q_3, q_4, q_5], at each mid-position.

a. Acceleration region:

    q = q_r0 + (1/2) A t²

where t = n t_d, 0 < n ≤ N′, and q_r0 denotes the initial position.

b. Constant-speed region:

    q = q_r1 + W t,   with q_r1 = q_r0 + (1/2) A T_acc²

where t = n t_d measured from the start of the region, 0 < n ≤ N′, q_r1 is the position at the end of the acceleration region, and q_r2 denotes the final position in the constant-speed region.
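Steps 1-5 assemble into a compact position-command generator. The following C sketch is a hedged reconstruction: the acceleration and constant-speed regions follow the formulas above, while the deceleration region is completed with the standard symmetric constant-deceleration form, an assumption beyond the steps shown here; the function name and signature are illustrative.

#include <math.h>

/* Single-axis trapezoidal velocity profile, following Steps 1-5.
 * dtheta: total displacement, Tacc: accel/decel period,
 * TL: adjusted overall running time (a multiple of t_d, Step 2;
 * TL > 2*Tacc is assumed so a cruise phase exists).
 * Returns the position command q(t) relative to q_r0 = 0. */
double trapezoid_q(double t, double dtheta, double Tacc, double TL)
{
    double W = dtheta / (TL - Tacc);   /* Step 3: cruise velocity */
    double A = W / Tacc;               /* Step 4: acceleration    */

    if (t <= 0.0) return 0.0;
    if (t >= TL)  return dtheta;

    if (t < Tacc)                      /* (a) acceleration region   */
        return 0.5 * A * t * t;
    else if (t < TL - Tacc)            /* (b) constant-speed region */
        return 0.5 * A * Tacc * Tacc + W * (t - Tacc);
    else {                             /* (c) symmetric deceleration */
        double tr = TL - t;            /* time remaining             */
        return dtheta - 0.5 * A * tr * tr;
    }
}

Sampling the function at t = n t_d for n = 1 .. N reproduces the interpolated position commands; the three branches meet continuously at t = T_acc and t = T_L − T_acc, and the total area under the velocity trapezoid equals W (T_L − T_acc) = Δθ*, consistent with Step 3.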